Skill Creator

The Skill Creator is a visual tool for building new Android automation skills through LLM-assisted interaction. It provides a split-screen interface in the dashboard.

| Panel | Content |
|-------|---------|
| Left  | LLM chat — type natural language instructions, view proposed action plans |
| Right | Live device screen (WebRTC) with numbered interactive element overlay |
```
User types: "Open Gmail and compose an email to test@example.com"
        |
        v
Dashboard collects context:
  - Current screen elements (via GET /api/phone/elements/<device>)
  - Action history (last 15 entries)
  - Selected backend + model
        |
        v
POST /api/creator/chat -> server builds system prompt with:
  - Capability list (tap, swipe, type, launch, screenshot, etc.)
  - Existing skill references (tiktok: 13 actions, base: 9 actions)
  - Current screen elements with bounds + labels
  - Action history
        |
        v
LLM returns structured skill spec + action plan as JSON
        |
        v
Dashboard renders:
  - Natural language explanation
  - Structured skill spec with parameters
  - Step-by-step plan with "Execute" button
        |
        v
Execute -> sends tap/type/back commands to device -> results feed back to LLM
```
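The context-collection step can be sketched as a payload builder. The field names below are assumptions for illustration, not the dashboard's actual schema:

```python
def build_chat_payload(message, elements, history, backend, model):
    """Bundle the user's message with screen context for POST /api/creator/chat.

    Field names are hypothetical -- check the dashboard source for the real schema.
    """
    return {
        "message": message,
        "elements": elements,      # from GET /api/phone/elements/<device>
        "history": history[-15:],  # only the last 15 action-history entries
        "backend": backend,
        "model": model,
    }

payload = build_chat_payload(
    "Open Gmail and compose an email to test@example.com",
    elements=[{"idx": 5, "label": "Compose", "bounds": [880, 1700, 1040, 1860]}],
    history=[{"action": "launch", "package": "com.google.android.gm"}],
    backend="openrouter",
    model="anthropic/claude-sonnet-4",
)
```

The `[-15:]` slice mirrors the 15-entry history cap noted in the flow above.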
| Backend | Config | Default Model | Timeout |
|---------|--------|---------------|---------|
| OpenRouter | `OPENROUTER_API_KEY` env var | anthropic/claude-sonnet-4 | 60s |
| Claude API | `ANTHROPIC_API_KEY` env var | claude-sonnet-4-20250514 | 60s |
| Ollama | Auto-detect at localhost:11434 | llama3 | 60s |
| Claude Code | `claude` CLI installed | sonnet | 120s |
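The defaults in the table can be expressed as a lookup. This is a sketch that assumes the env-var check happens at backend-selection time; it is not the dashboard's real code:

```python
import os

# Defaults taken from the backend table; the lookup logic is an assumption.
BACKENDS = {
    "openrouter":  {"env": "OPENROUTER_API_KEY", "model": "anthropic/claude-sonnet-4", "timeout": 60},
    "claude_api":  {"env": "ANTHROPIC_API_KEY",  "model": "claude-sonnet-4-20250514", "timeout": 60},
    "ollama":      {"env": None,                 "model": "llama3",                   "timeout": 60},
    "claude_code": {"env": None,                 "model": "sonnet",                   "timeout": 120},
}

def resolve_backend(name, model=None):
    """Return (model, timeout) for a backend, failing early if its key is unset."""
    cfg = BACKENDS[name]
    if cfg["env"] and not os.environ.get(cfg["env"]):
        raise RuntimeError(f"{cfg['env']} is not set")
    return model or cfg["model"], cfg["timeout"]
```

Ollama and Claude Code need no API key, so their `env` entry is `None`.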

The backend and model selection persist in localStorage across sessions.

The LLM produces structured JSON action plans:

```json
{
  "name": "send_gmail",
  "display_name": "Send Gmail Email",
  "description": "Compose and send an email via the Gmail app",
  "app_package": "com.google.android.gm",
  "parameters": [
    {
      "name": "recipient",
      "type": "string",
      "description": "Email address",
      "example": "hello@example.com",
      "required": true
    }
  ],
  "usage_example": "Send an email to john@example.com with subject 'Meeting'",
  "steps": [
    {"action": "launch", "package": "com.google.android.gm", "goal": "Open Gmail"},
    {"action": "tap", "element_idx": 5, "goal": "Open composer"},
    {"action": "type", "text": "{recipient}", "goal": "Enter recipient"}
  ]
}
```

Parameters use `{param_name}` placeholders in step `text` fields; each placeholder is replaced with the caller-supplied value at runtime.
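A minimal sketch of that substitution (the helper name is hypothetical):

```python
def fill_placeholders(steps, params):
    """Replace {param_name} placeholders in step text fields with runtime values."""
    filled = []
    for step in steps:
        step = dict(step)  # copy so the original skill spec is untouched
        if "text" in step:
            for name, value in params.items():
                step["text"] = step["text"].replace("{" + name + "}", str(value))
        filled.append(step)
    return filled

steps = [
    {"action": "tap", "element_idx": 5, "goal": "Open composer"},
    {"action": "type", "text": "{recipient}", "goal": "Enter recipient"},
]
result = fill_placeholders(steps, {"recipient": "hello@example.com"})
# result[1]["text"] == "hello@example.com"
```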

  1. Open the dashboard at http://localhost:5055
  2. Navigate to the Skill Creator tab
  3. Select your LLM backend and model from the dropdowns
  4. Start the WebRTC stream for your target device (Phone Admin tab or inline)
  5. Type a natural language instruction in the chat input
  6. Review the LLM’s proposed skill spec and action steps
  7. Click Execute to run steps on the device, or approve individually
  8. Iterate: results feed back to the LLM as context for refinement

The right panel shows the live device screen with numbered labels on every interactive element. This helps you:

  • Reference elements by index when chatting with the LLM (“tap element #5”)
  • Identify resource IDs and content descriptions for writing elements.yaml
  • Verify that the correct element was found

Elements are fetched via GET /api/phone/elements/<device>.
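The numbered overlay maps each index to an element record. Below is a sketch assuming fields like `idx`, `resource_id`, `content_desc`, and `bounds`; the endpoint's exact schema may differ:

```python
# Hypothetical records as GET /api/phone/elements/<device> might return them.
elements = [
    {"idx": 0, "resource_id": "com.google.android.gm:id/compose_button",
     "content_desc": "Compose", "bounds": [880, 1700, 1040, 1860]},
    {"idx": 1, "resource_id": "", "content_desc": "Search",
     "bounds": [40, 120, 1040, 260]},
]

def element_by_index(elements, idx):
    """Look up an overlay element by its numbered label ("tap element #5")."""
    for el in elements:
        if el["idx"] == idx:
            return el
    raise KeyError(f"no element #{idx} on screen")

def tap_center(el):
    """Convert bounds [left, top, right, bottom] to a tap coordinate."""
    l, t, r, b = el["bounds"]
    return ((l + r) // 2, (t + b) // 2)

tap_center(element_by_index(elements, 0))  # -> (960, 1780)
```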

| Method | Endpoint | Purpose |
|--------|----------|---------|
| POST | `/api/creator/chat` | Send message to LLM with screen context |
| GET | `/api/creator/ollama-models` | List available Ollama models |
| GET | `/api/phone/elements/<device>` | Get interactive UI elements |
| POST | `/api/phone/tap` | Execute tap on device |
| POST | `/api/phone/type` | Type text on device |
| POST | `/api/phone/back` | Press back button |
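A thin client over these endpoints might look like the sketch below. The paths come from the table and the base URL matches the dashboard default, but the request payload shapes are assumptions:

```python
import json
import urllib.request

class CreatorClient:
    """Minimal sketch of a client for the Skill Creator endpoints."""

    def __init__(self, base="http://localhost:5055"):
        self.base = base

    def _url(self, path):
        return self.base + path

    def _post(self, path, payload):
        # Payload field names below are hypothetical, not a documented schema.
        req = urllib.request.Request(
            self._url(path),
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def chat(self, message, **context):
        return self._post("/api/creator/chat", {"message": message, **context})

    def tap(self, x, y):
        return self._post("/api/phone/tap", {"x": x, "y": y})

    def elements(self, device):
        with urllib.request.urlopen(self._url(f"/api/phone/elements/{device}")) as resp:
            return json.load(resp)
```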