
MCP Server

The MCP server exposes the entire Android automation system as 38 tools that any LLM agent can call. It supports the Model Context Protocol (MCP) for Claude Code, Cursor, Codex CLI, and OpenClaw over stdio or HTTP, plus an OpenAPI-compatible REST layer for ChatGPT/GPT Actions.

                               LLM Agent Platforms

Claude Code · Cursor · Codex CLI · OpenClaw      ChatGPT / GPT Actions
            (MCP stdio or HTTP)                     (OpenAPI REST)
                     |                                     |
                     v                                     v
               mcp_server.py                        FastAPI server
                 port 8002                             port 5055
                 38 tools                          /api/* endpoints
                     |                                     |
                     +------------------+------------------+
                                        v
                         Physical Android phones via ADB

File: gitd/mcp_server.py — single file, 38 tools.

MCP Tools and Ghost Skills are two separate things:

  • MCP Tools — 38 primitive actions the agent calls directly (tap, screenshot, launch_app, etc.). This is what LLM agents interact with.
  • Ghost Skills — high-level reusable automations recorded in registry/ (TikTok upload, Instagram crawl, Gmail send). Think macros.

Four MCP tools bridge the two: list_skills, run_workflow, run_action, create_skill. They let an agent discover and run pre-built skills, or record new ones.

| Tool | Description |
| --- | --- |
| get_screen_tree | LLM-friendly indented UI hierarchy; the primary tool for understanding the screen |
| get_elements | JSON array of UI elements with idx, text, bounds, clickable |
| screenshot | Full screen as base64 PNG |
| screenshot_annotated | Screenshot with numbered element labels overlaid |
| screenshot_cropped | Zoom into a specific pixel region |
| get_screen_xml | Raw uiautomator XML, for when you need exact attributes |
| get_phone_state | Current app, activity, keyboard state, focused element |
| classify_screen | Heuristic: home / search / profile / dialog / error / loading |
| find_on_screen | Search for text: XML first, OCR fallback |
| ocr_screen | Full-screen OCR via RapidOCR, for WebViews, games, canvas |
| ocr_region | OCR a specific pixel region |
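
As a rough illustration of how an agent might consume get_elements output, the sketch below picks the first clickable element whose text contains a query and computes a tap point from its bounds. The idx, text, bounds, and clickable fields come from the tool description; the [left, top, right, bottom] bounds layout and the sample data are assumptions, not confirmed server output.

```python
import json

# Sample payload shaped like get_elements output. Field names come from the
# tool description; the [left, top, right, bottom] bounds layout is an assumption.
SAMPLE = json.dumps([
    {"idx": 0, "text": "Search", "bounds": [0, 100, 200, 180], "clickable": True},
    {"idx": 1, "text": "Settings", "bounds": [0, 200, 200, 280], "clickable": False},
    {"idx": 2, "text": "Open settings", "bounds": [40, 300, 240, 400], "clickable": True},
])


def find_clickable(elements, query):
    """Return the first clickable element whose text contains query (case-insensitive)."""
    needle = query.lower()
    for element in elements:
        if element.get("clickable") and needle in (element.get("text") or "").lower():
            return element
    return None


def center(bounds):
    """Midpoint of a [left, top, right, bottom] rectangle, in integer pixels."""
    left, top, right, bottom = bounds
    return (left + right) // 2, (top + bottom) // 2


elements = json.loads(SAMPLE)
target = find_clickable(elements, "settings")
if target is not None:
    print(target["idx"], center(target["bounds"]))
```

When the element index is known, tap_element(idx) avoids the coordinate math entirely; the computed center only matters when calling tap(x, y).
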

| Tool | Description |
| --- | --- |
| tap | Tap at pixel coords (x, y) |
| tap_element | Tap element by idx from get_elements() |
| swipe | Swipe/scroll between two points with duration |
| long_press | Long press, for context menus and drag initiation |
| type_text | Type ASCII into focused field |
| type_unicode | Type emoji / CJK / accented chars via ADBKeyboard broadcast |
| paste_text | Set clipboard and paste into focused field in one call |
| press_back | Android Back button |
| press_home | Android Home button |
| press_key | Any key event: ENTER, POWER, VOLUME_UP, KEYCODE_* |
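
Since type_text is described as ASCII-only and type_unicode covers emoji, CJK, and accented characters, a caller may want to route text automatically. The helper below is a minimal sketch of that routing rule; pick_typing_tool is a hypothetical name and is not part of the server.

```python
def pick_typing_tool(text: str) -> str:
    """Return the tool name suited to the text: type_text for pure ASCII, else type_unicode."""
    return "type_text" if text.isascii() else "type_unicode"


# A few inputs covering plain ASCII, accented, CJK, and emoji text.
for sample in ("hello world", "café", "你好", "ok 👍"):
    print(repr(sample), "->", pick_typing_tool(sample))
```
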

| Tool | Description |
| --- | --- |
| launch_app | Open app by package name (handles disabled packages, ROM quirks) |
| open_camera | Open camera in photo / video / selfie / selfie_video + optional timer (2s/3s/5s/10s) |
| launch_intent | Full Android intent: URLs, deep links, share sheets, extras |
| search_apps | Find installed app by name → returns package name |
| list_apps | All installed apps with human-readable names |

| Tool | Description |
| --- | --- |
| clipboard_get | Read current clipboard |
| clipboard_set | Write to clipboard via Ghost portal |
| paste_text | Write clipboard + paste in one shot (preferred) |

| Tool | Description |
| --- | --- |
| get_notifications | All active notifications as JSON |
| open_notifications | Pull down notification shade |
| web_search | Open search in best available browser |
| speak_text | Phone speaks text aloud via TTS (works from PC and on-device) |
| list_devices | All connected ADB devices with model names |
| toggle_overlay | Toggle numbered element overlay for visual debugging |

The following tools connect MCP to the Ghost Skills system in registry/.

| Tool | Description |
| --- | --- |
| list_skills | Discover installed skills with their actions and workflows |
| run_workflow | Run a full skill workflow: run_workflow(device, "tiktok", "upload_video", params) |
| run_action | Run a single action: run_action(device, "tiktok", "open_app", {}) |
| create_skill | Record a new reusable skill from a JSON step list |
| explore_app | BFS crawl an app's UI and return a state graph |
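
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 tools/call request. The sketch below builds such a request for run_workflow. The envelope follows the MCP specification, but the argument names (device, skill, workflow, params) and the device serial are assumptions made for illustration; the server's actual input schema is whatever tools/list reports.

```python
import json


def tools_call(request_id, tool, arguments):
    """Build an MCP JSON-RPC 2.0 tools/call request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Hypothetical argument names and serial; check the schema from tools/list.
request = tools_call(3, "run_workflow", {
    "device": "SERIAL123",
    "skill": "tiktok",
    "workflow": "upload_video",
    "params": {},
})
print(json.dumps(request))
```
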

Add .mcp.json to your project root:

{
"mcpServers": {
"android-agent": {
"command": "android-agent-mcp"
}
}
}
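
If the project already has an .mcp.json with other servers, the entry can be merged in rather than overwriting the file. The following is a minimal sketch of that merge; add_server is a hypothetical helper, and the demo writes to a temporary directory with a made-up "other" server so it does not touch a real config.

```python
import json
import tempfile
from pathlib import Path


def add_server(config_path, name="android-agent", command="android-agent-mcp"):
    """Merge one stdio server entry into an .mcp.json file, keeping existing entries."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {"command": command}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file that already lists another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".mcp.json"
    target.write_text(json.dumps({"mcpServers": {"other": {"command": "other-mcp"}}}))
    merged = add_server(target)
    print(json.dumps(merged, indent=2))
```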

Or register globally:

claude mcp add android-agent android-agent-mcp

To connect over HTTP instead, point the client config at the server URL:

{
"mcpServers": {
"android-agent": {
"url": "http://localhost:8002/mcp"
}
}
}

Start the server first: python3 -m gitd.mcp_server

Alternatively, have the client launch the Python module itself:

{
"mcpServers": {
"android-agent": {
"command": "python3",
"args": ["-m", "gitd.mcp_server"]
}
}
}

ChatGPT uses OpenAPI, not MCP. Expose port 5055 via ngrok or Cloudflare Tunnel, then import the OpenAPI spec as a Custom GPT Action.

1.  list_devices()                             → pick device serial
2.  list_skills()                              → check if skill exists for task
3a. If skill: run_workflow(dev, skill, wf, {}) → done in one call
3b. If not:   get_screen_tree(dev)             → understand screen
              tap / type / swipe / ...         → act
              get_screen_tree(dev)             → verify
              create_skill(name, pkg, steps)   → save for next time
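
The decision in steps 1 to 3 can be sketched as a small function. Here call_tool is a hypothetical stand-in for whatever MCP client the agent uses, and the return shapes (a list of serials, a list of skill dicts with name and workflows keys) are assumptions; a fake client is included so the sketch runs without a phone.

```python
def run_task(call_tool, skill, workflow, params=None):
    """Prefer an installed skill workflow; otherwise fall back to reading the screen.

    call_tool(name, **arguments) is a hypothetical client function, and the
    return shapes used below are assumptions, not the server's documented output.
    """
    devices = call_tool("list_devices")
    if not devices:
        raise RuntimeError("no ADB devices connected")
    device = devices[0]  # step 1: pick a device serial

    for entry in call_tool("list_skills"):  # step 2: look for a matching skill
        if entry["name"] == skill and workflow in entry.get("workflows", []):
            # step 3a: the whole task runs in one call
            return call_tool("run_workflow", device=device, skill=skill,
                             workflow=workflow, params=params or {})

    # step 3b: no skill found, so return the screen tree for manual control
    return call_tool("get_screen_tree", device=device)


def fake_call_tool(name, **arguments):
    """Canned responses standing in for a real MCP client."""
    canned = {
        "list_devices": ["SERIAL123"],
        "list_skills": [{"name": "tiktok", "workflows": ["upload_video"]}],
        "run_workflow": {"ok": True, "ran": arguments.get("workflow")},
        "get_screen_tree": "FrameLayout\n  Button 'Open'",
    }
    return canned[name]


print(run_task(fake_call_tool, "tiktok", "upload_video"))
print(run_task(fake_call_tool, "gmail", "send"))
```

The manual branch stops at the first get_screen_tree call because the act/verify loop and the final create_skill step depend on the agent's own reasoning about the screen.
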

To test the server from a terminal:

# Verify tools load
python3 -c "from gitd.mcp_server import mcp; print('OK')"
# List all tools via stdio
printf '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1"}}}\n{"jsonrpc":"2.0","id":2,"method":"tools/list"}\n' \
| android-agent-mcp | python3 -c "import sys,json; [print(t['name']) for line in sys.stdin if line.strip() for d in [json.loads(line)] if d.get('id')==2 for t in d['result']['tools']]"
# HTTP mode
python3 -m gitd.mcp_server &
curl -s http://localhost:8002/mcp -X POST \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"tools/list","id":1}'