Action Engine
The action engine is the core agent loop plus a map of schema-bound tools. The LLM does not emit free-form instructions. It must call AgentOutput with reflection fields and one action object.
Macro tool shape
Each step uses this structure:

    interface MacroToolInput {
      evaluation_previous_goal?: string
      memory?: string
      next_goal?: string
      action: Record<string, unknown>
    }

Example:

    {
      "evaluation_previous_goal": "The page is visible.",
      "memory": "The search input is index 4.",
      "next_goal": "Click the search input.",
      "action": { "click_element_by_index": { "index": 4 } }
    }

Implemented tool schemas
| Tool | Parameters | Status |
|---|---|---|
| `done` | `{ text: string, success: boolean }` | Implemented |
| `wait` | `{ seconds: number }`, 1 to 10 seconds | Implemented |
| `click_element_by_index` | `{ index: number }` | Implemented |
| `input_text` | `{ index: number, text: string }` | Implemented |
| `select_dropdown_option` | `{ index: number, text: string }` | Implemented |
| `scroll` | `{ down: boolean, num_pages?: number, pixels?: number, index?: number }` | Implemented |
| `scroll_horizontally` | `{ right: boolean, pixels: number, index?: number }` | Implemented |
| `open_new_tab` | `{ url: string }` | Implemented in extension tab tools |
| `switch_to_tab` | `{ tab_id: number }` | Implemented in extension tab tools |
| `close_tab` | `{ tab_id: number }` | Implemented in extension tab tools |
| `ask_user` | `{ question: string }` | Disabled unless an `onAskUser` callback is configured |
| `execute_javascript` | `{ script: string }` | Experimental; removed unless `experimentalScriptExecutionTool` is enabled |
Planned but not yet implemented in the current source: keyboard shortcut actions, file upload, browser back navigation, and structured data extraction.
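The step shape and the parameter table above can be sketched as a small parse-and-validate routine. This is an illustration only, not the project's implementation: the hand-written checks stand in for the real schemas, `parseStep` and `validators` are hypothetical names, and the rule that `action` carries exactly one tool key is an assumption drawn from the example step.

```typescript
// Sketch only: hand-written checks standing in for the real tool schemas.
type Params = Record<string, unknown>;
type Validator = (p: Params) => string | null; // null means "valid"

interface MacroToolInput {
  evaluation_previous_goal?: string;
  memory?: string;
  next_goal?: string;
  action: Record<string, unknown>;
}

// Hypothetical validators for three tools from the table.
const validators = new Map<string, Validator>([
  ["wait", (p) =>
    typeof p.seconds === "number" && p.seconds >= 1 && p.seconds <= 10
      ? null
      : "seconds must be a number from 1 to 10"],
  ["click_element_by_index", (p) =>
    typeof p.index === "number" ? null : "index must be a number"],
  ["done", (p) =>
    typeof p.text === "string" && typeof p.success === "boolean"
      ? null
      : "text must be a string and success a boolean"],
]);

// Returns the tool name and its parameters, or throws with a readable reason.
function parseStep(input: MacroToolInput): { tool: string; params: Params } {
  const names = Object.keys(input.action);
  const tool = names[0];
  // Assumption: one step carries exactly one tool call.
  if (names.length !== 1 || tool === undefined) {
    throw new Error(`expected exactly one tool in action, got ${names.length}`);
  }
  const validate = validators.get(tool);
  if (!validate) throw new Error(`unknown or disabled tool: ${tool}`);
  const raw = input.action[tool];
  if (typeof raw !== "object" || raw === null) {
    throw new Error(`parameters for ${tool} must be an object`);
  }
  const params = raw as Params;
  const problem = validate(params);
  if (problem) throw new Error(`${tool}: ${problem}`);
  return { tool, params };
}
```

Rejecting a step before execution keeps malformed model output from reaching the page, and the error text can be fed back to the model on the next turn.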
Indexed DOM Actions
Indexed DOM Actions are the current DOM execution model. The LLM chooses tools such as click_element_by_index and passes a numeric index; the content-side page controller resolves that index against the latest scanned DOM map before executing the action.
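A minimal sketch of index resolution, assuming the scanned map is rebuilt on every scan and that indices are assigned in scan order. `IndexedDomMap` and `ScannedElement` are hypothetical names used only for illustration; they do not describe the actual page controller.

```typescript
// Sketch only: a stand-in for the scanned DOM map.
interface ScannedElement {
  tag: string;
  text: string;
}

class IndexedDomMap {
  private elements = new Map<number, ScannedElement>();

  // Replace the whole map on each scan so indices from an older scan cannot linger.
  update(scan: ScannedElement[]): void {
    this.elements = new Map(
      scan.map((el, i): [number, ScannedElement] => [i, el]),
    );
  }

  // Resolve an index chosen by the model against the latest scan only.
  resolve(index: number): ScannedElement {
    const el = this.elements.get(index);
    if (!el) {
      throw new Error(`no interactive element at index ${index}; rescan the page`);
    }
    return el;
  }
}
```

Resolving against the latest scan means an index the model remembered from an earlier page state fails loudly instead of acting on the wrong element.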
Visual feedback
PageController.updateTree() calls dom.getFlatTree() with doHighlightElements: true. Interactive elements receive highlightIndex values. The content script constructs the page controller with enableMask: false, but it still polls chrome.storage.local for:

- agentHeartbeat
- isAgentRunning
- currentTabId

When the active tab is controlled and the heartbeat is fresh, the content script initializes the simulator mask and calls showMask(). Pointer animation is emitted through window events for moving, clicking, enabling pass-through, and disabling pass-through.
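The mask decision described above can be sketched as a pure function over the three polled keys. Several details here are assumptions rather than facts from the source: that agentHeartbeat is an epoch-millisecond timestamp, that isAgentRunning must be true, and the freshness threshold `HEARTBEAT_MAX_AGE_MS`, which is a made-up value. `shouldShowMask` is a hypothetical helper name.

```typescript
// Sketch only: the three keys polled from chrome.storage.local.
interface AgentState {
  agentHeartbeat?: number; // assumed: epoch milliseconds of the last heartbeat
  isAgentRunning?: boolean;
  currentTabId?: number;
}

const HEARTBEAT_MAX_AGE_MS = 3000; // hypothetical freshness threshold

function shouldShowMask(
  state: AgentState,
  thisTabId: number,
  now: number,
  maxAgeMs: number = HEARTBEAT_MAX_AGE_MS,
): boolean {
  if (!state.isAgentRunning) return false;               // agent is idle
  if (state.currentTabId !== thisTabId) return false;    // another tab is controlled
  if (typeof state.agentHeartbeat !== "number") return false; // no heartbeat recorded
  return now - state.agentHeartbeat <= maxAgeMs;          // heartbeat is fresh
}
```

Keeping the check free of storage calls makes it easy to run on every poll result and to unit-test without a browser.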
clickElement() scrolls the element into view, moves the pointer, performs a synthetic pointer/mouse sequence, focuses the element, then calls target.click().
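The ordering of those steps can be sketched against small stand-in interfaces so the snippet runs without a browser. The interfaces, the `clickElementSketch` name, and the specific event names are assumptions: the source only says "a synthetic pointer/mouse sequence" and does not list the events.

```typescript
// Sketch only: minimal stand-ins for the DOM target and the animated pointer.
interface ClickTarget {
  scrollIntoView(): void;
  dispatch(eventType: string): void;
  focus(): void;
  click(): void;
}

interface PointerOverlay {
  moveTo(target: ClickTarget): void;
}

// Assumed event order; the real sequence may differ.
const SYNTHETIC_EVENTS = ["pointerdown", "mousedown", "pointerup", "mouseup"] as const;

function clickElementSketch(target: ClickTarget, pointer: PointerOverlay): void {
  target.scrollIntoView();   // 1. bring the element into the viewport
  pointer.moveTo(target);    // 2. animate the pointer onto it
  for (const type of SYNTHETIC_EVENTS) {
    target.dispatch(type);   // 3. synthetic pointer/mouse sequence
  }
  target.focus();            // 4. give the element focus
  target.click();            // 5. final native-style click
}
```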
Adding a new tool
Core DOM tools are registered in packages/core/src/tools/index.ts with:

    tool({
      description: '...',
      inputSchema: z.object({...}),
      execute: async function (input) {
        return '...'
      },
    })

Extension tab tools are registered in packages/extension/src/agent/tabTools.ts and injected through the multi-page agent constructor.
If a new tool needs DOM access, add a PageController method first, expose it through RemotePageController, route it in RemotePageController.background.ts, and handle it in RemotePageController.content.ts.