Action Engine

The action engine consists of the core agent loop and a map of schema-bound tools. The LLM does not emit free-form instructions; on every step it must call AgentOutput with reflection fields and a single action object.

Each step uses this structure:

```typescript
interface MacroToolInput {
  evaluation_previous_goal?: string
  memory?: string
  next_goal?: string
  action: Record<string, unknown>
}
```

Example:

```json
{
  "evaluation_previous_goal": "The page is visible.",
  "memory": "The search input is index 4.",
  "next_goal": "Click the search input.",
  "action": {
    "click_element_by_index": { "index": 4 }
  }
}
```
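A payload like this could be checked before dispatch roughly as follows. This is a minimal sketch, assuming the engine requires the action object to name exactly one tool; parseMacroToolInput and its return shape are hypothetical, not the engine's actual API.

```typescript
// Hypothetical validator for an AgentOutput payload (not the engine's real code).
// Assumption: the action object must contain exactly one tool key.

interface MacroToolInput {
  evaluation_previous_goal?: string
  memory?: string
  next_goal?: string
  action: Record<string, unknown>
}

const REFLECTION_FIELDS = ['evaluation_previous_goal', 'memory', 'next_goal'] as const

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

function parseMacroToolInput(raw: unknown): { input: MacroToolInput; toolName: string; params: unknown } {
  if (!isPlainObject(raw)) throw new Error('AgentOutput must be an object')

  // Reflection fields are optional, but must be strings when present.
  for (const field of REFLECTION_FIELDS) {
    const value = raw[field]
    if (value !== undefined && typeof value !== 'string') {
      throw new Error(`${field} must be a string when present`)
    }
  }

  const action = raw.action
  if (!isPlainObject(action)) throw new Error('action must be an object')

  // The single key names the tool; its value holds the tool's parameters.
  const keys = Object.keys(action)
  if (keys.length !== 1) throw new Error(`action must name exactly one tool, got ${keys.length}`)
  const toolName = keys[0]
  return { input: raw as unknown as MacroToolInput, toolName, params: action[toolName] }
}
```

Splitting the action into a tool name and its parameters lets the caller look the name up in the tool map and validate the parameters against that tool's own schema.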

| Tool | Parameters | Status |
| --- | --- | --- |
| done | { text: string, success: boolean } | Implemented |
| wait | { seconds: number }, 1 to 10 seconds | Implemented |
| click_element_by_index | { index: number } | Implemented |
| input_text | { index: number, text: string } | Implemented |
| select_dropdown_option | { index: number, text: string } | Implemented |
| scroll | { down: boolean, num_pages?: number, pixels?: number, index?: number } | Implemented |
| scroll_horizontally | { right: boolean, pixels: number, index?: number } | Implemented |
| open_new_tab | { url: string } | Implemented in extension tab tools |
| switch_to_tab | { tab_id: number } | Implemented in extension tab tools |
| close_tab | { tab_id: number } | Implemented in extension tab tools |
| ask_user | { question: string } | Disabled unless an onAskUser callback is configured |
| execute_javascript | { script: string } | Experimental; removed unless experimentalScriptExecutionTool is enabled |

The current source lists the following as planned but not yet implemented: keyboard shortcut actions, file upload, browser back navigation, and structured data extraction.
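The conditional rows in the table (ask_user and execute_javascript) suggest a tool map that is filtered by configuration. The sketch below assumes that shape; AgentConfig, ToolExecute, and buildToolMap are hypothetical, and only the names onAskUser and experimentalScriptExecutionTool come from the table.

```typescript
// Hypothetical sketch: drop tools whose prerequisites are not configured.

type ToolExecute = (input: unknown) => Promise<string>

interface AgentConfig {
  onAskUser?: (question: string) => Promise<string> // assumed signature
  experimentalScriptExecutionTool?: boolean
}

function buildToolMap(allTools: Record<string, ToolExecute>, config: AgentConfig): Map<string, ToolExecute> {
  // Copy so the caller's record is never mutated.
  const tools = new Map(Object.entries(allTools))

  // ask_user is useless if nothing can answer the question.
  if (!config.onAskUser) tools.delete('ask_user')

  // Script execution stays out of the map unless explicitly opted into.
  if (!config.experimentalScriptExecutionTool) tools.delete('execute_javascript')

  return tools
}
```

Removing a tool from the map (rather than having it fail at call time) means the LLM is never offered it, so it cannot waste a step choosing it.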

Indexed DOM Actions are the current DOM execution model. The LLM chooses tools such as click_element_by_index and passes a numeric index; the content-side page controller resolves that index against the latest scanned DOM map before executing the action.
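The index-resolution step can be pictured as a map built from the latest scan and consulted on every action. This is a minimal sketch under stated assumptions: DomSnapshot, scanInteractive, and resolveIndex are hypothetical names, and indices are assumed to be assigned from 0 in scan order.

```typescript
// Hypothetical sketch of indexed DOM actions (not the real page controller).

interface DomSnapshot<T> {
  version: number         // incremented on every rescan
  byIndex: Map<number, T> // index -> element from that scan
}

// Assign sequential indices to interactive elements, in scan order.
function scanInteractive<T>(elements: T[], isInteractive: (el: T) => boolean, version: number): DomSnapshot<T> {
  const byIndex = new Map<number, T>()
  let next = 0
  for (const el of elements) {
    if (isInteractive(el)) {
      byIndex.set(next++, el)
    }
  }
  return { version, byIndex }
}

// Resolve an LLM-supplied index against the most recent scan only.
function resolveIndex<T>(snapshot: DomSnapshot<T> | undefined, index: number): T {
  if (!snapshot) throw new Error('No DOM scan yet; update the tree before acting')
  if (!Number.isInteger(index) || index < 0) throw new Error(`Invalid index: ${index}`)
  const element = snapshot.byIndex.get(index)
  if (element === undefined) {
    throw new Error(`Element ${index} not found in scan v${snapshot.version}; the page may have changed`)
  }
  return element
}
```

Resolving against the latest scan (rather than caching elements across steps) is what keeps an index meaningful: the number the LLM saw refers to the same map the controller consults.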

PageController.updateTree() calls dom.getFlatTree() with doHighlightElements: true, and interactive elements receive highlightIndex values.

The content script constructs the page controller with enableMask: false, but it still polls chrome.storage.local for these keys:

- agentHeartbeat
- isAgentRunning
- currentTabId

When the active tab is the one the agent controls and the heartbeat is fresh, the content script initializes the simulator mask and calls showMask(). Pointer animation is driven by window events for four actions: moving, clicking, enabling pass-through, and disabling pass-through.
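The mask decision can be expressed as a small predicate over the polled values. This is a sketch only: the stored value shapes (a millisecond timestamp for agentHeartbeat, a boolean, a numeric tab ID), the HEARTBEAT_MAX_AGE_MS window, and the shouldShowMask name are all assumptions; only the three key names come from the text above.

```typescript
// Hypothetical sketch of the content script's mask check.

interface AgentStorageState {
  agentHeartbeat?: number  // assumed: epoch milliseconds of the last heartbeat
  isAgentRunning?: boolean
  currentTabId?: number
}

const HEARTBEAT_MAX_AGE_MS = 3000 // hypothetical freshness window

function shouldShowMask(state: AgentStorageState, myTabId: number, now: number = Date.now()): boolean {
  if (!state.isAgentRunning) return false
  // Only the tab the agent is driving should be masked.
  if (state.currentTabId !== myTabId) return false
  if (state.agentHeartbeat === undefined) return false
  // A stale heartbeat means the agent probably stopped without cleaning up.
  return now - state.agentHeartbeat <= HEARTBEAT_MAX_AGE_MS
}
```

In an extension, the state object could come from chrome.storage.local.get(['agentHeartbeat', 'isAgentRunning', 'currentTabId']); how the content script learns its own tab ID is outside this sketch. Passing now as a parameter keeps the predicate deterministic and easy to test.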

clickElement() scrolls the element into view, moves the pointer, performs a synthetic pointer/mouse sequence, focuses the element, then calls target.click().
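A browser-side approximation of that sequence is sketched below. The exact events clickElement() dispatches are not documented here, so SYNTHETIC_CLICK_EVENTS is an assumed, commonly used order, syntheticClick is a hypothetical name, and the pointer animation step is omitted.

```typescript
// Hypothetical sketch of a synthetic click; requires a browser DOM.

const SYNTHETIC_CLICK_EVENTS = [
  'pointerover',
  'mouseover',
  'pointerdown',
  'mousedown',
  'pointerup',
  'mouseup',
] as const

async function syntheticClick(target: HTMLElement): Promise<void> {
  // Bring the element on screen so the computed coordinates are meaningful.
  target.scrollIntoView({ block: 'center', inline: 'center' })
  // Wait one frame so layout reflects the scroll before measuring.
  await new Promise<void>((resolve) => requestAnimationFrame(() => resolve()))

  const rect = target.getBoundingClientRect()
  const init: PointerEventInit = {
    bubbles: true,
    cancelable: true,
    composed: true,
    clientX: rect.left + rect.width / 2,
    clientY: rect.top + rect.height / 2,
    button: 0,
    pointerType: 'mouse',
    isPrimary: true,
  }

  // Pointer events first in each pair, mirroring how browsers order them.
  for (const type of SYNTHETIC_CLICK_EVENTS) {
    const event = type.startsWith('pointer') ? new PointerEvent(type, init) : new MouseEvent(type, init)
    target.dispatchEvent(event)
  }

  target.focus()
  // click() dispatches a click event and runs the element's activation behavior.
  target.click()
}
```

Dispatching the down/up events before click() helps pages whose handlers listen for mousedown or pointerdown rather than click; the final click() covers default behavior such as following a link.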

Core DOM tools are registered in packages/core/src/tools/index.ts with:

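```typescript
import { z } from 'zod'
// tool() is the helper used in packages/core/src/tools/index.ts.
tool({
  // Text the LLM sees when choosing this tool.
  description: '...',
  // Zod schema for the tool's parameters.
  inputSchema: z.object({ /* parameter fields */ }),
  execute: async function (input) {
    // Perform the action and return a string result.
    return '...'
  },
})
```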
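To see how the pieces fit together, here is a self-contained sketch of a wait tool in the same shape. It is not the real implementation: defineTool stands in for tool(), and the hand-written waitSchema replaces z.object({ seconds: z.number().min(1).max(10) }) so the example runs without zod installed.

```typescript
// Hypothetical sketch: a tool definition in the same shape as tool({ ... }).

interface Schema<T> {
  parse(input: unknown): T
}

interface ToolDefinition<T> {
  description: string
  inputSchema: Schema<T>
  execute: (input: T) => Promise<string>
}

// Stand-in for tool(): returns the definition unchanged.
function defineTool<T>(definition: ToolDefinition<T>): ToolDefinition<T> {
  return definition
}

// Hand-written equivalent of z.object({ seconds: z.number().min(1).max(10) }).
const waitSchema: Schema<{ seconds: number }> = {
  parse(input) {
    const seconds = (input as { seconds?: unknown } | null)?.seconds
    // The negated range check also rejects NaN.
    if (typeof seconds !== 'number' || !(seconds >= 1 && seconds <= 10)) {
      throw new Error('seconds must be a number from 1 to 10')
    }
    return { seconds }
  },
}

const waitTool = defineTool({
  description: 'Wait for the page to settle, 1 to 10 seconds.',
  inputSchema: waitSchema,
  execute: async function (input: { seconds: number }) {
    await new Promise((resolve) => setTimeout(resolve, input.seconds * 1000))
    return `Waited ${input.seconds}s`
  },
})

// Validate first, then execute: the order a dispatcher would use.
async function runTool<T>(def: ToolDefinition<T>, rawInput: unknown): Promise<string> {
  return def.execute(def.inputSchema.parse(rawInput))
}
```

Validating in runTool rather than inside execute keeps each tool body free of input checks; a malformed call fails before any page side effect happens.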

Extension tab tools are registered in packages/extension/src/agent/tabTools.ts and injected through the multi-page agent constructor.
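One way to picture that injection is a set of extension-only tools merged into the core map with a collision check. This sketch assumes that design; createTabTools and mergeTools are hypothetical, while chrome.tabs.create, chrome.tabs.update, and chrome.tabs.remove are real Chrome extension APIs that only work inside an extension.

```typescript
// Hypothetical sketch of extension tab tools merged alongside core tools.

// `any` lets each tool destructure its own parameter type.
type Execute = (input: any) => Promise<string>

// Minimal typing for the Chrome APIs used below (normally from @types/chrome).
declare const chrome: {
  tabs: {
    create(props: { url: string }): Promise<{ id?: number }>
    update(tabId: number, props: { active: boolean }): Promise<unknown>
    remove(tabId: number): Promise<void>
  }
}

function createTabTools(): Record<string, Execute> {
  return {
    open_new_tab: async ({ url }: { url: string }) => {
      const tab = await chrome.tabs.create({ url })
      return `Opened tab ${tab.id}`
    },
    switch_to_tab: async ({ tab_id }: { tab_id: number }) => {
      await chrome.tabs.update(tab_id, { active: true })
      return `Switched to tab ${tab_id}`
    },
    close_tab: async ({ tab_id }: { tab_id: number }) => {
      await chrome.tabs.remove(tab_id)
      return `Closed tab ${tab_id}`
    },
  }
}

// Refuse silent overrides: an extension tool must not shadow a core tool.
function mergeTools(core: Record<string, Execute>, extra: Record<string, Execute>): Record<string, Execute> {
  for (const name of Object.keys(extra)) {
    if (Object.prototype.hasOwnProperty.call(core, name)) {
      throw new Error(`Tool name collision: ${name}`)
    }
  }
  return { ...core, ...extra }
}
```

Keeping tab tools in a separate module and merging them at construction time means the core package stays free of chrome.* dependencies.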

If a new tool needs DOM access, wire it through these layers in order:

1. Add a method to PageController.
2. Expose it through RemotePageController.
3. Route it in RemotePageController.background.ts.
4. Handle it in RemotePageController.content.ts.
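The path a new DOM method takes can be simulated in one file by replacing extension messaging with direct function calls. Everything here is hypothetical: the message shape, the ALLOWED_METHODS allowlist, and the getPageTitle method are illustrations only. In the extension, the background hop would typically use chrome.tabs.sendMessage and the content side chrome.runtime.onMessage.

```typescript
// Hypothetical end-to-end sketch of the RemotePageController layers.

interface PageControllerMessage {
  method: string
  args: unknown[]
}

type Transport = (msg: PageControllerMessage) => Promise<unknown>
type SendToTab = (tabId: number, msg: PageControllerMessage) => Promise<unknown>

// PageController: the new capability lives here (content side).
class PageController {
  private readonly title: string
  constructor(title: string) {
    this.title = title
  }
  async getPageTitle(): Promise<string> {
    return this.title // in a real page this would read document.title
  }
}

// Content side: dispatch only methods that are explicitly allowed.
const ALLOWED_METHODS = new Set(['getPageTitle'])

async function handleInContent(controller: PageController, msg: PageControllerMessage): Promise<unknown> {
  if (!ALLOWED_METHODS.has(msg.method)) throw new Error(`Unknown method: ${msg.method}`)
  const fn = (controller as unknown as Record<string, (...args: unknown[]) => Promise<unknown>>)[msg.method]
  return fn.apply(controller, msg.args)
}

// Background: forward each message to the tab currently under control.
function makeBackgroundRouter(getTabId: () => number, sendToTab: SendToTab): Transport {
  return (msg) => sendToTab(getTabId(), msg)
}

// RemotePageController: typed wrapper the agent calls.
class RemotePageController {
  private readonly send: Transport
  constructor(send: Transport) {
    this.send = send
  }
  async getPageTitle(): Promise<string> {
    const result = await this.send({ method: 'getPageTitle', args: [] })
    return String(result)
  }
}
```

The allowlist on the content side is a deliberate choice in this sketch: the content script only runs methods it knows about, so a malformed or unexpected message cannot call arbitrary properties on the controller.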