
Architecture Overview

Navvy is implemented as a WXT Manifest V3 extension. The source directory is packages/extension/src, with WXT entrypoints under src/entrypoints.

The extension config lives in packages/extension/wxt.config.js.

| Field | Current value |
| --- | --- |
| srcDir | src |
| WXT modules | @wxt-dev/module-react |
| Manifest version | MV3 (in the generated manifest) |
| Permissions | tabs, tabGroups, sidePanel, storage |
| Host permissions | <all_urls> |
| Content script matches | <all_urls> |
| Content script timing | document_end |
| Side panel | sidepanel.html |
| Options page | settings.html |
| Externally connectable | http://localhost/* |
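The table can be read as a WXT config object. The sketch below is a hypothetical reconstruction, not the contents of wxt.config.js. It omits WXT's defineConfig import so it runs on its own, and the ManifestSketch and WxtConfigSketch types are illustrative. In WXT, content script matches and timing are normally declared in the content script's own defineContentScript call, and the side panel and options pages typically come from entrypoint files. Those rows are therefore left out here.

```typescript
// Hypothetical sketch of the options wxt.config.js might pass to WXT's defineConfig.
// Keys follow WXT's config (srcDir, modules, manifest) and Chrome MV3 manifest fields.
interface ManifestSketch {
  name: string;
  description: string;
  permissions: string[];
  host_permissions: string[];
  externally_connectable: { matches: string[] };
}

interface WxtConfigSketch {
  srcDir: string;
  modules: string[];
  manifest: ManifestSketch;
}

const config: WxtConfigSketch = {
  srcDir: 'src',
  modules: ['@wxt-dev/module-react'],
  manifest: {
    name: 'Navvy',
    description:
      'AI-powered browser automation assistant. Control web pages with natural language.',
    permissions: ['tabs', 'tabGroups', 'sidePanel', 'storage'],
    host_permissions: ['<all_urls>'],
    // Allows pages on http://localhost to message the background worker (used by OPEN_HUB).
    externally_connectable: { matches: ['http://localhost/*'] },
  },
};
```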

The generated manifest sets the name to Navvy and the description to "AI-powered browser automation assistant. Control web pages with natural language."

The main entrypoints are:

  • entrypoints/sidepanel/App.tsx: React command UI. It renders the StatusDot, Composer, HistoryList, EventCard, and ActivityCard components.
  • entrypoints/settings/App.tsx: settings UI for General, Providers, Skills, Advanced, and About tabs.
  • entrypoints/background.ts: service worker. It creates the extension auth token, routes messages, opens Hub tabs, and enables side panel behavior.
  • entrypoints/content.ts: content script. It initializes the remote page controller and can expose the main-world bridge when the page token matches the extension token.
  • entrypoints/main-world.ts: unlisted script injected into the page world. It exposes window.PAGE_AGENT_EXT and window.PAGE_AGENT_EXT_VERSION.
  • entrypoints/hub/App.tsx and entrypoints/hub/hub-ws.ts: Hub UI and WebSocket client for external local callers.
A typical task flows through the components like this:

```mermaid
sequenceDiagram
    participant SidePanel as "sidepanel/App.tsx"
    participant Core as "Multi-page agent"
    participant Background as "background.ts"
    participant Content as "content.ts"
    participant Page as "PageController"
    participant Provider as "OpenAI-compatible API"
    SidePanel->>Core: execute(task)
    Core->>Background: TAB_CONTROL get_active_tab
    Background-->>Core: { success, tab }
    Core->>Background: PAGE_CONTROL get_browser_state
    Background->>Content: PAGE_CONTROL get_browser_state
    Content->>Page: updateTree()
    Page-->>Content: BrowserState
    Content-->>Background: BrowserState
    Background-->>Core: BrowserState
    Core->>Provider: POST {baseURL}/chat/completions
    Provider-->>Core: tool call AgentOutput
    Core->>Background: PAGE_CONTROL click_element / input_text / scroll
    Background->>Content: PAGE_CONTROL action
    Content->>Page: indexed DOM action
    Page-->>Content: ActionResult
    Content-->>Background: ActionResult
    Background-->>Core: ActionResult
    Core-->>SidePanel: historychange / activity / statuschange
```
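The sequence above amounts to a loop: read the page state, ask the model for the next tool call, run it, and report progress. The sketch below shows that loop with every side effect injected through a hypothetical AgentDeps interface, so it runs without Chrome or a model endpoint. The type names, the 'done' action, and the maxSteps limit are assumptions for illustration. They are not the multi-page agent's real API.

```typescript
// Hypothetical sketch of the agent loop in the sequence diagram.
// All names here are illustrative; only the message/action names come from the doc.
type BrowserState = { url: string; content: string };
type AgentAction =
  | { name: 'click_element'; args: [index: number] }
  | { name: 'input_text'; args: [index: number, text: string] }
  | { name: 'scroll'; args: [down: boolean] }
  | { name: 'done'; args: [text: string] }; // assumed terminating tool call
type ActionResult = { success: boolean; message: string };

interface AgentDeps {
  getActiveTabId(): Promise<number>; // TAB_CONTROL get_active_tab
  getBrowserState(tabId: number): Promise<BrowserState>; // PAGE_CONTROL get_browser_state
  callModel(task: string, state: BrowserState): Promise<AgentAction>; // POST {baseURL}/chat/completions
  runAction(tabId: number, action: AgentAction): Promise<ActionResult>; // PAGE_CONTROL <action>
  emit(event: string, detail: unknown): void; // historychange / activity / statuschange
}

async function runTask(task: string, deps: AgentDeps, maxSteps = 10): Promise<string> {
  deps.emit('statuschange', 'running');
  const tabId = await deps.getActiveTabId();
  for (let step = 0; step < maxSteps; step++) {
    // Refresh the indexed DOM snapshot before every model call.
    const state = await deps.getBrowserState(tabId);
    const action = await deps.callModel(task, state);
    if (action.name === 'done') {
      deps.emit('statuschange', 'completed');
      return action.args[0];
    }
    const result = await deps.runAction(tabId, action);
    deps.emit('historychange', { step, action: action.name, result });
  }
  deps.emit('statuschange', 'error');
  throw new Error(`Task not finished after ${maxSteps} steps`);
}
```

Injecting the Chrome and network calls keeps the loop itself testable with plain mocks.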
| Type | Direction | Action names | Payload |
| --- | --- | --- | --- |
| TAB_CONTROL | Agent environment to background.ts | get_active_tab, get_tab_info, open_new_tab, create_tab_group, update_tab_group, add_tab_to_group, close_tab, get_window_tabs | Depends on the action, for example { tabId }, { url }, { tabIds, windowId }, or { groupId, properties } |
| PAGE_CONTROL | Agent environment to background.ts to content.ts | get_my_tab_id, get_last_update_time, get_browser_state, update_tree, clean_up_highlights, click_element, input_text, select_option, scroll, scroll_horizontally, execute_javascript | { targetTabId, payload }; DOM actions pass positional arguments in payload |
| OPEN_HUB | External localhost page to background.ts | None; the message opens or focuses hub.html?ws={wsPort} | { type: "OPEN_HUB", wsPort } |
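A background router for these three message types could be a single switch on type. This sketch rests on assumptions. The field layout ({ type, action, payload }) is inferred from the table, and RouterDeps is a hypothetical seam standing in for chrome.tabs, chrome.tabGroups, and chrome.tabs.sendMessage so the code runs outside the browser.

```typescript
// Hypothetical sketch of message routing in background.ts.
type TabControl = { type: 'TAB_CONTROL'; action: string; payload?: unknown };
type PageControl = {
  type: 'PAGE_CONTROL';
  action: string;
  payload: { targetTabId: number; payload?: unknown };
};
type OpenHub = { type: 'OPEN_HUB'; wsPort: number };
type RuntimeMessage = TabControl | PageControl | OpenHub;

interface RouterDeps {
  tabControl(action: string, payload: unknown): Promise<unknown>; // chrome.tabs / chrome.tabGroups
  sendToTab(tabId: number, message: unknown): Promise<unknown>; // chrome.tabs.sendMessage
  openHub(path: string): Promise<void>; // open or focus the Hub tab
}

function createRouter(deps: RouterDeps) {
  return async function route(message: RuntimeMessage): Promise<unknown> {
    switch (message.type) {
      case 'TAB_CONTROL':
        return deps.tabControl(message.action, message.payload);
      case 'PAGE_CONTROL': {
        // Forward to content.ts in the target tab; its reply is returned to the agent.
        const { targetTabId, payload } = message.payload;
        return deps.sendToTab(targetTabId, { type: 'PAGE_CONTROL', action: message.action, payload });
      }
      case 'OPEN_HUB':
        await deps.openHub(`hub.html?ws=${message.wsPort}`);
        return { success: true };
    }
  };
}
```

In a real extension, TAB_CONTROL and PAGE_CONTROL would arrive through chrome.runtime.onMessage. An OPEN_HUB message from a localhost page would arrive through chrome.runtime.onMessageExternal, which the externally_connectable entry makes possible.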

The agent opens a long-lived port with chrome.runtime.connect({ name: 'tab-events' }), and the background broadcasts tab changes on it:

{ action: 'created', payload: { tab } }
{ action: 'removed', payload: { tabId, removeInfo } }
{ action: 'updated', payload: { tabId, changeInfo, tab } }
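On the agent side, the tab-events port can be consumed as a small reducer over a tab map. The sketch below is hypothetical. TabLike trims chrome.tabs.Tab to a few fields, and applyTabEvent is an illustrative name. The commented lines show the chrome.runtime.connect and port.onMessage calls it would attach to.

```typescript
// Hypothetical sketch: keep a local tab map in sync with 'tab-events' broadcasts.
type TabLike = { id?: number; url?: string; title?: string }; // subset of chrome.tabs.Tab

type TabEvent =
  | { action: 'created'; payload: { tab: TabLike } }
  | { action: 'removed'; payload: { tabId: number; removeInfo: unknown } }
  | { action: 'updated'; payload: { tabId: number; changeInfo: unknown; tab: TabLike } };

function applyTabEvent(tabs: Map<number, TabLike>, event: TabEvent): void {
  switch (event.action) {
    case 'created': {
      const { tab } = event.payload;
      if (tab.id !== undefined) tabs.set(tab.id, tab); // chrome.tabs.Tab.id is optional
      break;
    }
    case 'removed':
      tabs.delete(event.payload.tabId);
      break;
    case 'updated':
      tabs.set(event.payload.tabId, event.payload.tab);
      break;
  }
}

// Extension wiring (sketch):
//   const port = chrome.runtime.connect({ name: 'tab-events' });
//   port.onMessage.addListener((msg) => applyTabEvent(tabs, msg as TabEvent));
```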

When the page’s local token matches the extension token, main-world.ts exposes:

window.PAGE_AGENT_EXT.execute(task, config)
window.PAGE_AGENT_EXT.stop()
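Because main-world.ts only exposes the bridge when the tokens match, page code should check for it before calling it. This typing is an assumption. The doc names execute(task, config) and stop() but not their return types, so execute is assumed to return a Promise and config is left as a loose object. getPageAgent is a hypothetical helper.

```typescript
// Hypothetical typing for the main-world API; return types and config shape are assumed.
interface PageAgentExt {
  execute(task: string, config?: Record<string, unknown>): Promise<unknown>;
  stop(): void;
}

// Returns the bridge only if both documented methods are present.
function getPageAgent(win: unknown): PageAgentExt | null {
  const api = (win as { PAGE_AGENT_EXT?: Partial<PageAgentExt> } | null)?.PAGE_AGENT_EXT;
  return api && typeof api.execute === 'function' && typeof api.stop === 'function'
    ? (api as PageAgentExt)
    : null;
}

// Browser usage (sketch):
//   const agent = getPageAgent(window);
//   if (agent) await agent.execute('Search the page for "pricing"');
```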

The page world and content script exchange window.postMessage events:

{ channel: 'PAGE_AGENT_EXT_REQUEST', id, action: 'execute', payload: { task, config } }
{ channel: 'PAGE_AGENT_EXT_REQUEST', id, action: 'stop' }
{ channel: 'PAGE_AGENT_EXT_RESPONSE', id, action: 'status_change_event', payload }
{ channel: 'PAGE_AGENT_EXT_RESPONSE', id, action: 'activity_event', payload }
{ channel: 'PAGE_AGENT_EXT_RESPONSE', id, action: 'history_change_event', payload }
{ channel: 'PAGE_AGENT_EXT_RESPONSE', id, action: 'execute_result', payload?, error? }
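The id field ties the event stream and the final execute_result back to a single execute call. The sketch below shows one way a page-side client could do that correlation. It is hypothetical: createBridgeClient, the req-N id format, and treating error as a string are assumptions. The post and handle functions abstract window.postMessage and the 'message' listener so the logic runs without a DOM.

```typescript
// Hypothetical sketch of id-based request/response correlation over postMessage.
type ExtResponse = {
  channel: 'PAGE_AGENT_EXT_RESPONSE';
  id: string;
  action: 'status_change_event' | 'activity_event' | 'history_change_event' | 'execute_result';
  payload?: unknown;
  error?: string; // assumed to be a message string
};

function createBridgeClient(
  post: (msg: unknown) => void, // e.g. (m) => window.postMessage(m, window.location.origin)
  onEvent: (r: ExtResponse) => void = () => {},
) {
  const pending = new Map<string, { resolve(v: unknown): void; reject(e: Error): void }>();
  let counter = 0;

  function execute(task: string, config?: object): Promise<unknown> {
    const id = `req-${++counter}`;
    const done = new Promise<unknown>((resolve, reject) => pending.set(id, { resolve, reject }));
    post({ channel: 'PAGE_AGENT_EXT_REQUEST', id, action: 'execute', payload: { task, config } });
    return done;
  }

  function stop(): void {
    post({ channel: 'PAGE_AGENT_EXT_REQUEST', id: `req-${++counter}`, action: 'stop' });
  }

  // Call with event.data from a 'message' listener. The page also sees its own
  // REQUEST messages, so anything not on the RESPONSE channel is ignored.
  function handle(data: unknown): void {
    const msg = data as Partial<ExtResponse> | null;
    if (!msg || msg.channel !== 'PAGE_AGENT_EXT_RESPONSE' || typeof msg.id !== 'string') return;
    if (msg.action === 'execute_result') {
      const entry = pending.get(msg.id);
      if (!entry) return; // result for a request this client did not send
      pending.delete(msg.id);
      if (msg.error) entry.reject(new Error(msg.error));
      else entry.resolve(msg.payload);
      return;
    }
    onEvent(msg as ExtResponse); // status, activity, and history events
  }

  return { execute, stop, handle };
}
```

In a page, handle would be called from a 'message' listener that also ignores events whose source is not window.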

The Hub page, opened as hub.html?ws={PORT}, connects as a WebSocket client to ws://localhost:{PORT}.

Inbound messages (from the external caller to the Hub):

{ type: 'execute', task: string, config?: object }
{ type: 'stop' }

Outbound messages (from the Hub to the external caller):

{ type: 'ready' }
{ type: 'result', success: boolean, data: string }
{ type: 'error', message: string }
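The Hub protocol is small enough to validate by hand. This sketch assumes JSON text frames, which the doc does not state. parseInbound and encode are hypothetical helpers, not code from hub-ws.ts. The parser rejects anything that does not match one of the two inbound shapes listed above.

```typescript
// Hypothetical sketch of the Hub wire format (JSON framing is an assumption).
type InboundMessage =
  | { type: 'execute'; task: string; config?: Record<string, unknown> }
  | { type: 'stop' };

type OutboundMessage =
  | { type: 'ready' }
  | { type: 'result'; success: boolean; data: string }
  | { type: 'error'; message: string };

function parseInbound(raw: string): InboundMessage | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof msg !== 'object' || msg === null) return null;
  const m = msg as Record<string, unknown>;
  if (m.type === 'stop') return { type: 'stop' };
  const task = m.task;
  if (m.type === 'execute' && typeof task === 'string') {
    const config =
      typeof m.config === 'object' && m.config !== null ? (m.config as Record<string, unknown>) : undefined;
    return { type: 'execute', task, config };
  }
  return null; // unknown type or missing task
}

const encode = (msg: OutboundMessage): string => JSON.stringify(msg);

// Browser wiring (sketch; sending 'ready' on open is an assumption):
//   const ws = new WebSocket(`ws://localhost:${port}`);
//   ws.onopen = () => ws.send(encode({ type: 'ready' }));
//   ws.onmessage = (e) => { const msg = parseInbound(String(e.data)); /* dispatch */ };
```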

There is no FastAPI backend or SSE transport in the current monorepo.