AI Agent Browser Extension for Enhanced Web Accessibility and Interaction

Navvy controls the web from inside your browser.

Secure, local automation that bridges natural language and web interfaces. Navvy runs as an MV3 extension with a React side panel, indexed DOM actions, and OpenAI-compatible provider profiles.

The problem

Websites need precise clicks. People should not.


The semantic gap

Users speak naturally, but websites require exact field focus, clicks, and tab state.


Headless agents fail

Automation outside the real browser loses extension context, tab state, and user control.

Extension-first control

Navvy acts in the user's browser through the side panel, background worker, and content script.

System architecture

WXT entrypoints connected by Chrome runtime messages.

The source currently uses Chrome runtime messages, a tab-events port, and a main-world window bridge. No separate messaging library is present in the extension source.

Figure: Three isolated browser extension contexts. The side panel (sidepanel/App.tsx, React command UI), background service worker (background.ts, message router), content script (content.ts, DOM executor), and main-world.ts bridge communicate through Chrome runtime messages (TAB_CONTROL, PAGE_CONTROL) and window postMessage events (PAGE_AGENT_EXT_REQUEST / RESPONSE).

sidepanel/App.tsx

React command UI with StatusDot, Composer, HistoryList, ActivityCard, and EventCard.

background.ts

Routes TAB_CONTROL and PAGE_CONTROL messages and opens the Hub tab for OPEN_HUB.

content.ts

Initializes RemotePageController.content and optionally injects main-world.ts.
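The routing role of background.ts can be sketched as a validated dispatch over the three message types named above. This is a minimal sketch under stated assumptions. Only the `type` values come from the source. The payload fields (`action`, `tabId`) and the `RouterHandlers` interface are hypothetical. The real worker would call Chrome APIs such as `chrome.tabs`; here those calls are replaced by injected handlers so the dispatch logic runs outside the browser.

```typescript
// Hypothetical message shapes; only the `type` discriminants come from the source.
type ExtensionMessage =
  | { type: "TAB_CONTROL"; action: string; tabId?: number }
  | { type: "PAGE_CONTROL"; action: string; tabId: number }
  | { type: "OPEN_HUB" };

// Handlers are injected so the router can be exercised without `chrome.*`.
interface RouterHandlers {
  tabControl(msg: Extract<ExtensionMessage, { type: "TAB_CONTROL" }>): Promise<unknown>;
  pageControl(msg: Extract<ExtensionMessage, { type: "PAGE_CONTROL" }>): Promise<unknown>;
  openHub(): Promise<unknown>;
}

// Validate the discriminant and the payload fields each type requires.
function isExtensionMessage(value: unknown): value is ExtensionMessage {
  if (typeof value !== "object" || value === null) return false;
  const msg = value as Record<string, unknown>;
  switch (msg.type) {
    case "TAB_CONTROL":
      return typeof msg.action === "string" && (msg.tabId === undefined || typeof msg.tabId === "number");
    case "PAGE_CONTROL":
      return typeof msg.action === "string" && typeof msg.tabId === "number";
    case "OPEN_HUB":
      return true;
    default:
      return false;
  }
}

// Reject malformed input before it reaches any handler.
async function routeMessage(raw: unknown, handlers: RouterHandlers): Promise<unknown> {
  if (!isExtensionMessage(raw)) {
    throw new Error("Unknown or malformed extension message");
  }
  switch (raw.type) {
    case "TAB_CONTROL":
      return handlers.tabControl(raw);
    case "PAGE_CONTROL":
      return handlers.pageControl(raw);
    case "OPEN_HUB":
      return handlers.openHub();
    default: {
      // Compile-time exhaustiveness check over the union.
      const unreachable: never = raw;
      throw new Error(`Unhandled message: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

In a real worker, the router would typically be registered with `chrome.runtime.onMessage.addListener`. The listener returns `true` so that `sendResponse` can still be called after the promise settles.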

Typed actions

Every browser move is an explicit tool call.

The LLM must call the single AgentOutput function with three reflection fields (evaluation_previous_goal, memory, next_goal) and exactly one action object. Extension tab tools are injected alongside the DOM tools.

  • done, wait, click_element_by_index, input_text, select_dropdown_option
  • scroll, scroll_horizontally, open_new_tab, switch_to_tab, close_tab
  • execute_javascript is available only when the experimental script tool is enabled.
{
  "evaluation_previous_goal": "The page is visible.",
  "memory": "Search input is index 4.",
  "next_goal": "Click the search input.",
  "action": {
    "click_element_by_index": { "index": 4 }
  }
}
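The contract above (one AgentOutput call with three reflection fields and exactly one action) can be enforced before anything touches the page. This is a minimal sketch. It assumes the tool call's arguments arrive as a JSON string, as they do in OpenAI-compatible `tool_calls`. The `parseAgentOutput` name, its `allowScript` flag, and the error messages are hypothetical.

```typescript
// Action names listed in this section; execute_javascript is opt-in.
const BASE_ACTIONS = [
  "done", "wait", "click_element_by_index", "input_text", "select_dropdown_option",
  "scroll", "scroll_horizontally", "open_new_tab", "switch_to_tab", "close_tab",
] as const;

interface AgentOutput {
  evaluation_previous_goal: string;
  memory: string;
  next_goal: string;
  action: Record<string, unknown>; // exactly one key: the action name
}

// Parse tool-call arguments and enforce "exactly one known action".
function parseAgentOutput(argsJson: string, allowScript = false) {
  const raw: unknown = JSON.parse(argsJson);
  if (typeof raw !== "object" || raw === null) throw new Error("Arguments must be an object");
  const data = raw as Partial<AgentOutput>;
  for (const field of ["evaluation_previous_goal", "memory", "next_goal"] as const) {
    if (typeof data[field] !== "string") throw new Error(`Missing reflection field: ${field}`);
  }
  const action = data.action;
  if (!action || typeof action !== "object") throw new Error("Missing action object");
  const names = Object.keys(action);
  if (names.length !== 1) throw new Error(`Expected exactly one action, got ${names.length}`);
  // The experimental script tool is only accepted when explicitly enabled.
  const allowed: readonly string[] = allowScript ? [...BASE_ACTIONS, "execute_javascript"] : BASE_ACTIONS;
  const name = names[0];
  if (!allowed.includes(name)) throw new Error(`Unknown or disabled action: ${name}`);
  return { name, params: action[name], output: data as AgentOutput };
}
```

Rejecting multi-action payloads at parse time keeps execution to one action per step. This mirrors `parallel_tool_calls: false` on the request side.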

DOM inspect

Interactive elements are selected by visibility and index.

The scanner targets native controls, ARIA roles, contenteditable regions, scrollable containers, event-handler attributes, and elements with interactive cursor styles.

  • Visibility uses offsetWidth, offsetHeight, computed visibility, and computed display.
  • Top-element checks sample elementFromPoint at the center and two corners.
  • Scrollable containers are emitted with data-scrollable distances.
*[0]<input type=search placeholder=Search />
[1]<button aria-label=Submit>Search />
[2]<div role=listbox data-scrollable="top=0, bottom=420" />
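The visibility filter and the indexed listing can be sketched as pure functions. The first is a predicate over the measurements named above. The second is a serializer that reproduces the `[index]<tag attrs>text />` lines in the sample. Several things are assumptions:

- The `ScannedElement` and `VisibilityInfo` shapes are hypothetical.
- Reading the leading `*` as "newly appeared since the previous step" is inferred from the sample.
- The `elementFromPoint` top-element check is omitted.

In a content script, the inputs would come from `offsetWidth`/`offsetHeight` and `getComputedStyle`.

```typescript
// Measurements read per element (offsetWidth/offsetHeight and computed style).
interface VisibilityInfo {
  offsetWidth: number;
  offsetHeight: number;
  visibility: string; // computed `visibility`
  display: string; // computed `display`
}

// Hypothetical record for an element that passed the interactivity filters.
interface ScannedElement {
  tag: string;
  attrs: Array<[string, string]>;
  text: string;
  isNew: boolean; // assumption: rendered as a leading "*"
  scrollable?: { top: number; bottom: number };
}

function isVisible(info: VisibilityInfo): boolean {
  return info.offsetWidth > 0 && info.offsetHeight > 0 &&
    info.visibility !== "hidden" && info.display !== "none";
}

// Serialize one element into the indexed line the model sees.
function formatElement(index: number, el: ScannedElement): string {
  const parts = [el.tag, ...el.attrs.map(([k, v]) => `${k}=${v}`)];
  if (el.scrollable) {
    parts.push(`data-scrollable="top=${el.scrollable.top}, bottom=${el.scrollable.bottom}"`);
  }
  const open = parts.join(" ");
  const body = el.text ? `<${open}>${el.text} />` : `<${open} />`;
  return `${el.isNew ? "*" : ""}[${index}]${body}`;
}

// Only visible elements receive an index; indexes follow scan order.
function buildListing(candidates: Array<{ el: ScannedElement; info: VisibilityInfo }>): string {
  return candidates
    .filter((c) => isVisible(c.info))
    .map((c, i) => formatElement(i, c.el))
    .join("\n");
}
```

Assigning indexes only after filtering keeps them dense, so the model never sees gaps left by hidden elements.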

Secure by design

Local settings. Guarded execution. Human-visible boundaries.

Settings, including provider profiles, API keys, language settings, and advanced options, are stored in chrome.storage.local. Automation runs through validated extension messages and a content script that filters page state before acting.

  • The About section of settings exposes a User Auth Token for trusted page integration.
  • Content scripts are blocked on chrome, extension, about, file, devtools, and similar URLs.
  • Hub WebSocket sessions require approval unless auto-approval is enabled.
Local settings
chrome.storage.local:
{
  "llmConfig": {
    "baseURL": "...",
    "model": "...",
    "apiKey": "..."
  },
  "llmProfiles": [...]
}

Provider profiles, API keys, language settings, and advanced options stay in the extension's local browser storage.

Execution boundaries
  • Content script runs on visible page state after DOM filtering.
  • Restricted browser URLs are blocked before automation starts.
  • Page bridge access requires the user auth token handshake.
  • External Hub sessions request approval by default.
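Two of these boundaries can be sketched as small predicates: refusing restricted URLs before automation starts, and holding Hub sessions for approval. This is a minimal sketch. The blocked set contains only the schemes named in this section (chrome, extension, about, file, devtools), so the "similar" URLs the source also blocks are not reproduced. The function names and the `askUser` callback are hypothetical.

```typescript
// Schemes named in the source; the real list also covers "similar" URLs.
const BLOCKED_PROTOCOLS = new Set(["chrome:", "chrome-extension:", "about:", "file:", "devtools:"]);

// Returns false for restricted or unparsable URLs.
function isAutomatable(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparsable URLs are treated as restricted
  }
  return !BLOCKED_PROTOCOLS.has(url.protocol);
}

// External Hub sessions wait for the user unless auto-approval is enabled.
async function admitHubSession(autoApprove: boolean, askUser: () => Promise<boolean>): Promise<boolean> {
  return autoApprove ? true : askUser();
}
```

Treating parse failures as restricted makes the check fail closed.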

Provider routing

The extension makes the model call directly.

The extension constructs OpenAI-compatible requests directly from the configured provider profile. The only endpoint shape in source is {baseURL}/chat/completions with POST.

  • Request fields include model, temperature, messages, tools, parallel_tool_calls, and tool_choice.
  • Hub mode is WebSocket-based: the extension accepts execute and stop messages inbound and sends ready, result, and error messages outbound.
  • Provider routing is configured through extension profiles instead of a bundled service.
POST {baseURL}/chat/completions
{
  "model": "gpt-4o-mini",
  "messages": [...],
  "tools": [{ "type": "function", "function": {...} }],
  "parallel_tool_calls": false,
  "tool_choice": { "type": "function", "function": { "name": "AgentOutput" } }
}
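Building that request from a stored profile can be sketched as follows. The `baseURL`, `model`, and `apiKey` fields mirror the `llmConfig` example in the settings section. Several details are assumptions:

- Storing `temperature` per profile.
- Trimming a trailing slash from `baseURL`.
- The `Authorization: Bearer` header, which is the usual convention for OpenAI-compatible endpoints but may differ per provider.
- The function and type names.

```typescript
interface ProviderProfile {
  baseURL: string;
  model: string;
  apiKey: string;
  temperature: number; // assumption: stored with the profile
}

interface ToolDef {
  type: "function";
  function: { name: string; description?: string; parameters: object };
}

// Build the POST for {baseURL}/chat/completions with a forced AgentOutput call.
function buildChatRequest(profile: ProviderProfile, messages: object[], tools: ToolDef[]) {
  const url = `${profile.baseURL.replace(/\/+$/, "")}/chat/completions`;
  const body = {
    model: profile.model,
    temperature: profile.temperature,
    messages,
    tools,
    parallel_tool_calls: false, // one action per step
    tool_choice: { type: "function", function: { name: "AgentOutput" } },
  };
  const init = {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${profile.apiKey}` },
    body: JSON.stringify(body),
  };
  return { url, init };
}
```

The result plugs directly into `fetch(url, init)`. Because the call goes straight from the extension, no proxy service sits between the profile and the provider.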

Key features

Extension-first automation with traceable execution.

MV3 WXT Extension

Status: Implemented

Built with WXT, React 19, a side panel UI, a background service worker, and a content script.

Indexed DOM Actions

Status: Implemented

Uses highlighted element indexes from the PageController selectorMap, not persistent CSS selectors.
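Because actions carry only an index, each step must resolve that index against the map from the most recent scan. This is a minimal sketch. It assumes `selectorMap` behaves like a `Map<number, T>` that is rebuilt on every observation. The `resolveIndex` helper and its generic element type are hypothetical.

```typescript
// Resolve an action's index against the latest scan; stale indexes fail loudly.
function resolveIndex<T>(selectorMap: ReadonlyMap<number, T>, index: number): T {
  if (!Number.isInteger(index) || index < 0) {
    throw new Error(`Invalid element index: ${index}`);
  }
  const element = selectorMap.get(index);
  if (element === undefined) {
    throw new Error(`No element with index ${index}; the page may have changed since the last scan`);
  }
  return element;
}
```

Failing on a missing index, instead of falling back to a guessed selector, keeps the executed action consistent with the listing the model actually saw.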

OpenAI-Compatible Providers

Status: Implemented

Calls {baseURL}/chat/completions with provider profiles stored in chrome.storage.local.

History Sessions

Status: Implemented

Completed and failed runs are saved to IndexedDB sessions and can be rerun or exported as JSON.
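A saved session and its JSON export might look like this. The record fields and function names are hypothetical. IndexedDB persistence is left out so the sketch runs outside the extension. The `isSavable` guard encodes the rule above that only completed and failed runs are kept.

```typescript
type RunStatus = "completed" | "failed";

// Hypothetical shape of a saved run.
interface HistorySession {
  id: string;
  task: string;
  status: RunStatus;
  startedAt: number; // epoch milliseconds
  steps: Array<{ action: string; result: string }>;
}

// Only finished runs are persisted.
function isSavable(status: string): status is RunStatus {
  return status === "completed" || status === "failed";
}

// Export sessions as pretty-printed JSON, newest first, without mutating the input.
function exportSessions(sessions: HistorySession[]): string {
  const sorted = [...sessions].sort((a, b) => b.startedAt - a.startedAt);
  return JSON.stringify(sorted, null, 2);
}
```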

Hub WebSocket

Status: Experimental

hub.html?ws=PORT connects to ws://localhost:{PORT} for external task execution.
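The Hub connection can be sketched in two parts. The first derives the socket URL from the `ws` query parameter. The second types the message set listed under Provider routing: execute and stop inbound; ready, result, and error outbound. The JSON-with-`type` framing, the payload fields (`id`, `task`, `result`, `message`), and the function names are assumptions.

```typescript
// hub.html?ws=PORT -> ws://localhost:PORT; rejects missing or out-of-range ports.
function hubSocketUrl(pageUrl: string): string {
  const port = new URL(pageUrl).searchParams.get("ws");
  const n = Number(port);
  if (port === null || !Number.isInteger(n) || n < 1 || n > 65535) {
    throw new Error(`Invalid Hub port: ${port}`);
  }
  return `ws://localhost:${n}`;
}

// Assumed framing: every message is JSON with a `type` discriminant.
type HubInbound =
  | { type: "execute"; id: string; task: string }
  | { type: "stop"; id: string };
type HubOutbound =
  | { type: "ready" }
  | { type: "result"; id: string; result: string }
  | { type: "error"; id?: string; message: string };

// Parse an inbound frame; unknown or malformed frames yield null.
function parseInbound(data: string): HubInbound | null {
  const msg: unknown = JSON.parse(data);
  if (typeof msg !== "object" || msg === null) return null;
  const { type, id, task } = msg as Record<string, unknown>;
  if (type === "execute" && typeof id === "string" && typeof task === "string") {
    return { type: "execute", id, task };
  }
  if (type === "stop" && typeof id === "string") {
    return { type: "stop", id };
  }
  return null;
}

function encodeOutbound(msg: HubOutbound): string {
  return JSON.stringify(msg);
}
```

In the Hub page, `hubSocketUrl(location.href)` would feed `new WebSocket(...)`. Each `message` event's `data` would pass through `parseInbound` before a task starts.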

Guarded Page Bridge

Status: Implemented

main-world.ts is injected only after the page-side token matches the extension auth token.
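The injection gate reduces to a token comparison that runs before the injection step. This is a minimal sketch. The transport that carries the page's token, and the injection call itself, are replaced by a caller-supplied `inject` function. `tokensMatch` and `maybeInjectBridge` are hypothetical names. The comparison visits every character even after a mismatch, a defensive choice that the source does not specify.

```typescript
// Compare the page-supplied token with the extension's stored token.
function tokensMatch(pageToken: unknown, extensionToken: string): boolean {
  // An unset extension token must never authorize injection.
  if (typeof pageToken !== "string" || extensionToken.length === 0) return false;
  if (pageToken.length !== extensionToken.length) return false;
  let diff = 0;
  for (let i = 0; i < pageToken.length; i++) {
    diff |= pageToken.charCodeAt(i) ^ extensionToken.charCodeAt(i);
  }
  return diff === 0;
}

// Inject main-world.ts only after a successful handshake.
async function maybeInjectBridge(
  pageToken: unknown,
  extensionToken: string,
  inject: () => Promise<void>,
): Promise<boolean> {
  if (!tokensMatch(pageToken, extensionToken)) return false;
  await inject();
  return true;
}
```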

Voice pipeline

Status: Planned

The side panel already includes voice controls, but speech recognition, speech output, and interruption support are planned rather than shipped.

Built in the open.

Navvy is open source. No black boxes, no vendor lock-in. Inspect the code and own your runtime choices.
