The semantic gap
Users speak naturally, but websites require exact field focus, clicks, and tab state.
AI Agent Browser Extension for Enhanced Web Accessibility and Interaction
Secure, local automation between natural language and web interfaces. Navvy runs as an MV3 extension with a React side panel, indexed DOM actions, and OpenAI-compatible provider profiles.
The problem
Automation outside the real browser loses extension context, tab state, and user control.
Navvy acts in the user's own browser through the side panel, background worker, and content script.
System architecture
The source currently uses Chrome runtime messages, a tab-events port, and a main-world window bridge. No separate messaging library is present in the extension source.
Side panel: React command UI with StatusDot, Composer, HistoryList, ActivityCard, and EventCard.
Background worker: routes TAB_CONTROL and PAGE_CONTROL messages and opens the Hub tab for OPEN_HUB.
Content script: initializes RemotePageController.content and optionally injects main-world.ts.
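As a rough illustration of the routing described above, the sketch below dispatches on a message type. Only the names TAB_CONTROL, PAGE_CONTROL, and OPEN_HUB come from the source. The type field, the payload shapes, the Route labels, and the routeMessage helper are assumptions made for this example, not Navvy's actual message schema.

```typescript
// Hypothetical message shapes: only the three type names appear in the source.
type ExtensionMessage =
  | { type: "TAB_CONTROL"; action: string }
  | { type: "PAGE_CONTROL"; action: string }
  | { type: "OPEN_HUB" };

// Hypothetical labels for the subsystem that should handle a message.
type Route = "tabs" | "page" | "hub";

// Decide which subsystem handles a message, failing loudly on unknown types.
function routeMessage(message: ExtensionMessage): Route {
  switch (message.type) {
    case "TAB_CONTROL":
      return "tabs";
    case "PAGE_CONTROL":
      return "page";
    case "OPEN_HUB":
      return "hub";
    default: {
      // Compile-time exhaustiveness check; reached only if an untyped caller sends junk.
      const unreachable: never = message;
      throw new Error(`Unknown message: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

In an extension, a function like this would typically be called from a chrome.runtime.onMessage listener in the background worker. A discriminated union keeps the compiler honest when a new message type is added.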
Typed actions
The LLM must call the single AgentOutput function with reflection fields and exactly one action object. Extension tab tools are injected alongside DOM tools.
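One way to enforce that contract is to check each tool call before acting on it. The sketch below is a minimal validator under stated assumptions: the three reflection field names are taken from the document's own sample payload, while validateAgentOutput, isRecord, and the error strings are hypothetical and not part of Navvy's source.

```typescript
// Reflection fields as they appear in the document's sample payload.
const REFLECTION_FIELDS = ["evaluation_previous_goal", "memory", "next_goal"] as const;

// True for plain objects, false for null, arrays, and primitives.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a list of problems; an empty list means the payload passed these checks.
function validateAgentOutput(raw: unknown): string[] {
  if (!isRecord(raw)) return ["payload is not an object"];
  const problems: string[] = [];
  for (const field of REFLECTION_FIELDS) {
    if (typeof raw[field] !== "string") problems.push(`${field} must be a string`);
  }
  const action = raw.action;
  if (!isRecord(action)) {
    problems.push("action must be an object");
  } else if (Object.keys(action).length !== 1) {
    // The contract allows exactly one action per step.
    problems.push("action must contain exactly one tool");
  }
  return problems;
}
```

Returning a list of problems rather than throwing makes it easy to feed the errors back to the model on the next step, if the agent loop supports that.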
{
"evaluation_previous_goal": "The page is visible.",
"memory": "Search input is index 4.",
"next_goal": "Click the search input.",
"action": {
"click_element_by_index": { "index": 4 }
}
}
DOM inspect
The scanner targets native controls, ARIA roles, contenteditable regions, scrollable containers, event-handler attributes, and elements with interactive cursor styles.
*[0]<input type=search placeholder=Search />
[1]<button aria-label=Submit>Search />
[2]<div role=listbox data-scrollable="top=0, bottom=420" />
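To make the indexed line format concrete, here is a small formatter that reproduces the sample entries above. It is a sketch inferred from that sample alone: the IndexedElement type, the rule of quoting only values that contain whitespace, and the starred flag are assumptions, and the source does not state what the leading asterisk means.

```typescript
// Hypothetical element record; field names are chosen for this example.
interface IndexedElement {
  index: number;
  tag: string;
  attributes: Record<string, string>;
  text?: string;
  starred?: boolean; // the sample prefixes one entry with "*"; its meaning is not stated
}

// Assumed quoting rule: quote only attribute values that contain whitespace.
function formatAttribute(name: string, value: string): string {
  return /\s/.test(value) ? `${name}="${value}"` : `${name}=${value}`;
}

// Render one element as "[index]<tag attrs>text />" or "[index]<tag attrs />".
function formatElement(element: IndexedElement): string {
  const attrs = Object.entries(element.attributes)
    .map(([name, value]) => formatAttribute(name, value))
    .join(" ");
  const open = attrs ? `<${element.tag} ${attrs}` : `<${element.tag}`;
  const body = element.text ? `>${element.text} />` : " />";
  return `${element.starred ? "*" : ""}[${element.index}]${open}${body}`;
}
```

Because the model refers to elements only by index, a compact one-line rendering like this keeps the prompt small while still exposing roles and labels.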
Secure by design
As implemented, the settings keep provider profiles, API keys, language settings, and
advanced options in chrome.storage.local. Automation runs through
validated extension messages and a content script that filters page state before any action is taken.
chrome.storage.local:
{
"llmConfig": {
"baseURL": "...",
"model": "...",
"apiKey": "..."
},
"llmProfiles": [...]
}
Provider profiles, API keys, language settings, and advanced options stay in the extension's local browser storage.
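Values read back from extension storage arrive untyped, so it can help to narrow them before use. The sketch below checks the three llmConfig fields shown in the sample above. The LlmConfig type and the parseLlmConfig helper are hypothetical, and whether Navvy validates stored settings this way is not stated in the source.

```typescript
// Fields taken from the llmConfig sample; the type name is an assumption.
interface LlmConfig {
  baseURL: string;
  model: string;
  apiKey: string;
}

// Narrow an untyped storage value to LlmConfig, or return null if any field is missing.
function parseLlmConfig(raw: unknown): LlmConfig | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { baseURL, model, apiKey } = raw as Record<string, unknown>;
  if (typeof baseURL !== "string" || typeof model !== "string" || typeof apiKey !== "string") {
    return null;
  }
  // Copy only the known fields so stray keys do not leak into requests.
  return { baseURL, model, apiKey };
}
```

In the extension, the input would be the llmConfig property of the object returned by chrome.storage.local.get("llmConfig"). A null result is a natural cue to send the user to the settings screen.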
Provider routing
The extension constructs OpenAI-compatible requests directly from the configured provider
profile. The only endpoint shape in the source is a POST to
{baseURL}/chat/completions.
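A request of this shape could be assembled as in the sketch below. The body field names mirror the document's sample request. The ProviderProfile and ChatRequest types, the buildChatRequest helper, the trailing-slash handling, and the Bearer authorization header are assumptions based on the usual OpenAI-compatible convention rather than on Navvy's source.

```typescript
// Hypothetical profile shape, matching the llmConfig fields shown in this document.
interface ProviderProfile {
  baseURL: string;
  model: string;
  apiKey: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the URL and fetch options for one forced AgentOutput tool call.
function buildChatRequest(profile: ProviderProfile, messages: unknown[], tools: unknown[]): ChatRequest {
  // Trim trailing slashes so the joined URL has exactly one separator.
  const base = profile.baseURL.replace(/\/+$/, "");
  return {
    url: `${base}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Assumed: Bearer auth, as most OpenAI-compatible providers expect.
        Authorization: `Bearer ${profile.apiKey}`,
      },
      body: JSON.stringify({
        model: profile.model,
        messages,
        tools,
        parallel_tool_calls: false,
        tool_choice: { type: "function", function: { name: "AgentOutput" } },
      }),
    },
  };
}
```

The result can be passed straight to fetch(request.url, request.init). Keeping construction separate from the network call makes the request easy to inspect and test.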
POST {baseURL}/chat/completions
{
"model": "gpt-4o-mini",
"messages": [...],
"tools": [{ "type": "function", "function": {...} }],
"parallel_tool_calls": false,
"tool_choice": { "type": "function", "function": { "name": "AgentOutput" } }
}
Key features
Built with WXT, React 19, a side panel UI, a background service worker, and a content script.
Uses highlighted element indexes from PageController selectorMap, not persistent CSS selectors.
Calls {baseURL}/chat/completions with provider profiles stored in chrome.storage.local.
Completed and failed runs are saved to IndexedDB sessions and can be rerun or exported as JSON.
hub.html?ws=PORT connects to ws://localhost:{PORT} for external task execution.
main-world.ts is injected only after the page-side token matches the extension auth token.
The current side panel includes the control surface for voice, but speech recognition, speech output, and interruption support are planned rather than shipped.
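The hub.html?ws=PORT feature listed above maps a query parameter to a local WebSocket address. A minimal sketch of that mapping follows. The hubSocketUrl helper and its port validation are assumptions, since the source does not say how, or whether, the port is checked.

```typescript
// Derive the WebSocket endpoint from a hub page URL, or null when no usable port is given.
// Throws if pageUrl itself is not a parseable absolute URL.
function hubSocketUrl(pageUrl: string): string | null {
  const port = new URL(pageUrl).searchParams.get("ws");
  // Assumed validation: digits only, within the TCP port range.
  if (port === null || !/^\d{1,5}$/.test(port)) return null;
  const value = Number(port);
  if (value < 1 || value > 65535) return null;
  return `ws://localhost:${value}`;
}
```

On the hub page this would be called with window.location.href, and a non-null result passed to new WebSocket(url). Rejecting anything that is not a plain port number keeps the connection pinned to localhost.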
Navvy is open source. No black boxes, no vendor lock-in. Inspect the code and own your runtime choices.