Introduction
Navvy is an AI-powered browser automation extension. A user enters a task in the side panel, Navvy scans the active page, sends a compact browser state to an OpenAI-compatible model, and executes the returned tool call against the live DOM.
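As a rough illustration of the "send a compact browser state to an OpenAI-compatible model" step, the sketch below builds a chat completions request. The `ProviderProfile` fields mirror the configurable `baseURL`, `model`, and optional `apiKey`; the function name, the prompt wording, and the `ChatRequest` shape are assumptions made for this example, and tool definitions are omitted for brevity.

```typescript
// Provider settings; field names follow the configurable profile values.
interface ProviderProfile {
  baseURL: string;
  model: string;
  apiKey?: string;
}

// Hypothetical container for a ready-to-send HTTP request.
interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds a POST request for the OpenAI-compatible /chat/completions endpoint.
function buildChatRequest(
  profile: ProviderProfile,
  task: string,
  browserState: string,
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Local providers often need no key, so the header is only set when one exists.
  if (profile.apiKey) headers["Authorization"] = `Bearer ${profile.apiKey}`;

  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${profile.baseURL.replace(/\/+$/, "")}/chat/completions`,
    headers,
    body: JSON.stringify({
      model: profile.model,
      messages: [
        // Assumed prompt wording, not Navvy's actual system prompt.
        { role: "system", content: "Pick one browser action for the task." },
        { role: "user", content: `Task: ${task}\n\nBrowser state:\n${browserState}` },
      ],
    }),
  };
}
```

The result could be passed to `fetch(request.url, { method: "POST", headers: request.headers, body: request.body })`.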
The project exists to make complex web interfaces easier to operate. That includes accessibility use cases, repetitive navigation, and workflows where the user knows the outcome but does not want to manually traverse every control.
Current extension shape
Navvy is built with WXT and React 19 as a Manifest V3 extension. The generated extension uses these entrypoints:
- entrypoints/sidepanel/App.tsx: React chat UI, history views, settings links, and task submission.
- entrypoints/background.ts: tab orchestration, side-panel behavior, auth token setup, and runtime message routing.
- entrypoints/content.ts: page-side controller initialization, DOM automation, and page bridge injection.
- entrypoints/main-world.ts: optional page API bridge exposed as PAGE_AGENT_EXT after token validation.
These contexts communicate with Chrome runtime messages, a tab-events port, and a window.postMessage bridge for page integration.
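The token check on the window.postMessage bridge might look roughly like the sketch below. The message shape, the `navvy-page-bridge` marker, and both function names are hypothetical; the actual message format is not documented on this page.

```typescript
// Hypothetical shape of a request posted by the page through the bridge.
type BridgeRequest = {
  source: "navvy-page-bridge"; // assumed marker to filter unrelated postMessage traffic
  token: string;
  task: string;
};

// Type guard: accept only plain objects carrying the marker, a token, and a task.
function isBridgeRequest(data: unknown): data is BridgeRequest {
  if (typeof data !== "object" || data === null) return false;
  const d = data as Record<string, unknown>;
  return (
    d.source === "navvy-page-bridge" &&
    typeof d.token === "string" &&
    typeof d.task === "string"
  );
}

// Returns the task to forward to the background context, or null if rejected.
function acceptBridgeMessage(data: unknown, expectedToken: string): string | null {
  if (!isBridgeRequest(data)) return null;
  if (data.token !== expectedToken) return null;
  return data.task;
}
```

In a content script, a function like this would typically run inside a `window.addEventListener("message", ...)` handler on `event.data`, with accepted tasks forwarded through `chrome.runtime.sendMessage`.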
What is implemented today
- Side panel composer with status labels Ready, Running, Done, Error, and Stopped.
- DOM detection through a flat tree of visible interactive elements.
- Indexed actions such as click_element_by_index, input_text, scroll, and open_new_tab.
- OpenAI-compatible provider profiles with configurable baseURL, model, and optional apiKey.
- IndexedDB history sessions with rerun, export, and delete actions.
- Hub WebSocket bridge for external JSON task execution.
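To make the relationship between the flat element tree and the indexed actions concrete, here is a minimal sketch that resolves a tool call against a list of indexed elements. The action names come from the list above; the `FlatElement` fields, the argument shapes, and the `planAction` helper are assumptions for illustration, and the function only describes the action instead of touching the DOM.

```typescript
// One entry in the flat tree of visible interactive elements (assumed fields).
interface FlatElement {
  index: number;
  tag: string;
  text: string;
}

// Tool calls named in the feature list; the argument shapes are assumptions.
type ToolCall =
  | { name: "click_element_by_index"; index: number }
  | { name: "input_text"; index: number; text: string }
  | { name: "scroll"; direction: "up" | "down" }
  | { name: "open_new_tab"; url: string };

// Validates a tool call against the current tree and describes what would run.
function planAction(tree: FlatElement[], call: ToolCall): string {
  // Look up an element by index, failing loudly on stale or invalid indices.
  const find = (index: number): FlatElement => {
    const el = tree.find((e) => e.index === index);
    if (!el) throw new Error(`No element with index ${index}`);
    return el;
  };

  switch (call.name) {
    case "click_element_by_index": {
      const el = find(call.index);
      return `click <${el.tag}> "${el.text}"`;
    }
    case "input_text": {
      const el = find(call.index);
      return `type "${call.text}" into <${el.tag}>`;
    }
    case "scroll":
      return `scroll ${call.direction}`;
    case "open_new_tab":
      return `open ${call.url} in a new tab`;
  }
}
```

Addressing elements by index keeps the state sent to the model compact, at the cost of indices going stale when the page changes, which is why the lookup throws instead of guessing.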
Planned or documentation pending
- Persistent workflow recording and variable replay.
- Full voice input with STT, TTS, and interruption support.
- A standalone FastAPI backend and SSE stream are not present in the current repository.
Start with the Quick Start to run a first task, then read Architecture Overview for the message flow.