Introduction

Navvy is an AI-powered browser automation extension. When a user enters a task in the side panel, Navvy scans the active page, sends a compact browser state to an OpenAI-compatible model, and executes the returned tool call against the live DOM.

The project exists to make complex web interfaces easier to operate. That includes accessibility use cases, repetitive navigation, and workflows where the user knows the outcome but does not want to manually traverse every control.

Current extension shape

Navvy is built with WXT and React 19 as a Manifest V3 extension. The generated extension uses these entrypoints:

entrypoints/sidepanel/App.tsx: React chat UI, history views, settings links, and task submission.
entrypoints/background.ts: tab orchestration, side-panel behavior, auth token setup, and runtime message routing.
entrypoints/content.ts: page-side controller initialization, DOM automation, and page bridge injection.
entrypoints/main-world.ts: optional page API bridge exposed as PAGE_AGENT_EXT after token validation.

These contexts communicate through Chrome runtime messages, a tab-events port, and a window.postMessage bridge for page integration.
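
As a rough illustration of how messages between these contexts could be typed and checked, the sketch below models runtime messages as a discriminated union and validates a postMessage envelope against a token. Every name here (RuntimeMessage, BridgeEnvelope, the "navvy-bridge" source tag, and the message type strings) is an assumption made for the example and is not taken from the Navvy source.

```typescript
// Hypothetical message shapes. The real Navvy message names may differ.
type RuntimeMessage =
  | { type: "task/submit"; task: string }
  | { type: "task/stop" }
  | { type: "tab/event"; tabId: number; event: "activated" | "updated" | "removed" };

// Hypothetical envelope for the window.postMessage bridge.
interface BridgeEnvelope {
  source: "navvy-bridge";
  token: string;
  payload: unknown;
}

// Accept a posted message only if it has the expected shape and token.
function isBridgeEnvelope(data: unknown, expectedToken: string): data is BridgeEnvelope {
  if (typeof data !== "object" || data === null) return false;
  const d = data as Record<string, unknown>;
  return d.source === "navvy-bridge" && d.token === expectedToken && "payload" in d;
}

// Route a runtime message to a handler label. The `never` check makes the
// compiler flag any message type that is added later but not handled here.
function routeMessage(msg: RuntimeMessage): string {
  switch (msg.type) {
    case "task/submit":
      return `run:${msg.task}`;
    case "task/stop":
      return "stop";
    case "tab/event":
      return `tab:${msg.tabId}:${msg.event}`;
    default: {
      const unhandled: never = msg;
      return unhandled;
    }
  }
}
```

In an extension, a check like isBridgeEnvelope would typically run inside a window "message" listener before any payload is acted on, and a router like routeMessage inside a chrome.runtime.onMessage listener.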

What is implemented today

Side panel composer with status labels Ready, Running, Done, Error, and Stopped.
DOM detection through a flat tree of visible interactive elements.
Indexed actions such as click_element_by_index, input_text, scroll, and open_new_tab.
OpenAI-compatible provider profiles with configurable baseURL, model, and optional apiKey.
IndexedDB history sessions with rerun, export, and delete actions.
Hub WebSocket bridge for external JSON task execution.
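
To make the flat tree, indexed actions, and provider profiles more concrete, here is a minimal sketch of one agent step: building a chat-completions request from a profile and resolving a returned tool call against the scanned elements. Only the action names and the baseURL, model, and apiKey fields come from the list above. The type names, argument shapes, prompt text, and helper functions are assumptions for illustration, and a real request would likely also declare the tools to the model.

```typescript
// Field names follow the feature list; the interface name is hypothetical.
interface ProviderProfile {
  baseURL: string;
  model: string;
  apiKey?: string;
}

// One entry of the flat tree: a visible interactive element and its index.
interface FlatElement {
  index: number;
  tag: string;
  text: string;
}

// Action names come from the feature list; argument shapes are assumed.
type ToolCall =
  | { name: "click_element_by_index"; args: { index: number } }
  | { name: "input_text"; args: { index: number; text: string } }
  | { name: "scroll"; args: { direction: "up" | "down" } }
  | { name: "open_new_tab"; args: { url: string } };

// Serialize the tree as one line per element, e.g. "[0] <button> Sign in".
function describeElements(elements: FlatElement[]): string {
  return elements.map((e) => `[${e.index}] <${e.tag}> ${e.text}`).join("\n");
}

// Build a request in the OpenAI chat-completions wire format.
// The Authorization header is only set when the profile has an apiKey.
function buildRequest(profile: ProviderProfile, task: string, elements: FlatElement[]) {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (profile.apiKey) headers["Authorization"] = `Bearer ${profile.apiKey}`;
  return {
    url: `${profile.baseURL.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model: profile.model,
        messages: [
          { role: "system", content: "Reply with exactly one tool call." },
          { role: "user", content: `Task: ${task}\nElements:\n${describeElements(elements)}` },
        ],
      }),
    },
  };
}

// Look up the element an indexed action refers to. Returns null for actions
// without an index and for indexes that are not in the scanned tree.
function resolveTarget(call: ToolCall, elements: FlatElement[]): FlatElement | null {
  switch (call.name) {
    case "click_element_by_index":
    case "input_text":
      return elements.find((e) => e.index === call.args.index) ?? null;
    default:
      return null;
  }
}
```

The returned url and init pair can be passed straight to fetch(url, init). Checking the index before touching the DOM keeps a stale or hallucinated index from triggering an action on the wrong element.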

Planned or documentation pending

Persistent workflow recording and variable replay.
Full voice input with STT, TTS, and interruption support.
Standalone FastAPI backend and SSE stream; neither is present in the current repository.

Start with the Quick Start to run a first task, then read Architecture Overview for the message flow.