Voice Interaction

Voice interaction is planned. The current side panel UI includes a microphone button, but the source does not implement full-duplex audio, speech-to-text (STT), text-to-speech (TTS), or barge-in.

The composer supports text submission through the side panel. Its placeholder is:

Describe your task... (Enter to send)

The microphone control is present as a planned interaction surface. It is not connected to a speech recognition pipeline in the current implementation.

The extension does include a Hub WebSocket client for external applications. That bridge accepts JSON task-control messages, not audio streams:

{ "type": "execute", "task": "Open a new tab", "config": {} }

The Hub client returns ready, result, and error messages to the external app.

The intended voice path is not implemented yet. A future implementation may add:

  • Streaming speech-to-text.
  • Model planning from partial or final transcripts.
  • Text-to-speech output.
  • Client-side voice activity detection for interruption.
  • User confirmation before sensitive actions.
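The barge-in item above hinges on client-side voice activity detection: while TTS audio is playing, incoming microphone frames are checked for speech, and playback is cut off when the user starts talking. The sketch below illustrates that decision with a simple RMS energy threshold; the threshold value, frame format, and function names are assumptions, not part of the extension's source.

```python
# Illustrative voice-activity check for barge-in. A real pipeline would
# use a proper VAD model and smoothing; this only shows the control flow.

SPEECH_THRESHOLD = 0.02  # assumed RMS energy above which a frame counts as speech

def frame_energy(samples: list[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def should_barge_in(samples: list[float], tts_playing: bool) -> bool:
    """Interrupt TTS only when it is playing and the frame looks like speech."""
    return tts_playing and frame_energy(samples) > SPEECH_THRESHOLD
```

In practice the caller would stop TTS playback and start streaming STT as soon as `should_barge_in` returns true for an incoming frame.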

Until that work ships, use the text composer or the JSON Hub WebSocket bridge for external control.