Voice Interaction
Voice interaction is planned. The current side panel UI includes a microphone button, but the source does not implement full-duplex audio, speech-to-text (STT), text-to-speech (TTS), or barge-in (interrupting the assistant mid-utterance).
Current UI state
The composer supports text submission through the side panel. Its placeholder is:
Describe your task... (Enter to send)

The microphone control is present as a planned interaction surface. It is not connected to a speech recognition pipeline in the current implementation.
Implemented real-time bridge
The extension does include a Hub WebSocket client for external applications. That bridge accepts JSON task-control messages, not audio streams:
{ "type": "execute", "task": "Open a new tab", "config": {} }

The Hub client returns ready, result, and error messages to the external app.
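As a sketch of how an external app might drive the bridge, the snippet below builds the execute message shown above and sends it over a WebSocket. The endpoint URL is an assumption (the source does not specify an address), and the helper function is invented for this example; only the message shape comes from the documentation.

```typescript
// Hypothetical helper around the documented message shape. Only the
// "type"/"task"/"config" fields come from the docs; everything else here
// (names, endpoint) is illustrative.
interface ExecuteMessage {
  type: "execute";
  task: string;
  config: Record<string, unknown>;
}

function buildExecuteMessage(
  task: string,
  config: Record<string, unknown> = {}
): ExecuteMessage {
  return { type: "execute", task, config };
}

// Sending from an external app (browser, or Node with a global WebSocket).
// The ws:// address is a placeholder, not a documented endpoint.
// const ws = new WebSocket("ws://localhost:8765");
// ws.onopen = () => ws.send(JSON.stringify(buildExecuteMessage("Open a new tab")));
// ws.onmessage = (ev) => console.log(JSON.parse(ev.data)); // ready / result / error

console.log(JSON.stringify(buildExecuteMessage("Open a new tab")));
// → {"type":"execute","task":"Open a new tab","config":{}}
```

Because the bridge speaks plain JSON, any WebSocket-capable client can act as the external app; the extension replies with the ready, result, and error messages described above.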
Planned voice architecture
The intended voice path is not yet implemented; this section is planning documentation only. A future implementation may add:
- Streaming speech-to-text.
- Model planning from partial or final transcripts.
- Text-to-speech output.
- Client-side voice activity detection for interruption.
- User confirmation before sensitive actions.
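None of the items above exist in the source yet. As a hypothetical illustration of the last two bullets, here is a minimal sketch of a gate that a future voice path might place between a transcript and task execution: interim (partial) transcripts never execute, and transcripts matching sensitive patterns require confirmation. The pattern list and function names are invented for this example.

```typescript
// Hypothetical sketch only: nothing below exists in the extension today.
// A future voice pipeline might route transcripts through a gate like this
// before dispatching them as "execute" tasks over the Hub bridge.

// Invented examples of "sensitive" patterns; a real list would be configurable.
const SENSITIVE_PATTERNS: RegExp[] = [/delete/i, /purchase|buy/i, /password/i];

type GateResult =
  | { action: "execute"; task: string }
  | { action: "confirm"; task: string; reason: string };

function gateTranscript(transcript: string, isFinal: boolean): GateResult | null {
  // Interim (streaming) transcripts would only update the UI, never execute.
  if (!isFinal) return null;
  const task = transcript.trim();
  const hit = SENSITIVE_PATTERNS.find((p) => p.test(task));
  if (hit) {
    // Sensitive request: ask the user to confirm before acting.
    return { action: "confirm", task, reason: `matched ${hit}` };
  }
  return { action: "execute", task };
}

console.log(gateTranscript("open a new tab", true));    // → { action: "execute", ... }
console.log(gateTranscript("delete my history", true)); // → { action: "confirm", ... }
console.log(gateTranscript("open a new", false));       // → null (interim transcript)
```

The STT and TTS ends of such a pipeline could plausibly use the browser's Web Speech API or a server-side model, but the source does not commit to either.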
Until that work ships, use the text composer or the JSON Hub WebSocket bridge for external control.