Voice Interaction

Voice interaction is planned. The current side panel UI includes a microphone button, but the source does not implement full-duplex audio, speech-to-text (STT), text-to-speech (TTS), or barge-in.

The composer supports text submission through the side panel. Its placeholder is:

Describe your task... (Enter to send)

The microphone control is present as a planned interaction surface. It is not connected to a speech recognition pipeline in the current implementation.

The extension does include a Hub WebSocket client for external applications. That bridge accepts JSON task-control messages, not audio streams:

{ "type": "execute", "task": "Open a new tab", "config": {} }

The Hub client returns ready, result, and error messages to the external app.

The intended voice path is not implemented yet. A future implementation may add:

  • Streaming speech-to-text.
  • Model planning from partial or final transcripts.
  • Text-to-speech output.
  • Client-side voice activity detection for interruption.
  • User confirmation before sensitive actions.
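The barge-in item above hinges on client-side voice activity detection: while TTS audio is playing, incoming microphone frames are checked for speech, and playback is cut off when the user starts talking. The sketch below illustrates that decision with a simple RMS energy threshold; the threshold value, frame format, and function names are assumptions, not part of the extension's source.

```python
# Illustrative voice-activity check for barge-in. A real pipeline would
# use a proper VAD model and smoothing; this only shows the control flow.

SPEECH_THRESHOLD = 0.02  # assumed RMS energy above which a frame counts as speech

def frame_energy(samples: list[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def should_barge_in(samples: list[float], tts_playing: bool) -> bool:
    """Interrupt TTS only when it is playing and the frame looks like speech."""
    return tts_playing and frame_energy(samples) > SPEECH_THRESHOLD
```

In practice the caller would stop TTS playback and start streaming STT as soon as `should_barge_in` returns true for an incoming frame.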

Until that work ships, use the text composer or the JSON Hub WebSocket bridge for external control.