Voice Interaction

Voice is the primary input method in Rocky. This guide covers how voice interaction works, which providers support it, and how to get good results.

How It Works

Rocky uses the OpenAI Realtime API to enable natural voice conversations. Tapping the voice button starts a voice conversation, and the realtime streaming API keeps the exchange low-latency.
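
For readers curious about what a realtime voice turn looks like on the wire, here is a minimal sketch of the client events involved. It is not Rocky's implementation, which this guide does not describe. The event types (`input_audio_buffer.append`, `input_audio_buffer.commit`, `response.create`) come from the OpenAI Realtime API; the function names, the chunking, and the choice to commit manually instead of relying on server-side voice activity detection are assumptions made for illustration. The code only builds the JSON messages and does not open a connection.

```python
import base64
import json
from typing import Iterable, List


def audio_append_event(audio_chunk: bytes) -> str:
    """Wrap one chunk of raw audio in an input_audio_buffer.append event.

    The Realtime API expects the audio bytes to be base64-encoded.
    """
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(audio_chunk).decode("ascii"),
    })


def build_turn_events(audio_chunks: Iterable[bytes]) -> List[str]:
    """Build the JSON messages for one spoken turn (hypothetical helper).

    Assumption: the client ends the turn itself by committing the buffer
    and then asking for a response.
    """
    events = [audio_append_event(chunk) for chunk in audio_chunks]
    events.append(json.dumps({"type": "input_audio_buffer.commit"}))
    events.append(json.dumps({"type": "response.create"}))
    return events


if __name__ == "__main__":
    # Two tiny placeholder chunks stand in for microphone audio.
    for message in build_turn_events([b"\x00\x01", b"\x02\x03"]):
        print(message)
```

Streaming small chunks as they are captured, instead of uploading a finished recording, is what lets the provider start processing speech before the user stops talking.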

Voice interaction is currently available only with providers that offer a realtime streaming API:

OpenAI Realtime: full-featured, low-latency voice via the OpenAI Realtime API
GLM Realtime: optimized for Chinese, using the Zhipu AI realtime voice protocol

Other providers support text-based interaction only.
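
The rule above (voice for realtime providers, text for everything else) can be sketched as a simple capability lookup. This is an illustrative sketch only: the provider identifiers, the `Provider` class, and the `input_mode` function are hypothetical names, not part of Rocky's configuration or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """A provider and whether it exposes a realtime streaming voice API."""
    name: str
    realtime_voice: bool


# Hypothetical registry; the keys are made up for this example.
PROVIDERS = {
    "openai-realtime": Provider("OpenAI Realtime", realtime_voice=True),
    "glm-realtime": Provider("GLM Realtime", realtime_voice=True),
    "example-text-only": Provider("Example text-only provider", realtime_voice=False),
}


def input_mode(provider_id: str) -> str:
    """Return "voice" for realtime providers and "text" for all others."""
    provider = PROVIDERS.get(provider_id)
    if provider is None:
        raise KeyError(f"unknown provider: {provider_id}")
    return "voice" if provider.realtime_voice else "text"


if __name__ == "__main__":
    for provider_id in PROVIDERS:
        print(provider_id, input_mode(provider_id))
```

Keeping the capability as a flag on each provider means that adding voice support for a new provider would only require changing its registry entry.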

Speak naturally: Rocky understands conversational language
Be specific about tasks: "Send a message to John" works better than "Do the thing"
Find a quiet spot: voice works best in relatively quiet environments