Clone any voice.
One API call.
The real-time voice API built for OpenAI developers. Clone a voice, start a conversation — bidirectional audio over WebSocket. Your users speak, your AI responds in any voice, in real time. One API, zero complexity.
// Clone a voice from audio
const voice = await VoiceMode.clone({
  audio: './sample.wav',
  name: 'Sarah'
});

// Start a real-time conversation
const session = await VoiceMode.converse({
  voice: voice.id
});

// Connect via WebSocket — speak, listen, repeat
const ws = new WebSocket(session.websocket_url);
Three lines of code.
Your voice, everywhere.
Upload a sample
Send 15 seconds of any voice. Our engine extracts its unique characteristics, tone, and cadence. No model training required.
Get a voice ID
Instantly receive a unique voice identifier. Use it across all your applications. Clone as many voices as you need.
Talk in real-time
Start a conversation over WebSocket. Your users speak, the AI responds in the cloned voice — bidirectional, real-time. Or use simple text-to-speech when you only need one-way audio.
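The conversation step above comes down to framing audio over the socket. A minimal sketch in TypeScript, assuming JSON text frames with hypothetical `input_audio` / `output_audio` event names and base64-encoded audio chunks (the real message schema is defined by the VoiceMode docs, not here):

```typescript
// NOTE: event and field names below are illustrative assumptions,
// not VoiceMode's documented wire format.
interface AudioMessage {
  type: 'input_audio';  // hypothetical event name for user speech
  audio: string;        // base64-encoded audio chunk
}

/** Wrap a raw audio chunk as a JSON text frame, ready for ws.send(). */
function encodeAudioChunk(chunk: Uint8Array): string {
  const msg: AudioMessage = {
    type: 'input_audio',
    audio: Buffer.from(chunk).toString('base64'),
  };
  return JSON.stringify(msg);
}

/** Decode an incoming frame; returns audio bytes for audio events, else null. */
function decodeAudioChunk(frame: string): Uint8Array | null {
  const msg = JSON.parse(frame);
  if (msg.type !== 'output_audio' || typeof msg.audio !== 'string') return null;
  return new Uint8Array(Buffer.from(msg.audio, 'base64'));
}
```

In a real session you would call `encodeAudioChunk` on microphone data inside the socket's open handler and `decodeAudioChunk` inside its message handler, piping the result to audio playback.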
Built for developers who
already use OpenAI.
Bidirectional voice
Real-time conversations over WebSocket. Users speak, AI responds in cloned voices. Server-side voice activity detection (VAD), streaming audio, live transcription. Built for voice agents.
One SDK, one bill
Stop juggling OpenAI for language and ElevenLabs for voice. VoiceMode plugs directly into your OpenAI workflow. TypeScript and Python SDKs.
15 seconds to clone
Upload a short audio clip. Get a voice that captures tone, accent, and personality. No hours of training data. No fine-tuning queue.
Pay per character
Simple usage-based pricing. No credit systems, no confusing tiers. Pay only for what you use. Scale to millions of characters.
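Per-character billing means cost estimation is simple arithmetic. A toy estimator, where the rate is a made-up placeholder (check the actual VoiceMode price list), using integer cents to avoid floating-point drift:

```typescript
// HYPOTHETICAL rate for illustration only — not VoiceMode's real pricing.
const CENTS_PER_1K_CHARS = 3; // i.e. $0.03 per 1,000 characters

/** Estimate cost in cents, rounding up to the next 1,000-character block. */
function estimateCostCents(charCount: number): number {
  return Math.ceil(charCount / 1000) * CENTS_PER_1K_CHARS;
}
```

At this placeholder rate, a million characters would come to 3,000 cents, i.e. $30.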
Your app already
thinks with OpenAI.
Now it speaks.
VoiceMode is the missing voice layer for the OpenAI ecosystem. Custom voices, real-time streaming, and the simplest API in the market.