Clone any voice.
One API call.
The real-time voice API built for OpenAI developers. Clone a voice, start a conversation — bidirectional audio over WebSocket. Your users speak, your AI responds in any voice, in real time. One API, zero complexity.
// Clone a voice from audio
const voice = await VoiceMode.clone({
  audio: './sample.wav',
  name: 'Sarah'
});

// Start a real-time conversation
const session = await VoiceMode.converse({
  voice: voice.id
});

// Connect via WebSocket — speak, listen, repeat
const ws = new WebSocket(session.websocket_url);
Three lines of code.
Your voice, everywhere.
Upload a sample
Send 15 seconds of any voice. Our engine extracts its unique characteristics, tone, and cadence. No model training required.
Get a voice ID
Instantly receive a unique voice identifier. Use it across all your applications. Clone as many voices as you need.
Talk in real-time
Start a conversation over WebSocket. Your users speak, the AI responds in the cloned voice — bidirectional, real-time. Or use simple text-to-speech when you only need one-way audio.
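The conversation step above comes down to framing audio over the socket. A minimal sketch in TypeScript, assuming JSON text frames with hypothetical `input_audio` / `output_audio` event names and base64-encoded audio chunks (the real message schema is defined by the VoiceMode docs, not here):

```typescript
// NOTE: event and field names below are illustrative assumptions,
// not VoiceMode's documented wire format.
interface AudioMessage {
  type: 'input_audio';  // hypothetical event name for user speech
  audio: string;        // base64-encoded audio chunk
}

/** Wrap a raw audio chunk as a JSON text frame, ready for ws.send(). */
function encodeAudioChunk(chunk: Uint8Array): string {
  const msg: AudioMessage = {
    type: 'input_audio',
    audio: Buffer.from(chunk).toString('base64'),
  };
  return JSON.stringify(msg);
}

/** Decode an incoming frame; returns audio bytes for audio events, else null. */
function decodeAudioChunk(frame: string): Uint8Array | null {
  const msg = JSON.parse(frame);
  if (msg.type !== 'output_audio' || typeof msg.audio !== 'string') return null;
  return new Uint8Array(Buffer.from(msg.audio, 'base64'));
}
```

In a real session you would call `encodeAudioChunk` on microphone data inside the socket's open handler and `decodeAudioChunk` inside its message handler, piping the result to audio playback.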
Built for developers who
already use OpenAI.
Bidirectional voice
Real-time conversations over WebSocket. Users speak, AI responds in cloned voices. Server-side voice activity detection (VAD), streaming audio, live transcription. Built for voice agents.
One SDK, one bill
Stop juggling OpenAI for language and ElevenLabs for voice. VoiceMode plugs directly into your OpenAI workflow. TypeScript and Python SDKs.
15 seconds to clone
Upload a short audio clip. Get a voice that captures tone, accent, and personality. No hours of training data. No fine-tuning queue.
Pay per character
Simple usage-based pricing. No credit systems, no confusing tiers. Pay only for what you use. Scale to millions of characters.
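Per-character billing means cost estimation is simple arithmetic. A toy estimator, where the rate is a made-up placeholder (check the actual VoiceMode price list), using integer cents to avoid floating-point drift:

```typescript
// HYPOTHETICAL rate for illustration only — not VoiceMode's real pricing.
const CENTS_PER_1K_CHARS = 3; // i.e. $0.03 per 1,000 characters

/** Estimate cost in cents, rounding up to the next 1,000-character block. */
function estimateCostCents(charCount: number): number {
  return Math.ceil(charCount / 1000) * CENTS_PER_1K_CHARS;
}
```

At this placeholder rate, a million characters would come to 3,000 cents, i.e. $30.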
Your app already
thinks with OpenAI.
Now it speaks.
VoiceMode is the missing voice layer for the OpenAI ecosystem. Custom voices, real-time streaming, and the simplest API in the market.