Realtime TTS
Streaming text-to-speech with sub-second first-audio latency. Built for voice agents and live applications.
How Streaming TTS Works
1. Send Text
POST your text to /v1/tts/stream/. The response comes back as a Server-Sent Events (SSE) stream.
2. Model Generates
Kokoro splits the text into chunks and generates audio for each chunk on the GPU.
3. Stream Chunks
Base64-encoded WAV chunks arrive over SSE and start playing immediately.
4. Listen Live
The user hears the start of the sentence in under a second, even on long inputs.
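The four steps above can be sketched as a small Python client. This is a minimal sketch under stated assumptions, not the official client: the endpoint path /v1/tts/stream/ comes from this page, but the request body shape (`{"text": ...}`), the `Authorization: Bearer` header, and the event payload shape (`{"audio": "<base64 WAV>"}`) are all guesses made for illustration, as are the function names.

```python
import base64
import json
import urllib.request
from typing import Iterable, Iterator, Optional


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event.

    An event ends at a blank line. Only `data:` fields are collected;
    comments (lines starting with ':') and other fields are ignored.
    """
    buf = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[5:]
            # The SSE format strips one optional leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            buf.append(value)


def decode_chunk(payload: str) -> bytes:
    """Decode one event payload into raw WAV bytes.

    ASSUMPTION: each event carries JSON like {"audio": "<base64 WAV>"}.
    """
    return base64.b64decode(json.loads(payload)["audio"])


def stream_tts(base_url: str, text: str, api_key: Optional[str] = None) -> Iterator[bytes]:
    """POST text and yield decoded audio chunks as they arrive.

    ASSUMPTION: JSON request body and Bearer auth are hypothetical.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "text/event-stream"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(
        base_url + "/v1/tts/stream/", data=body, headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        # The response object is iterable line by line, so events are
        # decoded as soon as the server flushes them.
        lines = (raw.decode("utf-8") for raw in resp)
        for payload in iter_sse_data(lines):
            yield decode_chunk(payload)
```

A caller would iterate over `stream_tts(...)` and hand each chunk to an audio player as it arrives, rather than waiting for the full response. Keeping the SSE parsing separate from the network call makes the parser easy to test against canned event lines.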
Use Cases
Where sub-second latency unlocks new experiences.
Voice Agents
Conversational bots that respond as fast as a human would.
Live Dubbing
Translate and dub a stream in real time without buffering pauses.
Games
NPC dialog that reacts to player choices instantly, with no pre-rendered voice-over.
Accessibility
Screen readers and assistive tools that start speaking the moment a user clicks.
Realtime TTS Plans
Start for free, upgrade when you need to
- Kokoro streaming (free model)
- 500 characters per generation
- 10 free streams/day per anonymous user
- Sub-second first-audio latency
- SSE streaming over HTTPS
- 15,000 characters at signup
- 5,000 chars per stream
- API key for programmatic access
- Generation history
- No daily stream cap
- MOSS-TTS-Realtime (when live)
- 100,000 chars per stream
- Priority GPU queue
- Voice agent + Twilio integration
- Higher rate limits
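Because each plan caps the number of characters per stream (500, 5,000, or 100,000 in the lists above), longer inputs have to be split on the client before sending. The following is a minimal sketch of one way to do that, breaking at sentence boundaries where possible. The helper name `split_for_streams` is hypothetical, and it assumes the limit counts characters the way Python's `len()` does, which this page does not confirm.

```python
import re
from typing import List


def split_for_streams(text: str, max_chars: int = 500) -> List[str]:
    """Split text into pieces of at most max_chars characters.

    Pieces break after sentence-ending punctuation when possible; a single
    sentence longer than the cap is cut at the cap as a fallback.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces: List[str] = []
    current = ""
    for sentence in sentences:
        # Fallback: hard-split a sentence that cannot fit in one piece.
        while len(sentence) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces
```

Each returned piece can then be sent as its own stream. Splitting at sentence boundaries keeps prosody natural across requests; on the free tier, note that every piece would count against the daily stream allowance.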
Frequently Asked Questions
Stream Speech in Real Time
Free for the first 10 generations a day. Sign up to unlock the full character allowance and API access.