Realtime TTS
Streaming text-to-speech with sub-second first-audio latency. Built for voice agents and live applications.
How Streaming TTS Works
1. Send Text
POST your text to /v1/tts/stream/; the response comes back as a Server-Sent Events (SSE) stream.
2. Model Generates
Kokoro splits the text into chunks and generates audio for each chunk on the GPU.
3. Stream Chunks
Base64-encoded WAV chunks arrive over SSE and start playing immediately.
4. Listen Live
The user hears the start of the sentence in under a second, even for long inputs.
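The four steps above can be sketched as a small Python client. This is a minimal sketch under stated assumptions, not an official SDK. Only the `/v1/tts/stream/` path, the SSE transport and the base64-encoded WAV chunks come from this page. The host in `BASE_URL`, the JSON request field `text`, the `Bearer` authorization header and the shape of each event's `data` (bare base64, or a JSON object with an `audio` field) are all hypothetical. It uses only the standard library and parses SSE events by hand, so each chunk can be handed to a player the moment it arrives.

```python
import base64
import json
import urllib.request
from typing import Iterable, Iterator, List, Optional

# Hypothetical host; replace with the real API base URL.
BASE_URL = "https://api.example.com"


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the `data` payload of each complete SSE event.

    An event ends at a blank line; several `data:` lines in one event are
    joined with newlines. Comments (lines starting with ":") and other
    fields such as `event:` or `id:` are ignored in this sketch.
    """
    buf: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buf:  # a blank line dispatches the buffered event
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            buf.append(value[1:] if value.startswith(" ") else value)
    # Per the SSE format, an event not terminated by a blank line is dropped.


def decode_wav_chunk(payload: str) -> bytes:
    """Turn one event payload into WAV bytes.

    Assumption: the payload is either bare base64 or JSON like
    {"audio": "<base64>"}; the real field name may differ.
    """
    text = payload.strip()
    if text.startswith("{"):
        text = json.loads(text)["audio"]
    return base64.b64decode(text)


def stream_tts(text: str, api_key: Optional[str] = None,
               base_url: str = BASE_URL) -> Iterator[bytes]:
    """POST text to /v1/tts/stream/ and yield WAV chunks as they arrive."""
    body = json.dumps({"text": text}).encode("utf-8")  # field name assumed
    headers = {"Content-Type": "application/json",
               "Accept": "text/event-stream"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"  # auth scheme assumed
    req = urllib.request.Request(base_url + "/v1/tts/stream/", data=body,
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        # Iterating the response yields each line once it has been received.
        lines = (raw.decode("utf-8") for raw in resp)
        for payload in iter_sse_data(lines):
            yield decode_wav_chunk(payload)
```

A caller would loop over `stream_tts("Hello there", api_key=...)` and pass each chunk to its audio player, assuming each chunk is a self-contained WAV file. Parsing the stream incrementally instead of waiting for the full response is what preserves the low first-audio latency on the client side.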
Use Cases
Where sub-second latency unlocks new experiences.
Voice Agents
Conversational bots that respond as fast as a human would.
Live Dubbing
Translate and dub a stream in real time without buffering pauses.
Games
NPC dialog that reacts to player choices instantly, with no pre-rendered voice-over.
Accessibility
Screen readers and assistive tools that start speaking the moment a user clicks.
Realtime TTS Plans
Start free, upgrade when you need more
- Kokoro streaming (free model)
- 500 characters per generation
- 10 free streams/day per anonymous user
- Sub-second first-audio latency
- SSE streaming over HTTPS
- 15,000 characters at signup
- 5,000 chars per stream
- API key for programmatic access
- Generation history
- No daily stream cap
- MOSS-TTS-Realtime (when live)
- 100,000 chars per stream
- Priority GPU queue
- Voice agent + Twilio integration
- Higher rate limits
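Each plan caps the number of characters in a single request (for example, 500 characters per generation or 5,000 characters per stream, as listed above), so longer scripts have to be sent as several requests. The sketch below is one client-side way to do that, assuming that breaking at sentence boundaries is acceptable for your content. `split_for_limit` is a hypothetical helper and is not part of the API. It prefers sentence boundaries, falls back to word boundaries, and hard-cuts a word only when that word alone exceeds the limit. It also collapses the whitespace between pieces to single spaces.

```python
import re
from typing import List

# Split after ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_for_limit(text: str, limit: int) -> List[str]:
    """Split text into pieces of at most `limit` characters each."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    pieces: List[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        # Oversized sentences are broken down into words first.
        units = [sentence] if len(sentence) <= limit else sentence.split()
        for unit in units:
            # A single word longer than the limit gets hard-cut.
            while len(unit) > limit:
                if current:
                    pieces.append(current)
                    current = ""
                pieces.append(unit[:limit])
                unit = unit[limit:]
            if not unit:
                continue
            candidate = f"{current} {unit}" if current else unit
            if len(candidate) <= limit:
                current = candidate
            else:
                # current is non-empty here, since unit alone fits.
                pieces.append(current)
                current = unit
    if current:
        pieces.append(current)
    return pieces
```

Each returned piece can then be sent as its own stream request, in order.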
Frequently Asked Questions
Stream Speech in Real Time
Free for the first 10 generations a day. Sign up to unlock the full character allowance and API access.