Realtime TTS
Streaming text-to-speech with sub-second first-audio latency. Built for voice agents and live applications.
How Streaming TTS Works
1. Send Text
POST your text to /v1/tts/stream/. The response comes back as a Server-Sent Events (SSE) stream.
2. Model Generates
Kokoro chunks the text and generates audio sample-by-sample on the GPU.
3. Stream Chunks
Base64-encoded WAV chunks arrive over SSE and start playing immediately.
4. Listen Live
The user hears the start of the sentence in under a second, even on long inputs.
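The four steps above can be sketched from the client side as follows. This is a minimal illustration, not the service's documented client. It assumes that each SSE event carries one bare Base64 string in its `data:` field and that the request body is JSON with a `text` key. Neither detail is confirmed by this page, and `base_url` and the function names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event."""
    buf = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buf:
                yield "\n".join(buf)
                buf = []
            continue
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format strips one leading space after the colon.
            buf.append(value[1:] if value.startswith(" ") else value)
    # An event not terminated by a blank line is discarded.


def decode_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Decode each event payload, ASSUMED to be one Base64 WAV chunk."""
    for payload in iter_sse_data(lines):
        yield base64.b64decode(payload)


def stream_tts(base_url: str, text: str) -> Iterator[bytes]:
    """POST text and yield audio chunks as they arrive.

    ASSUMPTIONS: base_url is a placeholder host, and the JSON body
    shape {"text": ...} is a guess; only the path comes from this page.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/tts/stream/",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the response yields one line of bytes at a time.
        yield from decode_audio_chunks(line.decode("utf-8") for line in resp)
```

Because `stream_tts` is a generator, a caller can hand each chunk to an audio player as soon as it is yielded instead of waiting for the whole response, which is what keeps first-audio latency low.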
Use Cases
Where sub-second latency unlocks new experiences.
Voice Agents
Conversational bots that respond as fast as a human would.
Live Dubbing
Translate and dub a stream in real time without buffering pauses.
Games
NPC dialog that reacts to player choices instantly, with no pre-rendered voice-over.
Accessibility
Screen readers and assistive tools that start speaking the moment a user clicks.
Realtime TTS Plans
Start free, upgrade when you need more
- Kokoro streaming (free model)
- 500 characters per generation
- 10 free streams/day per anonymous user
- Sub-second first-audio latency
- SSE streaming over HTTPS
- 15,000 characters at signup
- 5,000 chars per stream
- API key for programmatic access
- Generation history
- No daily stream cap
- MOSS-TTS-Realtime (when live)
- 100,000 chars per stream
- Priority GPU queue
- Voice agent + Twilio integration
- Higher rate limits
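Since each tier caps the characters per stream (500, 5,000, or 100,000 in the lists above), longer input has to be split on the client before it is sent. The helper below is one possible sketch of that. It breaks text at sentence boundaries where it can and hard-splits any single sentence that exceeds the limit. The function name and the splitting strategy are illustrative assumptions, not part of the service.

```python
import re


def split_for_limit(text: str, limit: int) -> list[str]:
    """Split text into pieces of at most `limit` characters each."""
    if limit <= 0:
        raise ValueError("limit must be positive")

    # Break after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    parts: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A sentence longer than the limit is cut at fixed offsets.
        while len(sentence) > limit:
            if current:
                parts.append(current)
                current = ""
            parts.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            parts.append(current)
            current = sentence
    if current:
        parts.append(current)
    return parts


# Limits taken from the plan lists above; pick the one for your tier.
pieces = split_for_limit("One. Two. Three.", 10)
```

Each piece can then be sent as its own stream. Note that on the anonymous tier every piece would count against the 10 free streams per day.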
Frequently Asked Questions
Stream Speech in Real Time
Free for the first 10 generations a day. Sign up to unlock the full character allowance and API access.