Realtime TTS
Streaming text-to-speech with sub-second first-audio latency. Built for voice agents and live applications.
How Streaming TTS Works
1. Send Text
POST your text to /v1/tts/stream/; the response comes back as a Server-Sent Events (SSE) stream.
2. Model Generates
Kokoro chunks the text and generates audio sample-by-sample on the GPU.
3. Stream Chunks
Base64-encoded WAV chunks arrive over SSE and start playing immediately.
4. Listen Live
The user hears the start of the sentence in under a second, even on long inputs.
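The receiving side of this flow can be sketched in a few lines of Python. This is a minimal illustration, not the service's official client: it assumes each SSE event carries one bare base64 string in its `data:` field, which the page does not confirm (the real payload may be wrapped in JSON or use named events). The helper names `iter_sse_data` and `decode_audio_chunks` are hypothetical.

```python
import base64
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon
            buffer.append(value)
    # Lenient choice: also emit a final event that lacks a trailing blank line.
    if buffer:
        yield "\n".join(buffer)


def decode_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Decode each event payload from base64 into raw audio bytes.

    Assumption: the payload is a bare base64 string, not a JSON envelope.
    """
    for payload in iter_sse_data(lines):
        yield base64.b64decode(payload)
```

With the `requests` library, the lines could come from `requests.post(url, json={"text": "..."}, stream=True).iter_lines(decode_unicode=True)`. Here `url` would be the /v1/tts/stream/ endpoint on the service's host, and the JSON body shape is a guess. Each decoded chunk can be handed to an audio player as soon as it arrives, which is what keeps first-audio latency low.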
Use Cases
Where sub-second latency unlocks new experiences.
Voice Agents
Conversational bots that respond as fast as a human would.
Live Dubbing
Translate and dub a stream in real time without buffering pauses.
Games
NPC dialog that reacts to player choices instantly, with no pre-rendered voice-over.
Accessibility
Screen readers and assistive tools that start speaking the moment a user clicks.
Realtime TTS Plans
Start free, upgrade when you need more.
- Kokoro streaming (free model)
- 500 characters per generation
- 10 free streams/day per anonymous user
- Sub-second first-audio latency
- SSE streaming over HTTPS
- 15,000 characters at signup
- 5,000 chars per stream
- API key for programmatic access
- Generation history
- No daily stream cap
- MOSS-TTS-Realtime (when live)
- 100,000 chars per stream
- Priority GPU queue
- Voice agent + Twilio integration
- Higher rate limits
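Because each plan caps the characters per stream (500, 5,000, or 100,000 in the lists above), longer text has to be split before it is sent. The sketch below shows one way to do that client-side, preferring sentence boundaries. The function name `split_for_limit` is hypothetical, and it assumes the service counts characters the same way Python's `len` does. Each piece would presumably count as a separate stream against any daily cap.

```python
import re
from typing import List


def split_for_limit(text: str, limit: int = 500) -> List[str]:
    """Split text into pieces of at most `limit` characters.

    Sentences are kept whole where possible; a single sentence longer
    than the limit is cut at the limit as a fallback.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one piece.
        while len(sentence) > limit:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = sentence if not current else current + " " + sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces
```

For example, `split_for_limit(long_text, limit=5000)` would prepare input for the 5,000-character tier. Splitting at sentence boundaries is a design choice here: it avoids cutting a word or phrase in half, which would likely produce audible breaks between streams.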
Frequently Asked Questions
Stream Speech in Real Time
Free for the first 10 generations a day. Sign up to unlock the full character allowance and API access.