Realtime TTS
Streaming text-to-speech with sub-second first-audio latency. Built for voice agents and live applications.
How Streaming TTS Works
1. Send Text
POST text to /v1/tts/stream/; the response comes back as a Server-Sent Events (SSE) stream.
2. Model Generates
Kokoro splits the text into chunks and synthesizes audio for each chunk on the GPU.
3. Stream Chunks
Base64-encoded WAV chunks arrive over SSE and start playing immediately.
4. Listen Live
The user hears the start of the sentence in under a second, even on long inputs.
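The four steps above can be sketched as a minimal Python client that uses only the standard library. Everything beyond the /v1/tts/stream/ path is an assumption: the base URL, the JSON request body {"text": ...}, and the idea that each SSE event's `data:` field carries exactly one base64-encoded WAV chunk are hypothetical, and authentication is left out. The parser follows the general SSE line format: `data:` lines accumulate until a blank line ends the event, and lines starting with `:` are comments.

```python
import base64
import json
import urllib.request
from typing import Iterable, Iterator


def parse_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event."""
    buf: list[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event built so far.
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single leading space after the colon.
            buf.append(value[1:] if value.startswith(" ") else value)
        # event:, id: and retry: fields are ignored in this sketch.
    # An event with no terminating blank line is discarded, as in the SSE spec.


def decode_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Assumption: each event's data is one base64-encoded WAV chunk."""
    for payload in parse_sse_data(lines):
        yield base64.b64decode(payload)


def stream_tts(text: str, base_url: str = "https://api.example.com") -> Iterator[bytes]:
    """POST text to /v1/tts/stream/ and yield WAV chunks as they arrive.

    base_url and the {"text": ...} body are hypothetical placeholders.
    """
    req = urllib.request.Request(
        f"{base_url}/v1/tts/stream/",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the response reads the body line by line as data arrives.
        lines = (raw.decode("utf-8") for raw in resp)
        yield from decode_audio_chunks(lines)
```

Each yielded `bytes` object can be handed to an audio player as soon as it arrives, which is what allows playback to begin before the whole input has been synthesized.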
Use Cases
Where sub-second latency unlocks new experiences.
Voice Agents
Conversational bots that respond as fast as a human would.
Live Dubbing
Translate and dub a stream in real time without buffering pauses.
Games
NPC dialog that reacts to player choices instantly, with no pre-rendered voice-over.
Accessibility
Screen readers and assistive tools that start speaking the moment a user clicks.
Realtime TTS Plans
Start free, upgrade when you need more
- Kokoro streaming (free model)
- 500 characters per generation
- 10 free streams/day per anonymous user
- Sub-second first-audio latency
- SSE streaming over HTTPS
- 15,000 characters at signup
- 5,000 chars per stream
- API key for programmatic access
- Generation history
- No daily stream cap
- MOSS-TTS-Realtime (when live)
- 100,000 chars per stream
- Priority GPU queue
- Voice agent + Twilio integration
- Higher rate limits
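The per-generation and per-stream character caps listed above (500, 5,000 or 100,000 characters) mean a longer script has to be split before it is sent. Below is a minimal sketch of client-side splitting, assuming each piece is sent as its own request; `split_for_limit` is a hypothetical helper, not part of the service's API, and its sentence regex is a simple heuristic that will also break after abbreviations such as "Dr.".

```python
import re


def split_for_limit(text: str, max_chars: int) -> list[str]:
    """Split text into pieces of at most max_chars, preferring sentence ends.

    Hypothetical client-side helper; sentences are rejoined with single spaces.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    pieces: list[str] = []
    current = ""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        # Fallback for one over-long sentence: cut at the last space within
        # the limit, or mid-word if there is no space.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        # Pack whole sentences into the current piece while they fit.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces
```

For example, `split_for_limit(script, 500)` keeps every piece within the 500-character limit listed for the free model. Splitting at sentence ends keeps the audio natural at piece boundaries; the hard wrap is only a fallback for a single sentence longer than the limit.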
Frequently Asked Questions
Your feedback helps us solve problems.
Stream Speech in Real Time
Free for the first 10 generations a day. Sign up to unlock the full character allowance and API access.