Pocket TTS
A compact 100M-parameter CPU model from Kyutai (makers of Moshi) with single-sample voice cloning.
Pocket TTS comes from Kyutai, the lab behind the Moshi speech model, and is built around a transformer paired with the Mimi codec. At just 100M parameters it runs efficiently on CPU, yet it still supports zero-shot voice cloning from a single audio sample, which is unusual at this size. It covers English and French and accepts up to 1,000 characters per request, with fast generation (around 2 seconds). The small footprint and roughly 1 GB memory requirement (listed as VRAM) make it a natural fit for edge deployment and for low-resource or CPU-only environments where quick voice cloning is needed.
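Because each request is capped at 1,000 characters, longer passages have to be split before synthesis. The following is a minimal sketch of one way to do that, preferring sentence boundaries. The function name `split_for_tts` and the constant `MAX_CHARS` are hypothetical, chosen for illustration; nothing here calls Pocket TTS itself, and the sentence-splitting rule is a simple heuristic rather than anything the model prescribes.

```python
import re

MAX_CHARS = 1000  # per-request limit stated in this listing


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends.

    Hypothetical helper: it only prepares text and does not call any TTS API.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped,
        # which may cut in the middle of a word.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk, start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request and the resulting audio concatenated in order. Splitting at sentence ends rather than at a fixed offset tends to keep pauses and intonation natural at the joins.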
At a glance
- Developer: Kyutai
- License: MIT
- Tier: Free
- Speed: Fast
- Voice cloning: Yes
- Languages: English, French
- Max characters: 1,000
Best for
Lightweight deployment, CPU-only environments, quick voice cloning