Tortoise TTS
A quality-first autoregressive model: slow, but among the most realistic open-source speech available.
Tortoise TTS, created by James Betker, deliberately trades speed for quality. It is an autoregressive, multi-voice system built on a DALL-E-inspired architecture, and it produces some of the most realistic synthetic speech in the open-source ecosystem, with excellent prosody and speaker similarity. The name is a nod to its pace: generation is noticeably slower than with most alternatives, and the payoff is studio-grade output. Voice cloning is supported and works best with a longer reference clip of around fifteen seconds. These traits have made Tortoise a long-standing favorite for audiobooks and premium narration, where wait time is acceptable. Tortoise is English-focused and released under the permissive Apache 2.0 license.
At a glance
- Developer: James Betker
- License: Apache 2.0
- Tier: Premium
- Speed: Slow
- Voice cloning: Yes
- Languages: English
- Max characters: 2000
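Long-form material such as an audiobook chapter will usually exceed the 2000-character maximum listed above, so the text has to be split before synthesis. The following is a minimal sketch of one way to do that, assuming the limit applies to each individual request. The names `split_text` and `MAX_CHARS` are hypothetical helpers for illustration, not part of Tortoise itself.

```python
import re

# Assumption: the "Max characters" figure above is a per-request limit.
MAX_CHARS = 2000


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")

    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split by length.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        # Append the sentence to the current chunk if it still fits.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized separately and the resulting audio concatenated. Splitting at sentence boundaries rather than at a fixed offset tends to keep prosody intact, since no chunk starts or ends mid-sentence unless a single sentence is itself longer than the limit.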
Best for
Audiobooks, premium content, quality-first applications