Tortoise TTS

A quality-first autoregressive model: slow, but it produces some of the most realistic open-source speech available.

Tortoise TTS, created by James Betker, deliberately trades speed for quality. It is an autoregressive multi-voice system using a DALL-E-inspired architecture, and it produces some of the most realistic synthetic speech in the open-source ecosystem, with excellent prosody and speaker similarity. The name is a nod to its pace: it is noticeably slower than most alternatives, but the payoff is studio-grade output. It supports multiple voices and voice cloning (which benefits from a longer reference, around fifteen seconds), making it a long-standing favorite for audiobooks and premium narration where wait time is acceptable. Tortoise is English-focused and released under the permissive Apache 2.0 license.
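Because cloning benefits from a reference of around fifteen seconds, it can help to check a clip's length before submitting it. The following is a minimal sketch using only Python's standard `wave` module. The function names and the 15-second threshold constant are illustrative assumptions, not part of Tortoise itself, and the check only covers uncompressed WAV files.

```python
import wave

# Assumption: "around fifteen seconds" from the description is treated
# here as a minimum target, not a hard requirement of the model.
TARGET_SECONDS = 15.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = TARGET_SECONDS) -> bool:
    """Report whether a reference clip meets the target length."""
    return wav_duration_seconds(path) >= minimum
```

A clip that fails the check is not necessarily unusable; the threshold only reflects the guidance that longer references tend to give better results.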

At a glance

Developer: James Betker
License: Apache 2.0
Tier: premium
Speed: slow
Voice cloning: Yes
Languages: English
Max characters: 2000
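For long-form material such as audiobooks, text usually has to be split to fit the character limit. The sketch below assumes the 2000-character figure is a per-request limit and splits on sentence boundaries where possible. The function name `split_for_tts` and the splitting strategy are illustrative assumptions, not a documented Tortoise feature.

```python
import re

# Assumption: the "Max characters" value is a per-request limit.
MAX_CHARS = 2000


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence over the limit is hard-wrapped at max_chars.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk, start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized separately and the audio concatenated. Splitting at sentence ends keeps pauses natural across chunk boundaries.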

Tortoise TTS AI Voices

Random

English
Premium Neutral
Fuula

Best for

Audiobooks, premium content, quality-first applications

Tortoise TTS FAQ

Tortoise TTS is autoregressive and uses a DALL-E-inspired architecture that deliberately prioritizes quality over speed. In exchange for the slow generation, it produces some of the most realistic open-source speech available, which is why it remains popular for audiobooks despite the wait.

Tortoise TTS supports multi-voice synthesis and voice cloning. Cloning results improve with a longer reference, around fifteen seconds of clean audio.

Tortoise TTS is best suited to quality-first applications, such as audiobooks and premium narration, where its highly realistic output is worth the longer generation time. It is English-focused and Apache 2.0 licensed.