Tortoise TTS

A quality-first autoregressive model: slow, but among the most realistic open-source speech available.

Tortoise TTS, created by James Betker, deliberately trades speed for quality. It is an autoregressive, multi-voice system built on a DALL-E-inspired architecture, and it produces some of the most realistic synthetic speech in the open-source ecosystem, with strong prosody and speaker similarity. The name is a nod to its pace: generation is noticeably slower than with most alternatives, and the payoff is higher-quality output. It supports multiple voices and voice cloning, which works best with a longer reference clip of around fifteen seconds. That combination has made it a long-standing favorite for audiobooks and premium narration, where the wait is acceptable. Tortoise is English-focused and released under the permissive Apache 2.0 license.

At a glance

Developer: James Betker
License: Apache 2.0
Tier: premium
Speed: slow
Voice cloning: Yes
Languages: English
Max characters: 2000
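
The 2000-character limit matters for long-form work such as audiobooks, where a single chapter will far exceed it. Below is a minimal sketch of one way to prepare such text, assuming the limit applies per request and that splitting at sentence boundaries is acceptable. The names `split_for_tts` and `MAX_CHARS` are hypothetical and are not part of Tortoise TTS.

```python
import re

MAX_CHARS = 2000  # assumed per-request limit, taken from the "Max characters" entry


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence breaks."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split by length.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be synthesized separately and the resulting audio concatenated. Splitting at sentence boundaries is a design choice meant to keep pauses natural; it is not something the source prescribes.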

Tortoise TTS AI Voices

Random (English, neutral)

Best for

Audiobooks, premium content, quality-first applications

Tortoise TTS FAQ

Tortoise TTS is autoregressive and uses a DALL-E-inspired architecture that deliberately prioritizes quality over speed. In exchange, it produces some of the most realistic open-source speech available, which is why it remains popular for audiobooks despite the wait.

Tortoise TTS supports multi-voice synthesis and voice cloning; results improve with a longer reference clip, around fifteen seconds of clean audio.
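
A quick pre-check on the reference clip can catch samples that are too short before committing to a slow generation run. Below is a minimal sketch using Python's standard `wave` module, assuming the clip is an uncompressed WAV file and treating fifteen seconds as a guideline rather than a hard requirement. The names `reference_duration_seconds`, `is_long_enough`, and `TARGET_SECONDS` are hypothetical helpers and are not part of Tortoise TTS.

```python
import wave

TARGET_SECONDS = 15.0  # guideline length for a cloning reference, not a hard limit


def reference_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the number of frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, target: float = TARGET_SECONDS) -> bool:
    """Return True if the clip meets the suggested reference length."""
    return reference_duration_seconds(path) >= target
```

This only measures length; it says nothing about whether the audio is clean, which still has to be judged by ear or with separate tooling.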

Tortoise TTS is best suited to quality-first applications such as audiobooks and premium narration, where its slow but highly realistic output is worth the generation time. It is English-focused and Apache 2.0 licensed.