Tortoise TTS

A quality-first autoregressive model — slow, but among the most realistic open-source speech available.

SSML Mode (Speech Synthesis Markup Language for fine control)

Wrap your text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
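The SSML wrapper shown above can also be produced programmatically. The following is a minimal sketch, assuming you only need the `<speak>` and `<prosody rate="...">` elements from the example; the helper name `to_ssml` is hypothetical and is not part of any TTS service's API. It escapes the text so that characters such as `&` or `<` do not break the markup.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...>, escaping XML special characters."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    # escape handles &, < and > inside the spoken text.
    return f"<speak><prosody rate={quoteattr(rate)}>{escape(text)}</prosody></speak>"


if __name__ == "__main__":
    print(to_ssml("Slow speech"))
```

Which `rate` values are accepted depends on the engine that consumes the markup, so treat `"slow"` as the only value confirmed by the example above.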



About Tortoise TTS

Tortoise TTS, created by James Betker, deliberately trades speed for quality. It is an autoregressive multi-voice system using a DALL-E-inspired architecture, and it produces some of the most realistic synthetic speech in the open-source ecosystem, with excellent prosody and speaker similarity. The name is a nod to its pace: it is noticeably slower than most alternatives, but the payoff is studio-grade output. It supports multiple voices and voice cloning (which benefits from a longer reference, around fifteen seconds), making it a long-standing favorite for audiobooks and premium narration where wait time is acceptable. Tortoise is English-focused and released under the permissive Apache 2.0 license.

Best for: Audiobooks, premium content, quality-first applications
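Because voice cloning benefits from a reference of around fifteen seconds, it can help to check a clip's length before uploading it. The sketch below uses only Python's standard `wave` module and assumes the reference is an uncompressed WAV file; the function names and the 15-second threshold constant are illustrative choices, not part of Tortoise TTS itself.

```python
import wave

# Assumption: threshold taken from the "around fifteen seconds" guidance above.
MIN_REFERENCE_SECONDS = 15.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """True if the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum
```

This checks duration only. Whether the audio is clean (no background noise, a single speaker) still has to be judged by ear.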

At a glance

Developer: James Betker
License: Apache 2.0
Tier: premium
Speed: slow
Voice cloning: Yes
Languages: English
Max characters: 2000
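For long texts such as audiobook chapters, the 2000-character maximum listed above means the input has to be split into several generations. One possible approach, sketched here under the assumption that splitting at sentence boundaries is acceptable, is to pack whole sentences greedily into chunks. The function name `split_for_tts` is hypothetical, and the simple punctuation-based sentence detection will not handle every abbreviation correctly.

```python
import re

MAX_CHARS = 2000  # the "Max characters" value listed above


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single over-long sentence is hard-split so no chunk exceeds the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own generation and the resulting audio files concatenated in order.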

Tortoise TTS voices

Random: English, Premium, Neutral

Tortoise TTS FAQ

Why is Tortoise TTS so slow?

It is autoregressive and uses a DALL-E-inspired architecture that deliberately prioritizes quality over speed. The trade-off is some of the most realistic open-source speech available, which is why it remains popular for audiobooks despite the wait.

Can Tortoise TTS clone a voice?

Yes. It supports multi-voice synthesis and voice cloning; results improve with a longer reference, around fifteen seconds of clean audio.

What is Tortoise TTS best for?

Quality-first applications, such as audiobooks and premium narration, where its slow but highly realistic output is worth the generation time. It is English-focused and Apache 2.0 licensed.