Chatterbox TTS

Resemble AI's state-of-the-art zero-shot voice cloning model with independent emotion control.

Chatterbox by Resemble AI is an open-source zero-shot voice cloning model that replicates a voice from a single audio sample, capturing not just timbre but also speaking style and emotional nuance. Its distinctive feature is fine-grained emotion control that operates independently of voice identity, so you can keep a cloned voice while shifting its emotional intensity. Built around ResembleEnhance and flow matching, it targets professional-grade cloning for content creation, dubbing, and character voices. Released under the permissive MIT license, Chatterbox has become a popular foundation for derivative models; TTS.ai, for example, also runs a Saudi-Arabic fine-tune of it. It favors quality over throughput: speed is medium, and each request is limited to 300 characters.

At a glance

Developer: Resemble AI
License: MIT
Tier: Premium
Speed: Medium
Voice cloning: Yes
Languages: English
Max characters: 300
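
Because each request is capped at 300 characters, longer scripts have to be split before synthesis. The following is a minimal sketch of one way to do that, assuming sentence-boundary splitting is acceptable for your text. The function name split_for_tts and the constant MAX_CHARS are illustrative and are not part of Chatterbox or TTS.ai; the sketch only prepares text and does not call any synthesis API.

```python
import re

# Per-request limit taken from the "Max characters" value listed above.
MAX_CHARS = 300


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: prefers sentence boundaries and falls back to
    word boundaries (or a hard cut) for oversized sentences.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")

    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""

    for sentence in sentences:
        if not sentence:
            continue

        # A single sentence longer than the limit is wrapped on spaces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space available: hard cut
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()

        if not sentence:
            continue

        # Pack sentences together while the chunk stays within the limit.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as a separate request and the resulting audio clips concatenated in order. Splitting on sentence boundaries tends to keep prosody natural at the joins, although that is a general observation and not something this page states about Chatterbox.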

Chatterbox AI Voices

Default

English
Premium Neutral

Best for

Professional voice cloning with emotional control, content creation

Chatterbox TTS — FAQ

A single audio sample is enough to clone a voice. Chatterbox is a zero-shot cloning model, so it captures a voice, including its style and emotional nuance, from one reference clip without any fine-tuning.

Chatterbox offers fine-grained emotion control that works independently of the voice identity, letting you adjust the emotional tone of the output while keeping the same cloned voice.

Chatterbox can be used commercially. It is released by Resemble AI under the MIT license, which permits commercial use, and it serves as the base for several fine-tuned derivative models.