Dia TTS
A 1.6B-parameter model purpose-built for generating natural multi-speaker dialogue, not just single-voice narration.
Dia by Nari Labs is a 1.6-billion-parameter text-to-speech model designed from the ground up for dialogue rather than monologue. It generates conversations between two speakers with realistic turn-taking, prosody, and emotional expression, producing audio that sounds like a real exchange rather than two voices recorded separately. Architecturally, it pairs an autoregressive transformer with the Descript Audio Codec (DAC) for waveform generation. Dia is a strong fit for podcast-style content, scripted audiobook dialogue, and conversational scenes, and is released under Apache 2.0. Generation is more compute-intensive than with single-voice models, so Dia favors quality over raw speed.
At a glance
- Developer: Nari Labs
- License: Apache 2.0
- Tier: standard
- Speed: medium
- Voice cloning: No
- Languages: English
- Max characters: 800
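Because input is capped at 800 characters, a longer two-speaker script has to be split before synthesis. The sketch below is one possible way to do that: it packs whole speaker turns into chunks that stay within the limit, so no turn is cut mid-sentence. The `[S1]` / `[S2]` speaker tags follow the convention used by the open-source Dia release; whether a given hosted endpoint expects the same tags is an assumption here, and `chunk_dialogue` is a hypothetical helper, not part of any Dia API.

```python
MAX_CHARS = 800  # limit taken from the "At a glance" list


def chunk_dialogue(turns, max_chars=MAX_CHARS):
    """Pack (speaker, text) turns into tagged chunks of at most max_chars.

    Assumption: speakers are labelled "S1" / "S2" and rendered as
    "[S1] text", following the open-source Dia tag convention.
    """
    chunks = []
    current = ""
    for speaker, text in turns:
        line = f"[{speaker}] {text.strip()}"
        if len(line) > max_chars:
            # A single turn that exceeds the limit must be shortened by hand.
            raise ValueError(f"turn for {speaker} exceeds {max_chars} characters")
        candidate = line if not current else f"{current} {line}"
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Close the current chunk and start a new one at this turn.
            chunks.append(current)
            current = line
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = [
        ("S1", "Welcome back to the show."),
        ("S2", "Glad to be here. Where do we start?"),
        ("S1", "Let's begin with the basics."),
    ]
    for chunk in chunk_dialogue(script):
        print(len(chunk), chunk)
```

Splitting only at turn boundaries keeps each request self-contained, at the cost of some unused headroom in each chunk. Prosody may not carry over between separately generated chunks, so fewer, fuller chunks are usually preferable.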
Best for
Podcasts, audiobook dialogues, conversational content