Dia TTS


A 1.6B-parameter model purpose-built for generating natural multi-speaker dialogue, not just single-voice narration.

Dia by Nari Labs is a 1.6-billion-parameter text-to-speech model designed from the ground up for dialogue rather than monologue. It generates conversations between two speakers with realistic turn-taking, prosody, and emotional expression, producing audio that sounds like a real exchange rather than two voices recorded separately and spliced together. Architecturally, it pairs an autoregressive transformer with the Descript Audio Codec (DAC) for waveform generation. Dia is a strong fit for podcast-style content, scripted audiobook dialogue, and conversational scenes, and is released under Apache 2.0. Generation is more compute-intensive than with single-voice models, so Dia favors quality over raw speed.

At a glance

Developer: Nari Labs
License: Apache 2.0
Tier: standard
Speed: medium
Voice cloning: No
Languages: English
Max characters: 800
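
Because input is capped at 800 characters, a longer two-speaker script has to be split before it is submitted. The following is a minimal sketch of one way to do that, packing whole turns into chunks that stay under the limit. The function names and the "Speaker N:" line format are assumptions made for illustration, not the input syntax Dia or TTS.ai actually requires.

```python
from typing import List, Tuple

MAX_CHARS = 800  # the "Max characters" value listed for Dia


def format_turn(speaker: int, text: str) -> str:
    # Hypothetical label format; the real service may expect different tags.
    return f"Speaker {speaker}: {text.strip()}"


def chunk_dialogue(turns: List[Tuple[int, str]],
                   max_chars: int = MAX_CHARS) -> List[str]:
    """Pack whole turns into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for speaker, text in turns:
        line = format_turn(speaker, text)
        if len(line) > max_chars:
            # A single turn that exceeds the limit cannot be packed as-is.
            raise ValueError("a single turn exceeds the character limit")
        candidate = line if not current else current + "\n" + line
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Close the current chunk and start a new one with this turn.
            chunks.append(current)
            current = line
    if current:
        chunks.append(current)
    return chunks
```

Splitting on turn boundaries keeps each speaker's line intact within a chunk. Audio generated from separate chunks may still differ slightly in tone at the seams, so fewer, fuller chunks are generally preferable.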

Dia TTS AI Voices

Speaker 1

English
Standard Neutral

Speaker 2

English
Standard Neutral

Best for

Podcasts, audiobook dialogues, conversational content

Dia TTS FAQ

Multi-speaker dialogue. Unlike most TTS models, which read one voice at a time, Dia generates a two-speaker conversation with natural turn-taking, prosody, and emotion in a single pass, which makes it well suited to podcasts and scripted scenes.

It is a 1.6-billion-parameter model from Nari Labs, built on an autoregressive transformer with the Descript Audio Codec for audio generation.

On TTS.ai, Dia is configured for English. Its strength is dialogue generation rather than broad multilingual coverage.