MOSS-TTSD TTS

A 7B-parameter dialogue model that continues conversations from an audio prompt, with up to five speakers and up to 60 minutes of coherent audio.

About MOSS-TTSD

MOSS-TTSD v1.0 from OpenMOSS is a 7-billion-parameter dialogue text-to-speech model that continues a conversation from a short audio prompt rather than reading isolated lines. It handles up to five speakers via [S1]/[S2]-style tags, supports zero-shot voice cloning from 3-to-10-second reference clips, and generates coherent multi-turn dialogue up to 60 minutes long. It is distinct from the OpenMOSS MOSS-TTS model: the TTSD variant is specialized for podcast, audiobook, and dubbing workflows where long, consistent conversational audio is the goal. It is released under Apache 2.0 and, as a 7B model, needs around 12GB of VRAM.
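To make the speaker-tag convention concrete, here is a minimal Python sketch that assembles a multi-speaker script and enforces the five-speaker limit described above. The helper names (`build_script`, `speakers_in`) are hypothetical, and placing one turn per line is an assumption; the exact layout the model or this service expects is not specified on this page.

```python
import re

MAX_SPEAKERS = 5  # speaker limit stated on this page
TAG_PATTERN = re.compile(r"\[S(\d+)\]")


def build_script(turns):
    """Join (speaker_number, text) pairs into an [S1]/[S2]-style script.

    Assumption: one turn per line. The page does not say how turns
    should be separated.
    """
    for speaker, _ in turns:
        if not 1 <= speaker <= MAX_SPEAKERS:
            raise ValueError(f"speaker must be 1..{MAX_SPEAKERS}, got {speaker}")
    return "\n".join(f"[S{speaker}]{text.strip()}" for speaker, text in turns)


def speakers_in(script):
    """Return the set of speaker numbers that appear as tags in a script."""
    return {int(number) for number in TAG_PATTERN.findall(script)}


example = build_script([
    (1, "Welcome back to the show."),
    (2, "Glad to be here."),
    (1, "Let's get started."),
])
print(example)
```

Validating the speaker numbers before submission catches a sixth speaker early, rather than leaving it to whatever the model does with an unsupported tag.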

Best for: Podcasts, audiobooks, dubbed dialogue, conversational content with multiple voices


At a glance

Developer: OpenMOSS
License: Apache 2.0
Tier: standard
Speed: medium
Voice cloning: Yes
Languages: English, Chinese
Max characters: 5000
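A script longer than the listed maximum has to be split before submission. The sketch below, in Python 3.7 or later, cuts a tagged script only at speaker-turn boundaries so that no chunk exceeds the limit. The function names are hypothetical, and two points are assumptions: that tags count toward the 5000 characters, and that chunks are joined with newlines. Whether voices stay consistent across separately generated chunks is not stated on this page.

```python
import re

MAX_CHARS = 5000  # "Max characters" value listed above


def split_turns(script):
    """Split a tagged script into turns, each starting with its [S<n>] tag."""
    # The zero-width lookahead splits before each tag, so the tag stays
    # attached to the text that follows it.
    parts = re.split(r"(?=\[S\d+\])", script)
    return [part.strip() for part in parts if part.strip()]


def chunk_script(script, max_chars=MAX_CHARS):
    """Group whole turns into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for turn in split_turns(script):
        if len(turn) > max_chars:
            raise ValueError("a single turn exceeds the character limit")
        candidate = turn if not current else current + "\n" + turn
        if len(candidate) > max_chars:
            # Adding this turn would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = turn
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting at turn boundaries keeps every chunk starting with a speaker tag, so each request remains a valid tagged script on its own.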

MOSS-TTSD voices

Default (Chinese): Chinese, Standard Neutral
Default Speaker: English, Standard Neutral

MOSS-TTSD TTS — FAQ

How many speakers can MOSS-TTSD generate?

Up to five simultaneous speakers, addressed via speaker tags like [S1] and [S2], with the ability to clone each voice from a short reference clip.

How long can MOSS-TTSD audio be?

It can produce up to 60 minutes of coherent multi-turn dialogue, which is what makes it suited to full podcast episodes and audiobook chapters rather than short clips.

How is MOSS-TTSD different from MOSS-TTS?

MOSS-TTSD is a dialogue-specialized variant that continues conversations from an audio prompt and targets podcast, audiobook, and dubbing workflows, whereas the base MOSS-TTS is a general single-voice synthesis model.