
VibeVoice TTS

Microsoft's multi-speaker, long-form model that generates up to 90 minutes of speech with up to 4 distinct speakers.


SSML mode (Speech Synthesis Markup Language, for fine-grained control)

Wrap your text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
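When the text comes from user input, it may contain characters such as & or < that would break the markup. The following is a minimal sketch of producing the same wrapper from Python with those characters escaped. The function name wrap_ssml is hypothetical, and the sketch assumes only the speak and prosody tags shown above; which other tags or rate values the service accepts is not stated here.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...> tags, escaping XML specials."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f"<speak><prosody rate={quoteattr(rate)}>{escape(text)}</prosody></speak>"


print(wrap_ssml("Slow speech"))
print(wrap_ssml("Q&A in < 5 minutes"))
```

The first call reproduces the example above; the second shows & and < being turned into entities so the result stays well-formed XML.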


Pronunciation dictionary

Define custom pronunciations in the form word = pronunciation.
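To illustrate the word = pronunciation format, here is a small sketch that parses such entries and applies them to a text locally before submission. It is not the service's implementation: the function names, the one-entry-per-line layout, and the whole-word, case-insensitive matching are all assumptions made for the example.

```python
import re


def parse_pronunciations(spec: str) -> dict[str, str]:
    """Parse 'word = pronunciation' lines into a dict, skipping blank or malformed lines."""
    entries = {}
    for line in spec.splitlines():
        word, sep, pron = line.partition("=")
        if sep and word.strip() and pron.strip():
            entries[word.strip()] = pron.strip()
    return entries


def apply_pronunciations(text: str, entries: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches with their respelling."""
    if not entries:
        return text
    # Longest words first so a longer entry wins over a shorter prefix.
    words = sorted(entries, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b", re.IGNORECASE)
    lookup = {w.lower(): p for w, p in entries.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


entries = parse_pronunciations("SQL = sequel\nnginx = engine x")
print(apply_pronunciations("Deploy nginx, then query SQL.", entries))
```

Respelling a word this way is only one option; a dictionary could equally map to phonetic notation if the selected model accepts it.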

Pitch: 0 (range -12 to +12)


Speed: 1.0x (range 0.5x to 2.0x)


About VibeVoice

VibeVoice from Microsoft is built for long-form, multi-speaker audio. Its 1.5B model can generate up to 90 minutes of speech with as many as 4 distinct speakers, using speaker tags to drive multi-turn dialogue. That makes it a strong fit for podcasts, audiobooks, and conversations that need speaker consistency across long passages. A separate Realtime 0.5B variant reaches roughly 300 ms latency for interactive use. On TTS.ai it covers English and Chinese and accepts up to 50,000 characters per request, so an entire episode can be scripted in one pass.
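As a rough illustration of scripting a multi-speaker episode in one pass, the sketch below renders dialogue turns as tagged lines and checks them against the 4-speaker and 50,000-character limits mentioned above. The "Speaker N:" tag format and the build_script helper are assumptions for this example; check the tag syntax the service actually expects before relying on it.

```python
MAX_CHARS = 50_000   # per-request character limit stated on this page
MAX_SPEAKERS = 4     # speaker limit stated on this page


def build_script(turns: list[tuple[int, str]]) -> str:
    """Render (speaker_number, line) turns as 'Speaker N: line' text and check the limits."""
    speakers = {n for n, _ in turns}
    if any(n < 1 or n > MAX_SPEAKERS for n in speakers):
        raise ValueError(f"speaker numbers must be between 1 and {MAX_SPEAKERS}")
    # Assumed tag format: one turn per line, prefixed with the speaker label.
    script = "\n".join(f"Speaker {n}: {line.strip()}" for n, line in turns)
    if len(script) > MAX_CHARS:
        raise ValueError(f"script is {len(script)} characters; the limit is {MAX_CHARS}")
    return script


episode = build_script([
    (1, "Welcome back to the show."),
    (2, "Glad to be here."),
    (1, "Let's get started."),
])
print(episode)
```

Validating before submission means an oversized or mislabeled script fails locally rather than partway through a long generation.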

Best for: Podcasts, dialogues, long-form narration, multi-speaker content


At a glance

Developer: Microsoft
License: MIT
Tier: standard
Speed: fast
Voice cloning: No
Languages: English, Chinese
Max characters: 50,000

VibeVoice voices

Speaker 1: English, Standard Neutral
Speaker 1 (Chinese): Chinese, Standard Neutral
Speaker 2: English, Standard Neutral
Speaker 2 (Chinese): Chinese, Standard Neutral
Speaker 3: English, Standard Neutral
Speaker 4: English, Standard Neutral

VibeVoice TTS FAQ

How many speakers and how much audio can VibeVoice generate?

VibeVoice supports up to 4 distinct speakers and up to 90 minutes of continuous output, with speaker tags for multi-turn dialogue. It is built for podcasts and long-form narration and accepts up to 50,000 characters per request.

Does VibeVoice have a low-latency mode?

Yes. Alongside the 1.5B long-form model, a Realtime 0.5B variant achieves roughly 300 ms latency for interactive use.

Is VibeVoice free for commercial use?

VibeVoice is MIT-licensed, and the MIT license permits commercial use. It supports English and Chinese and does not currently support voice cloning.