VibeVoice TTS

Microsoft's multi-speaker, long-form model that generates up to 90 minutes of speech with up to 4 distinct speakers.

SSML mode (Speech Synthesis Markup Language, for fine-grained control)

Wrap text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
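Hand-written SSML breaks easily when the text contains characters such as `&` or `<`. As a minimal sketch, the snippet below builds the same `<speak><prosody rate="...">` structure with Python's standard library so that escaping is handled automatically. The helper name `build_ssml` is hypothetical, and the page does not state which SSML tags each model honors.

```python
import xml.etree.ElementTree as ET


def build_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...> and return the markup as a string."""
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    # ElementTree escapes &, < and > in element text when serializing.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


print(build_ssml("Slow speech"))
```

The printed string can be pasted into the text box with SSML mode enabled.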

Emotion / style tags

Tags that the selected model understands. Click a tag to insert it into the text at the cursor position.

Pronunciation dictionary

Define custom pronunciations (word = pronunciation):
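The page does not describe how the dictionary is applied on the server, so the following is only a client-side approximation of the idea: it parses `word = pronunciation` lines and substitutes whole-word matches before the text is submitted. Both function names are hypothetical, and case-insensitive whole-word matching is an assumption.

```python
import re


def parse_pronunciations(spec: str) -> dict[str, str]:
    """Parse 'word = pronunciation' lines into a dict, skipping blank or malformed lines."""
    entries = {}
    for line in spec.splitlines():
        word, sep, spoken = line.partition("=")
        if sep and word.strip() and spoken.strip():
            entries[word.strip()] = spoken.strip()
    return entries


def apply_pronunciations(text: str, entries: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches with their respelling."""
    if not entries:
        return text
    # Longest words first, so a longer entry wins over a shorter one it contains.
    words = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    lookup = {w.lower(): s for w, s in entries.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


entries = parse_pronunciations("SQL = sequel\nnginx = engine x")
print(apply_pronunciations("Run sql on Nginx.", entries))
```

Because matching is restricted to whole words, an entry for `SQL` leaves `SQLite` untouched.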

Pitch: 0 (range: -12 to +12)

Speed: 1.0x (range: 0.5x to 2.0x)

Free models: Piper, VITS, MeloTTS

About VibeVoice

VibeVoice from Microsoft is built for long-form, multi-speaker audio. Its 1.5B model can generate up to 90 minutes of speech with as many as 4 distinct speakers, using speaker tags to drive multi-turn dialogue. That makes it a strong fit for podcasts, audiobooks, and conversations that need consistent voices across long passages. A separate Realtime 0.5B variant reaches roughly 300 ms latency for interactive use. On TTS.ai, VibeVoice covers English and Chinese and accepts up to 50,000 characters per request, so an entire episode can be scripted in one pass.
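To make the speaker-tag idea concrete, here is a minimal sketch that renders a list of dialogue turns as one tagged line per turn and checks the two limits quoted above. The page does not show the exact tag syntax, so the `Speaker N:` line prefix is an assumption, and `build_script` is a hypothetical helper, not part of any TTS.ai or VibeVoice API.

```python
MAX_SPEAKERS = 4     # from the page: up to 4 distinct speakers
MAX_CHARS = 50_000   # from the page: up to 50,000 characters per request


def build_script(turns: list[tuple[int, str]]) -> str:
    """Render (speaker_number, text) turns as one 'Speaker N: text' line each.

    The 'Speaker N:' prefix is an assumed tag format.
    """
    lines = []
    for speaker, text in turns:
        if not 1 <= speaker <= MAX_SPEAKERS:
            raise ValueError(f"speaker must be 1..{MAX_SPEAKERS}, got {speaker}")
        # Collapse internal whitespace so each turn stays on a single line.
        lines.append(f"Speaker {speaker}: {' '.join(text.split())}")
    script = "\n".join(lines)
    if len(script) > MAX_CHARS:
        raise ValueError(f"script is {len(script)} characters; limit is {MAX_CHARS}")
    return script


print(build_script([
    (1, "Welcome back to the show."),
    (2, "Glad to be here."),
]))
```

Validating before submission means an over-long script or an out-of-range speaker number fails locally rather than after a request has been sent.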

Best for: podcasts, dialogues, long-form narration, multi-speaker content

At a glance

Developer: Microsoft
License: MIT
Tier: Standard
Speed: Fast
Voice cloning: No
Languages: English, Chinese
Max characters: 50,000
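A script longer than the 50,000-character maximum has to be sent as several requests. The sketch below shows one way to do that: it packs whole lines greedily, so no dialogue turn is cut in half. `split_script` is a hypothetical helper. Whether voices stay consistent across separate requests is not stated on the page, so treat the chunk boundaries as possible audible seams.

```python
def split_script(script: str, max_chars: int = 50_000) -> list[str]:
    """Greedily pack whole lines into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in script.splitlines():
        if len(line) > max_chars:
            raise ValueError("a single line exceeds max_chars; split it manually")
        # +1 accounts for the newline joining this line to the current chunk.
        added = len(line) + (1 if current else 0)
        if size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [line], len(line)
        else:
            current.append(line)
            size += added
    if current:
        chunks.append("\n".join(current))
    return chunks


# Small limit used here only to make the behaviour visible.
print(split_script("aaaa\nbbbb\ncc", max_chars=9))
```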

VibeVoice voices

Speaker 1: English, Standard, Neutral
Speaker 1 (Chinese): Chinese, Standard, Neutral
Speaker 2: English, Standard, Neutral
Speaker 2 (Chinese): Chinese, Standard, Neutral
Speaker 3: English, Standard, Neutral
Speaker 4: English, Standard, Neutral

VibeVoice TTS FAQ

How many speakers and how much audio can VibeVoice generate?

VibeVoice supports up to 4 distinct speakers and up to 90 minutes of continuous output, with speaker tags for multi-turn dialogue, which suits podcasts and long-form narration. It accepts up to 50,000 characters per request.

Does VibeVoice have a low-latency mode?

Yes. Alongside the 1.5B long-form model, a Realtime 0.5B variant achieves roughly 300 ms latency for interactive use.

Is VibeVoice free for commercial use?

VibeVoice is released under the MIT license, which permits commercial use. It supports English and Chinese and does not currently support voice cloning.