IndexTTS-2 TTS

A zero-shot TTS model with fine-grained emotion control via emotion vectors, with no emotion-specific training data required.

SSML mode (Speech Synthesis Markup Language, for fine-grained control)

Wrap your text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
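When the text comes from user input, building the SSML string by hand can produce invalid markup if the text contains characters such as & or <. The following is a minimal sketch of one way to generate the wrapper shown above; wrap_ssml is a hypothetical helper name, and which prosody rate values a given model honors is an assumption to check against the service.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...> tags (hypothetical helper)."""
    body = escape(text)  # escapes &, < and > so the markup stays well-formed
    # quoteattr adds the surrounding quotes and escapes the attribute value
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"


print(wrap_ssml("Slow speech"))
print(wrap_ssml("Tom & Jerry", rate="fast"))
```

Escaping happens before the tags are added, so only the user text is altered and the SSML elements themselves are left intact.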

About IndexTTS-2

IndexTTS-2, from the Index Team, is an expressive text-to-speech system that pairs zero-shot voice synthesis with precise emotional control. Rather than relying on emotion-labeled training data, it uses emotion vectors to dial in tones like happy, sad, angry, or fearful independently of the voice itself. Built on a Qwen2 backbone with BigVGAN as the vocoder, it supports English and Chinese and can clone a voice from roughly five seconds of reference audio. It suits audiobooks, virtual assistants, and any content where the same voice needs to shift emotional register. Its weights use the Bilibili Model License, which permits commercial use below large usage and revenue thresholds.
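To make the idea of an emotion vector concrete, here is a minimal sketch that turns named emotion intensities into a fixed-order list of weights. It is an illustration only: the EMOTIONS tuple contains just the four tones named above, and the function name, the 0 to 1 range, and the max_total cap are all assumptions. The actual model's emotion set, ordering, and value ranges may differ.

```python
# Hypothetical emotion order: only the tones named in the description.
# The real model's emotion set and ordering are assumptions here.
EMOTIONS = ("happy", "sad", "angry", "fearful")


def emotion_vector(weights: dict[str, float], max_total: float = 1.0) -> list[float]:
    """Build a fixed-order intensity vector from named emotion weights."""
    unknown = set(weights) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    # Clamp each intensity to [0, 1]; emotions not mentioned default to 0.
    vec = [min(max(float(weights.get(name, 0.0)), 0.0), 1.0) for name in EMOTIONS]
    total = sum(vec)
    if total > max_total:
        # Scale down proportionally so the combined intensity stays bounded.
        vec = [v * max_total / total for v in vec]
    return vec


print(emotion_vector({"happy": 0.5}))
print(emotion_vector({"sad": 1.0, "fearful": 1.0}))
```

Because the vector says nothing about who is speaking, the same values could in principle be paired with any reference clip, which is what controlling emotion independently of the voice amounts to.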

Best for: Emotionally expressive content, audiobooks, virtual assistants

At a glance

Developer: Index Team
License: Bilibili Model License
Tier: standard
Speed: medium
Voice cloning: Yes
Languages: English, Chinese
Max characters: 1000
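Longer texts, such as audiobook chapters, exceed the 1000-character maximum listed above and need to be split before synthesis. The sketch below shows one possible approach: break at sentence-ending punctuation (Latin and Chinese) and pack sentences into chunks no longer than the limit. The names split_text and MAX_CHARS are hypothetical, and the sketch assumes the limit counts characters rather than bytes or tokens.

```python
import re

MAX_CHARS = 1000  # per-request character limit listed above


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    chunks: list[str] = []
    current = ""
    # Split right after sentence-ending punctuation; each piece keeps its punctuation.
    for piece in re.split(r"(?<=[.!?。！？])", text):
        # A single sentence longer than the limit is cut at the limit.
        while len(piece) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:limit])
            piece = piece[limit:]
        if len(current) + len(piece) <= limit:
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    # Trim whitespace at chunk boundaries and drop empty chunks.
    return [c.strip() for c in chunks if c.strip()]


sample = "Hello world. " * 200
for chunk in split_text(sample):
    print(len(chunk))
```

Each chunk can then be synthesized separately and the resulting audio concatenated. Splitting at sentence boundaries keeps pauses natural at the joins.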

IndexTTS-2 voices

Chinese Default: Chinese, Standard Neutral
Default: English, Standard Neutral

IndexTTS-2 TTS FAQ

How does IndexTTS-2 control emotion?

It uses emotion vectors that let you specify tones such as happy, sad, angry, or fearful without emotion-specific training data. Emotional expression is controlled independently of voice identity.

Can IndexTTS-2 clone a voice?

Yes. It performs zero-shot voice cloning from a short reference, typically around five seconds of audio, in English or Chinese.

Is IndexTTS-2 free for commercial use?

Its weights are released under the Bilibili Model License, which allows commercial use for products below defined user and revenue thresholds. Larger deployments should review the license terms.