IndexTTS-2 TTS
A zero-shot TTS model with fine-grained emotion control via emotion vectors, with no emotion-specific training data required.
IndexTTS-2, from the Index Team, is an expressive text-to-speech system that pairs zero-shot voice synthesis with precise emotional control. Rather than relying on emotion-labeled training data, it uses emotion vectors to dial in tones like happy, sad, angry, or fearful independently of the voice itself. Built on a Qwen2 backbone with BigVGAN as the vocoder, it supports English and Chinese and can clone a voice from roughly five seconds of reference audio. It suits audiobooks, virtual assistants, and any content where the same voice needs to shift emotional register. Its weights use the Bilibili Model License, which permits commercial use below large usage and revenue thresholds.
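To make the idea of an emotion vector concrete, here is a minimal sketch of how per-emotion weights could be packed into a fixed-order vector that is kept separate from the reference voice. The dimension order, the `make_emotion_vector` helper, and the cap on total intensity are all assumptions for illustration; they are not taken from the IndexTTS-2 code, whose actual vector layout may differ.

```python
# Hypothetical dimension order, using only the emotions named above.
# The real model's emotion vector layout may differ.
EMOTIONS = ("happy", "sad", "angry", "fearful")


def make_emotion_vector(weights, max_total=1.0):
    """Turn a {emotion: weight} mapping into a fixed-order list of floats.

    Missing emotions default to 0, negative weights are clamped to 0, and
    the vector is scaled down if its sum exceeds max_total (an assumed cap).
    """
    unknown = set(weights) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")

    vec = [max(0.0, float(weights.get(name, 0.0))) for name in EMOTIONS]
    total = sum(vec)
    if total > max_total:
        vec = [v * max_total / total for v in vec]
    return vec


# The same reference voice could be paired with different vectors.
calm_happy = make_emotion_vector({"happy": 0.5})
mixed = make_emotion_vector({"sad": 0.3, "fearful": 0.3})
```

Because the vector is built independently of the reference audio, the same cloned voice can in principle be reused with different emotional settings.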
At a glance
- Developer: Index Team
- License: Bilibili Model License
- Tier: standard
- Speed: medium
- Voice cloning: Yes
- Languages: English, Chinese
- Max characters: 1000
IndexTTS-2 voices
Best for
Emotionally expressive content, audiobooks, virtual assistants
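
For long-form material such as audiobooks, input text will usually exceed the 1000-character maximum and needs to be split before synthesis. The following is a minimal sketch of one way to do that at sentence boundaries. The `split_text` helper is hypothetical, and it assumes the limit counts Python string characters, which the listing does not specify.

```python
import re

MAX_CHARS = 1000  # per-request character limit from the listing


def split_text(text, max_chars=MAX_CHARS):
    """Greedily pack sentences into chunks of at most max_chars characters."""
    # Split after Latin sentence punctuation followed by whitespace, or
    # directly after Chinese sentence punctuation (no whitespace needed).
    sentences = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text.strip())

    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Sentences are rejoined with a space; for Chinese text one may
        # prefer to join without it.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized separately with the same reference voice and the resulting audio concatenated.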