IndexTTS-2 TTS

A zero-shot TTS model with fine-grained emotion control via emotion vectors; no emotion-specific training data is required.

SSML mode (Speech Synthesis Markup Language, for finer control)

For precise control, wrap your text in SSML tags:

<speak><prosody rate="slow">Slow speech</prosody></speak>
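The prosody example above can also be generated programmatically. The following is a minimal sketch using only the Python standard library; the helper name `wrap_prosody` is hypothetical and is not part of any TTS.ai or IndexTTS-2 API. It escapes the input text so that characters such as `<` or `&` cannot break the markup.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_prosody(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...> markup (hypothetical helper)."""
    # escape() replaces &, < and > so the text cannot break the surrounding tags.
    body = escape(text)
    # quoteattr() returns the attribute value already wrapped in quotes.
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"


print(wrap_prosody("Slow speech"))
```

Whether a given model honors a particular `rate` value depends on the service, so the accepted values should be checked against the SSML support the page actually documents.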

About IndexTTS-2

IndexTTS-2, from the Index Team, is an expressive text-to-speech system that pairs zero-shot voice synthesis with precise emotional control. Rather than relying on emotion-labeled training data, it uses emotion vectors to dial in tones like happy, sad, angry, or fearful independently of the voice itself. Built on a Qwen2 backbone with BigVGAN as the vocoder, it supports English and Chinese and can clone a voice from roughly five seconds of reference audio. It suits audiobooks, virtual assistants, and any content where the same voice needs to shift emotional register. Its weights use the Bilibili Model License, which permits commercial use for products below defined user and revenue thresholds.
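The separation of emotion from voice identity described above can be pictured as two independent inputs to a synthesis request. The following is a minimal sketch under stated assumptions: a four-slot vector ordered as the emotions named in the text. The names `EMOTIONS`, `emotion_vector`, and `SynthesisRequest`, and the file name `narrator.wav`, are hypothetical illustrations, not IndexTTS-2's actual API or vector layout.

```python
from dataclasses import dataclass

# Assumption: one slot per emotion named in the text, in this order.
EMOTIONS = ("happy", "sad", "angry", "fearful")


def emotion_vector(**weights: float) -> tuple[float, ...]:
    """Build a weight vector over EMOTIONS (hypothetical layout)."""
    unknown = set(weights) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    # Missing emotions default to 0; negative weights are clamped to 0.
    vec = [max(0.0, float(weights.get(name, 0.0))) for name in EMOTIONS]
    total = sum(vec)
    # Scale down only when the weights exceed a total of 1.
    if total > 1.0:
        vec = [v / total for v in vec]
    return tuple(vec)


@dataclass(frozen=True)
class SynthesisRequest:
    text: str
    speaker_reference: str          # path to a short reference clip (about 5 s)
    emotion: tuple[float, ...]      # chosen independently of the speaker


# Same voice, two emotional registers: only the emotion input changes.
calm_sad = SynthesisRequest("The rain kept falling.", "narrator.wav",
                            emotion_vector(sad=0.6))
angry = SynthesisRequest("The rain kept falling.", "narrator.wav",
                         emotion_vector(angry=0.8))
print(calm_sad.emotion, angry.emotion)
```

The point of the sketch is the design choice rather than the numbers: because the speaker reference and the emotion weights are separate fields, one cloned voice can be reused across passages with different emotional settings.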

Best for: Emotionally expressive content, audiobooks, virtual assistants

At a glance

Developer: Index Team
License: Bilibili Model License
Tier: standard
Speed: medium
Voice cloning: Yes
Languages: English, Chinese
Max characters: 1000
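If the figure of 1000 above is read as a per-request character limit (an assumption; the page does not spell out how the limit is enforced), longer texts such as audiobook chapters would need to be split before synthesis. The following is a minimal sketch of a sentence-aware splitter; `split_for_tts` is a hypothetical helper, not part of any TTS.ai or IndexTTS-2 API.

```python
import re


def split_for_tts(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence breaks."""
    # Split after Latin sentence punctuation followed by whitespace,
    # or directly after Chinese full-width sentence punctuation.
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text.strip())
    sentences = [p for p in parts if p]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Sentences are rejoined with one space (also between Chinese sentences).
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_for_tts("One. Two. Three.", max_chars=10))
```

Each chunk can then be synthesized separately and the audio concatenated. Splitting at sentence boundaries is a deliberate choice here, since cutting mid-sentence tends to produce unnatural prosody at the joins.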

IndexTTS-2 voices

Chinese Default (Chinese): Standard, Neutral
Default (English): Standard, Neutral

IndexTTS-2 TTS - FAQ

How does IndexTTS-2 control emotion?

It uses emotion vectors that let you specify tones such as happy, sad, angry, or fearful without needing emotion-specific training data, and the emotional expression is controlled independently from the voice identity.

Can IndexTTS-2 clone a voice?

Yes. It performs zero-shot voice cloning from a short reference, typically around five seconds of audio, in English or Chinese.

Is IndexTTS-2 free for commercial use?

Its weights are released under the Bilibili Model License, which allows commercial use for products below defined user and revenue thresholds. Larger deployments should review the license terms.