Text to Speech
Convert text to natural-sounding speech with AI models.
You can use SSML tags in your text to control how it is spoken:
<speak><prosody rate="slow">Slow speech</prosody></speak>
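A minimal sketch of building the SSML snippet above from plain text. The helper name `ssml_prosody` is hypothetical and not part of any TTS.ai API; it only shows that user text should be XML-escaped before it is wrapped in `<speak>` and `<prosody>` tags. Which models honor SSML is not stated on this page.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_prosody(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...> tags (hypothetical helper).

    The text is XML-escaped so characters such as & and < cannot break the markup.
    """
    return f"<speak><prosody rate={quoteattr(rate)}>{escape(text)}</prosody></speak>"


if __name__ == "__main__":
    print(ssml_prosody("Slow speech"))
    print(ssml_prosody("Tom & Jerry", rate="fast"))
```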
Model Info
Kokoro
Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast — generating audio nearly 100x faster than real-time on a GPU.
| Developer | Hexgrad |
| License | Apache 2.0 |
| Speed | Fast |
| Quality | |
| Languages | 8 languages |
| VRAM | 1.5GB |
| Voice cloning | Not supported |
Tips for Better Results
- Use proper punctuation so pauses sound natural
- Spell out numbers and abbreviations
- Break long text into shorter paragraphs
- Use an ellipsis (...) for a longer pause
- Try Kokoro or CosyVoice 2 for the best quality
- Use Dia for dialogue and podcast-style content
Cost
| Tier | Cost |
|---|---|
| Free | 1:1 (free) |
| Standard | 2x characters |
| Premium | 4x characters |
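The multipliers in the table above (1:1, 2x, 4x) can be read as a simple per-character charge. The sketch below assumes that the charge is the text length times the tier multiplier; that billing rule and the function name `billed_characters` are assumptions for illustration, not a documented TTS.ai formula.

```python
# Multipliers copied from the tier table; how they are applied is an assumption.
TIER_MULTIPLIER = {"free": 1, "standard": 2, "premium": 4}


def billed_characters(text: str, tier: str) -> int:
    """Return the characters charged for `text` on a given model tier (hypothetical rule)."""
    try:
        multiplier = TIER_MULTIPLIER[tier.lower()]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
    return len(text) * multiplier


if __name__ == "__main__":
    sample = "Hello, world!"  # 13 characters
    for tier in ("Free", "Standard", "Premium"):
        print(tier, billed_characters(sample, tier))
```

Under this reading, the same 13-character input costs 13, 26, or 52 characters depending on the tier of the model chosen.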
How AI Text-to-Speech Works
Convert text to speech in three quick steps. No technical expertise is required.
Enter Your Text
Type, paste, or upload the text you want to convert to speech. Supports up to 5,000 characters per generation for free accounts, or 100,000 for paid plans. Use plain text or add SSML tags for advanced control over pronunciation, pauses, and emphasis.
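Text longer than the per-generation limit has to be split before it is submitted. The sketch below is one way to do that, assuming the 5,000 and 100,000 character limits quoted above; the function name `split_for_generation` is hypothetical, and the sentence-boundary rule is a simple heuristic rather than anything the service prescribes.

```python
import re

FREE_LIMIT = 5_000    # characters per generation, free accounts (from the text above)
PAID_LIMIT = 100_000  # characters per generation, paid plans (from the text above)


def split_for_generation(text: str, limit: int = FREE_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_for_generation("One. Two. Three.", limit=10):
        print(repr(chunk))
```

Each chunk can then be generated separately and the audio joined afterwards.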
Choose a Model and Voice
Choose one of 20+ AI models.
Generate and Download
Click Generate and your audio is ready within seconds. Play it, download it in the format you want, or copy a share link. You can also use the API to integrate it into your own applications.
Text-to-Speech Models
The AI text-to-speech models available on TTS.ai.
Kokoro
Free
Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast, generating audio nearly 100x faster than real-time on a GPU.
Hexgrad
Apache 2.0
Fast
en, ja, zh, fr, it, pt, es, hi
1.5GB
No
Free
Piper
Free
Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.
Rhasspy
MIT
Fast
en, de, fr, es, it, pt, nl, pl, ru, zh, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi, ca, cy, fa, lv, sl, lb, eu, id, ku, ml, sq, te, ur
0 (CPU only)
No
Free
VITS
Free
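VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.
Jaehyeon Kim et al.
MIT
Fast
en, de, es, fr, pt, nl, fi, hu, bg, ja, pl
1GB
No
Free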
MeloTTS
Free
MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.
MyShell.ai
MIT
Fast
en, es, fr, zh, ja, ko
0.5GB (GPU optional)
No
Free
Bark
Standard
Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio like music, background noise, and sound effects. It can produce nonverbal communications like laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.
Suno
MIT
Slow
en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
5GB
No
2x
Bark Small
Standard
Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference speeds and lower memory requirements. It retains Bark's ability to generate speech with emotions, laughter, and multiple languages.
Suno
MIT
Medium
en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
2GB
No
2x
CosyVoice 2
Standard
CosyVoice 2 by Alibaba's Tongyi Lab achieves human-comparable speech quality with extremely low latency, making it ideal for real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.
Alibaba (Tongyi Lab)
Apache 2.0
Medium
en, zh, ja, ko, fr, de, it, es
4GB
Yes
2x
Dia TTS
Standard
Dia by Nari Labs is a 1.6B parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression. Dia is perfect for creating podcast-style content, audiobook dialogues, and interactive conversational AI.
Nari Labs
Apache 2.0
Medium
en
4GB
No
2x
Parler TTS
Standard
Parler TTS is a text-to-speech model that uses natural language voice descriptions to control the generated speech. Instead of selecting from preset voices, you describe the voice you want (e.g., "a warm female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech matching that description. This makes it uniquely flexible for creative applications.
Hugging Face
Apache 2.0
Medium
en
4GB
No
2x
Indic Parler TTS
Standard
Indic Parler TTS by AI4Bharat extends Parler TTS to Indian languages, including Tamil, Bengali, Marathi, Gujarati, Kannada, Punjabi, Odia, Assamese, Hindi, Telugu, and Malayalam, as well as English. Like Parler TTS, it is controlled through natural-language voice descriptions.
AI4Bharat
Apache 2.0
Slow
ta, bn, mr, gu, kn, pa, or, as, hi, te, ml, en
8GB
No
2x
KhanomTan TTS
Standard
KhanomTan TTS is a Thai text-to-speech model.
Wannaphong Phatthiyaphaibun
Apache 2.0
Fast
th
2GB
No
2x
IndexTTS-2
Standard
IndexTTS-2 is a text-to-speech system from the Index Team that supports zero-shot voice cloning.
Index Team
Bilibili Model License
Medium
en, zh
4GB
Yes
2x
Spark TTS
Standard
Spark TTS by SparkAudio is a text-to-speech model that supports voice cloning.
SparkAudio
CC BY-NC-SA 4.0
Medium
en, zh
4GB
Yes
2x
GPT-SoVITS
Standard
GPT-SoVITS combines a GPT-style language model with SoVITS (from SoftVC VITS Singing Voice Conversion) for few-shot voice cloning. With as little as 5 seconds of reference audio, it can clone a voice and generate new speech in it.
RVC-Boss
MIT
Slow
en, zh, ja, ko
6GB
Yes
2x
Orpheus
Standard
Orpheus is a large-scale text-to-speech model that achieves human-level emotional expression. Trained on over 100,000 hours of diverse speech data, it excels at generating speech with natural emotions, emphasis, and speaking styles. Orpheus can produce speech that is virtually indistinguishable from human recordings.
Canopy Labs
Llama 3.2 Community
Medium
en
4GB
No
2x
Chatterbox
Premium
Chatterbox by Resemble AI is a zero-shot voice cloning model. It clones a voice from a single reference sample, carrying over speaking style as well as timbre.
Resemble AI
MIT
Medium
en
4GB
Yes
4x
Tortoise TTS
Premium
Tortoise TTS is a text-to-speech system that supports voice cloning.
James Betker
Apache 2.0
Slow
en
8GB
Yes
4x
StyleTTS 2
Premium
StyleTTS 2 achieves human-level TTS synthesis through style diffusion and adversarial training with large speech language models. On single-speaker data, its output is reported to be comparable to human recordings. It uses diffusion-based style modeling, so it can generate a fitting speaking style without reference audio.
Columbia University
MIT
Medium
en
4GB
No
4x
OpenVoice
Premium
OpenVoice by MyShell.ai is a voice cloning model.
MyShell.ai / MIT
MIT
Medium
en, zh, ja, ko, fr, es
4GB
Yes
4x
Qwen3 TTS
Standard
Qwen3-TTS is a 1.7 billion parameter text-to-speech model from Alibaba's Qwen team. It supports two modes: preset voices with emotion control (9 speakers), and a unique voice design mode where you describe the voice you want in natural language. It covers 10 languages with high expressiveness and natural prosody.
Alibaba (Qwen)
Apache 2.0
Medium
en, zh, ja, ko, de, fr, ru, pt, es, it
7GB
No
2x
VieNeu-TTS-v2
Standard
VieNeu-TTS-v2 is a 300M parameter Vietnamese TTS model trained on 10,000+ hours of speech data. It supports en-vi code-switching, 7 built-in voices, and voice cloning from 3-5 seconds of reference audio. It runs entirely on CPU using GGUF Q4 inference plus an ONNX decoder, so no GPU is required, and generation takes about 7 seconds.
Phạm Nguyễn Ngọc Bảo
Apache 2.0
Fast
vi, en
CPU
Yes
2x
Sesame CSM
Premium
Sesame CSM (Conversational Speech Model) is a 1 billion parameter model for generating conversational speech. It models the dynamics of human conversation, including timing, backchannel responses, emotional reactions, and turn-taking. CSM can generate audio that sounds like natural human conversation rather than synthetic speech.
Sesame
Apache 2.0
Slow
en
8GB
No
4x
Chatterbox Turbo
Standard
Chatterbox Turbo by Resemble AI is a 350M parameter variant of Chatterbox that runs up to 6x faster than real time, with latency under 200ms. It supports paralinguistic tags such as [laugh], [cough], and [chuckle]. All generated audio carries Perth watermarking.
Resemble AI
MIT
Fast
en
2GB
Yes
2x
VoxCPM
Standard
VoxCPM 1.5 by OpenBMB is a tokenizer-free TTS model that works in a continuous space instead of on discrete tokens. It produces 44.1kHz audio and clones a voice from 3-10 seconds of reference audio. Cross-lingual cloning lets you use an English voice to speak Chinese, and the reverse.
OpenBMB
Apache 2.0
Fast
en, zh
4GB
Yes
2x
Kani TTS 2
Free
Kani-TTS-2 by NineNineSix is an ultra-lightweight 400M parameter model built on a Liquid AI LFM2 backbone with NVIDIA NanoCodec. It runs in just 3GB VRAM and produces ~10 seconds of speech in ~2 seconds on an A100 (RTF 0.2). The current public release ships an English-only `kani-tts-2-en` checkpoint and does not expose the speaker-embedding hook needed for voice cloning. Use Chatterbox / IndexTTS2 / F5-TTS for cloning, or Kokoro / MeloTTS for non-English.
NineNineSix
Apache 2.0
Fast
en
3GB
No
Free
OuteTTS
Free
OuteTTS extends large language models with text-to-speech capabilities while preserving the original architecture. It supports multiple backends including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, VLLM, and even browser inference via Transformers.js. Features zero-shot voice cloning through speaker profiles saved as JSON.
OuteAI
Apache 2.0
Slow
en
2GB
No
Free
VibeVoice
Standard
VibeVoice from Microsoft generates long-form speech of up to 90 minutes with up to 4 speakers, which suits podcasts and conversations. The Realtime 0.5B variant reaches ~300ms latency for interactive use. It supports speaker tags for multi-speaker dialogue.
Microsoft
MIT
Fast
en, zh
4GB
No
2x
Pocket TTS
Free
Pocket TTS by Kyutai (creators of Moshi) is a compact 100M parameter text-to-speech model that punches well above its weight. It runs efficiently on CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. The small model size makes it ideal for edge deployment and low-resource environments.
Kyutai
MIT
Fast
en, fr
1GB
Yes
Free
Kitten TTS
Free
Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.
KittenML
Apache 2.0
Fast
en
0GB
No
Free
CosyVoice3
Standard
CosyVoice3 is the latest generation from Alibaba's FunAudioLLM team. It offers bi-streaming inference with ~150ms latency, control over emotion, speed, and volume, and strong speaker similarity in zero-shot cloning. It covers 9 languages plus 18 Chinese dialects. The RL-tuned variant delivers state-of-the-art prosody.
Alibaba (FunAudioLLM)
Apache 2.0
Fast
en, zh, ja, ko, de, es, fr, it, ru
4GB
Yes
2x
NAMAA Saudi TTS
Standard
NAMAA Saudi TTS is a Saudi Arabic fine-tune of Resemble AI's ChatterboxMultilingual. Trained by NAMAA Space on authentic Saudi-dialect speech, it produces natural Modern Standard Arabic and Saudi colloquial pronunciation that generic multilingual models cannot match. Inherits Chatterbox's zero-shot voice cloning and emotion control via reference audio prompts. The first open-weights Arabic TTS deployed on TTS.ai.
NAMAA Space
MIT
Medium
ar
6GB
Yes
2x
Darwin TTS
Standard
Darwin-TTS-1.7B-Cross by FINAL-Bench is derived from Qwen3-TTS-1.7B, with 84 talker-FFN tensors (8.6%) blended at α=3% with the corresponding tensors of Qwen3-1.7B-Base. The model works in Korean, English, Japanese, and Chinese.
FINAL-Bench
Apache 2.0
Medium
en, ko, ja, zh
7GB
Yes
2x
MOSS-TTSD
Standard
MOSS-TTSD v1.0 from OpenMOSS is a 7B text-to-spoken-dialogue model for long-form conversation. It supports up to 5 speakers through [S1]/[S2] tags, voice cloning from 3-10s of reference audio, and up to 60 minutes of speech across 20 languages. Unlike a general-purpose TTS, MOSS-TTSD is built for podcast, audiobook, and dubbing dialogue.
OpenMOSS
Apache 2.0
Medium
en, zh
12GB
Yes
2x
Ming-Omni TTS
Free
Ming-omni-tts-0.5B by inclusionAI is a compact omni-modal speech model built on the BailingMM dense backbone with a Patch-by-Patch flow-matching audio decoder. Delivers 44.1kHz output (near CD quality), supports zero-shot voice cloning from a 3+ second reference, and includes built-in emotion / dialect / BGM control via JSON instructions. Stability is excellent, with 0.83% WER on Chinese benchmarks.
inclusionAI
Apache 2.0
Medium
en, zh
3GB
Yes
Free
MOSS-TTS Nano
Free
MOSS-TTS-Nano-100M is OpenMOSS's compact 100M-parameter variant of the MOSS-TTS family, sharing the delay-transformer architecture. Trades the 8B model's peak quality for ~80x smaller weights and dramatically lower per-request VRAM, making it suitable for free-tier and high-throughput deployments. Same 20-language reach.
OpenMOSS
Apache 2.0
Fast
en, zh, de, es, fr, ja, it, ko, ru, ar, pt
2GB
Yes
Free
Kokoro
Free
Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast — generating audio nearly 100x faster than real-time on a GPU.
Hexgrad
Apache 2.0
Fast
Piper
Free
Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.
Rhasspy
MIT
Fast
VITS
Free
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.
Jaehyeon Kim et al.
MIT
Fast
MeloTTS
Free
MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.
MyShell.ai
MIT
Fast
Kani TTS 2
Free
Kani-TTS-2 by NineNineSix is an ultra-lightweight 400M parameter model built on a Liquid AI LFM2 backbone with NVIDIA NanoCodec. It runs in just 3GB VRAM and produces ~10 seconds of speech in ~2 seconds on an A100 (RTF 0.2). The current public release ships an English-only `kani-tts-2-en` checkpoint and does not expose the speaker-embedding hook needed for voice cloning — use Chatterbox / IndexTTS2 / F5-TTS for cloning, or Kokoro / MeloTTS for non-English.
NineNineSix
Apache 2.0
Fast
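The real-time factor (RTF) quoted for Kani-TTS-2 above is processing time divided by audio duration: about 2 seconds of compute for about 10 seconds of speech gives 0.2. The sketch below works through that arithmetic; the function names are illustrative, and the longer-clip estimate assumes RTF stays constant, which the source does not claim.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the audio produced (below 1 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Rough generation time for a clip, assuming the RTF holds at that length."""
    return audio_seconds * rtf


if __name__ == "__main__":
    rtf = real_time_factor(2.0, 10.0)  # figures quoted for an A100
    print(f"RTF: {rtf:.2f}")
    print(f"Estimated time for 60 s of speech: {estimated_processing_seconds(60.0, rtf):.1f} s")
```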
OuteTTS
Free
OuteTTS extends large language models with text-to-speech capabilities while preserving the original architecture. It supports multiple backends including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, VLLM, and even browser inference via Transformers.js. Features zero-shot voice cloning through speaker profiles saved as JSON.
OuteAI
Apache 2.0
Slow
Pocket TTS
Free
Pocket TTS by Kyutai (creators of Moshi) is a compact 100M parameter text-to-speech model that punches well above its weight. It runs efficiently on CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. The small model size makes it ideal for edge deployment and low-resource environments.
Kyutai
MIT
Fast
Kitten TTS
Free
Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.
KittenML
Apache 2.0
Fast
Ming-Omni TTS
Free
Ming-omni-tts-0.5B by inclusionAI is a compact omni-modal speech model built on the BailingMM dense backbone with a Patch-by-Patch flow-matching audio decoder. Delivers 44.1kHz output (near CD quality), supports zero-shot voice cloning from a 3+ second reference, and includes built-in emotion / dialect / BGM control via JSON instructions. Excellent stability — 0.83% WER on Chinese benchmarks.
Developer: inclusionAI
License: Apache 2.0
Speed: Medium
MOSS-TTS Nano
Tier: Free
MOSS-TTS-Nano-100M is OpenMOSS's compact 100M-parameter variant of the MOSS-TTS family, sharing the delay-transformer architecture. Trades the 8B model's peak quality for ~80x smaller weights and dramatically lower per-request VRAM, making it suitable for free-tier and high-throughput deployments. Same 20-language reach.
Developer: OpenMOSS
License: Apache 2.0
Speed: Fast
Bark
Tier: Standard
Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio like music, background noise, and sound effects. It can produce nonverbal communications like laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.
Developer: Suno
License: MIT
Speed: Slow
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
Voice cloning: No
Bark Small
Tier: Standard
Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference speeds and lower memory requirements. It retains Bark's ability to generate speech with emotions, laughter, and multiple languages.
Developer: Suno
License: MIT
Speed: Medium
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
Voice cloning: No
CosyVoice 2
Tier: Standard
CosyVoice 2 by Alibaba's Tongyi Lab achieves human-comparable speech quality with extremely low latency, making it ideal for real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.
Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, fr, de, it, es
Voice cloning: Yes
Dia TTS
Tier: Standard
Dia by Nari Labs is a 1.6B parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression. Dia is perfect for creating podcast-style content, audiobook dialogues, and interactive conversational AI.
Developer: Nari Labs
License: Apache 2.0
Speed: Medium
Languages: en
Voice cloning: No
Parler TTS
Tier: Standard
Parler TTS is a text-to-speech model that uses natural language voice descriptions to control the generated speech. Instead of selecting from preset voices, you describe the voice you want (e.g., "a warm female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech matching that description. This makes it uniquely flexible for creative applications.
Developer: Hugging Face
License: Apache 2.0
Speed: Medium
Languages: en
Voice cloning: No
Indic Parler TTS
Tier: Standard
Indic Parler TTS by AI4Bharat extends the Parler architecture to Indian languages, generating natural speech in Tamil, Bengali, Marathi, Gujarati, Kannada, Punjabi, Odia, Assamese, Hindi, Telugu, Malayalam and English. Like Parler, you describe the voice you want in plain language and the model matches it — no preset voices required. Trained on AI4Bharat speech corpora for authentic pronunciation and prosody across the Indian subcontinent.
Developer: AI4Bharat
License: Apache 2.0
Speed: Slow
Languages: ta, bn, mr, gu, kn, pa, or, as, hi, te, ml, en
Voice cloning: No
KhanomTan TTS
Tier: Standard
KhanomTan TTS is an open Thai text-to-speech model built on the YourTTS multilingual architecture. Trained on CC0 and permissively-licensed Thai corpora (TSync) alongside several other languages, it delivers natural Thai speech with multiple speaker voices. A clean, commercially-usable option for Thai — the language most open TTS models only cover under non-commercial licenses.
Developer: Wannaphong Phatthiyaphaibun
License: Apache 2.0
Speed: Fast
Languages: th
Voice cloning: No
IndexTTS-2
Tier: Standard
IndexTTS-2 is an advanced text-to-speech system that excels at zero-shot voice synthesis with fine-grained emotion control. It can generate speech with specific emotional tones like happy, sad, angry, or fearful without requiring emotion-specific training data. The model uses emotion vectors to precisely control the emotional expression of generated speech.
Developer: Index Team
License: Bilibili Model License
Speed: Medium
Languages: en, zh
Voice cloning: Yes
Spark TTS
Tier: Standard
Spark TTS by SparkAudio is a text-to-speech model that combines voice cloning with controllable emotion and speaking style. Using just 5 seconds of reference audio, it can clone a voice and then generate speech with different emotions, speeds, and styles while maintaining the cloned voice identity. Spark TTS uses a prompt-based control system.
Developer: SparkAudio
License: CC BY-NC-SA 4.0
Speed: Medium
Languages: en, zh
Voice cloning: Yes
GPT-SoVITS
Tier: Standard
GPT-SoVITS combines GPT-style language modeling with SoVITS (SoftVC VITS, a model that originated in singing voice conversion) for powerful few-shot voice cloning. With as little as 5 seconds of reference audio, it can accurately clone a voice and generate new speech while preserving the speaker's unique characteristics. It excels at both speaking and singing voice synthesis.
Developer: RVC-Boss
License: MIT
Speed: Slow
Languages: en, zh, ja, ko
Voice cloning: Yes
Orpheus
Tier: Standard
Orpheus is a large-scale text-to-speech model that achieves human-level emotional expression. Trained on over 100,000 hours of diverse speech data, it excels at generating speech with natural emotions, emphasis, and speaking styles. Orpheus can produce speech that is virtually indistinguishable from human recordings.
Developer: Canopy Labs
License: Llama 3.2 Community
Speed: Medium
Languages: en
Voice cloning: No
Qwen3 TTS
Tier: Standard
Qwen3-TTS is a 1.7 billion parameter text-to-speech model from Alibaba's Qwen team. It supports two modes: preset voices with emotion control (9 speakers), and a unique voice design mode where you describe the voice you want in natural language. It covers 10 languages with high expressiveness and natural prosody.
Developer: Alibaba (Qwen)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
Voice cloning: No
VieNeu-TTS-v2
Tier: Standard
VieNeu-TTS-v2 is a 300M parameter Vietnamese-first TTS model trained on 10,000+ hours of bilingual data. It supports seamless en-vi code-switching, 7 preset voices spanning Northern and Southern accents, and instant voice cloning from 3-5 seconds of reference audio. Runs entirely on CPU via GGUF Q4 inference + ONNX audio decoder — no GPU needed, generations finish in ~7 seconds. Built on a Qwen3 backbone.
Developer: Phạm Nguyễn Ngọc Bảo
License: Apache 2.0
Speed: Fast
Languages: vi, en
Voice cloning: Yes
Chatterbox Turbo
Tier: Standard
Chatterbox Turbo by Resemble AI is a 350M parameter upgrade to Chatterbox, delivering up to 6x real-time speed with sub-200ms latency. It supports paralinguistic tags like [laugh], [cough], and [chuckle] directly in text. Includes Perth watermarking on all generated audio for provenance tracking.
Developer: Resemble AI
License: MIT
Speed: Fast
Languages: en
Voice cloning: Yes
VoxCPM
Tier: Standard
VoxCPM 1.5 by OpenBMB is a novel tokenizer-free TTS model that operates in continuous space rather than discrete tokens. It produces high-fidelity 44.1kHz audio, supports zero-shot voice cloning from 3-10 seconds, and maintains consistency across paragraphs. Cross-language cloning lets you apply an English voice to Chinese speech and vice versa.
Developer: OpenBMB
License: Apache 2.0
Speed: Fast
Languages: en, zh
Voice cloning: Yes
VibeVoice
Tier: Standard
VibeVoice from Microsoft generates long-form speech up to 90 minutes with support for 4 simultaneous speakers, making it ideal for podcasts and dialogues. The Realtime 0.5B variant achieves ~300ms latency for interactive use. Supports speaker tags for multi-turn dialogue generation.
Developer: Microsoft
License: MIT
Speed: Fast
Languages: en, zh
Voice cloning: No
CosyVoice3
Tier: Standard
CosyVoice3 is the latest evolution from Alibaba's FunAudioLLM team. It features bi-streaming inference with ~150ms latency, instruction-based control for emotion/speed/volume, and improved speaker similarity for zero-shot cloning. Supports 9 languages plus 18 Chinese dialects. RL-tuned variant delivers state-of-the-art prosody.
Developer: Alibaba (FunAudioLLM)
License: Apache 2.0
Speed: Fast
Languages: en, zh, ja, ko, de, es, fr, it, ru
Voice cloning: Yes
NAMAA Saudi TTS
Tier: Standard
NAMAA Saudi TTS is a Saudi Arabic fine-tune of Resemble AI's ChatterboxMultilingual. Trained by NAMAA Space on authentic Saudi-dialect speech, it produces natural Modern Standard Arabic and Saudi colloquial pronunciation that generic multilingual models cannot match. Inherits Chatterbox's zero-shot voice cloning and emotion control via reference audio prompts. The first open-weights Arabic TTS deployed on TTS.ai.
Developer: NAMAA Space
License: MIT
Speed: Medium
Languages: ar
Voice cloning: Yes
Darwin TTS
Tier: Standard
Darwin-TTS-1.7B-Cross by FINAL-Bench is a research variant of Qwen3-TTS-1.7B where 84 talker-FFN tensors (8.6%) are blended at α=3% with the matching tensors from Qwen3-1.7B-Base. The blend is built without retraining and produces noticeably crisper cross-lingual voice cloning across Korean, English, Japanese, and Chinese. Operates in zero-shot voice-clone mode (3 seconds reference audio).
Developer: FINAL-Bench
License: Apache 2.0
Speed: Medium
Languages: en, ko, ja, zh
Voice cloning: Yes
MOSS-TTSD
Tier: Standard
MOSS-TTSD v1.0 from OpenMOSS is a 7B dialogue text-to-speech model that continues conversations from a short audio prompt. Supports up to 5 simultaneous speakers via [S1]/[S2] tags, zero-shot voice cloning from 3-10s reference audio, and up to 60 minutes of coherent multi-turn dialogue across 20 languages. Distinct from MOSS-TTS — TTSD is specialized for podcast/audiobook/dubbing workflows.
Developer: OpenMOSS
License: Apache 2.0
Speed: Medium
Languages: en, zh
Voice cloning: Yes
Model Comparison
| Model | Developer | Tier | Speed | Languages | VRAM | License | Credit cost |
|---|---|---|---|---|---|---|---|
| Kokoro | Hexgrad | Free | Fast | 8 | 1.5GB | Apache 2.0 | Free |
| Piper | Rhasspy | Free | Fast | 42 | 0 (CPU only) | MIT | Free |
| VITS | Jaehyeon Kim et al. | Free | Fast | 11 | 1GB | MIT | Free |
| MeloTTS | MyShell.ai | Free | Fast | 6 | 0.5GB (GPU optional) | MIT | Free |
| Bark | Suno | Standard | Slow | 13 | 5GB | MIT | 2x |
| Bark Small | Suno | Standard | Medium | 13 | 2GB | MIT | 2x |
| CosyVoice 2 | Alibaba (Tongyi Lab) | Standard | Medium | 8 | 4GB | Apache 2.0 | 2x |
| Dia TTS | Nari Labs | Standard | Medium | 1 | 4GB | Apache 2.0 | 2x |
| Parler TTS | Hugging Face | Standard | Medium | 1 | 4GB | Apache 2.0 | 2x |
| Indic Parler TTS | AI4Bharat | Standard | Slow | 12 | 8GB | Apache 2.0 | 2x |
| KhanomTan TTS | Wannaphong Phatthiyaphaibun | Standard | Fast | 1 | 2GB | Apache 2.0 | 2x |
| IndexTTS-2 | Index Team | Standard | Medium | 2 | 4GB | Bilibili Model License | 2x |
| Spark TTS | SparkAudio | Standard | Medium | 2 | 4GB | CC BY-NC-SA 4.0 | 2x |
| GPT-SoVITS | RVC-Boss | Standard | Slow | 4 | 6GB | MIT | 2x |
| Orpheus | Canopy Labs | Standard | Medium | 1 | 4GB | Llama 3.2 Community | 2x |
| Chatterbox | Resemble AI | Premium | Medium | 1 | 4GB | MIT | 4x |
| Tortoise TTS | James Betker | Premium | Slow | 1 | 8GB | Apache 2.0 | 4x |
| StyleTTS 2 | Columbia University | Premium | Medium | 1 | 4GB | MIT | 4x |
| OpenVoice | MyShell.ai / MIT | Premium | Medium | 6 | 4GB | MIT | 4x |
| Qwen3 TTS | Alibaba (Qwen) | Standard | Medium | 10 | 7GB | Apache 2.0 | 2x |
| VieNeu-TTS-v2 | Phạm Nguyễn Ngọc Bảo | Standard | Fast | 2 | CPU | Apache 2.0 | 2x |
| Sesame CSM | Sesame | Premium | Slow | 1 | 8GB | Apache 2.0 | 4x |
| Chatterbox Turbo | Resemble AI | Standard | Fast | 1 | 2GB | MIT | 2x |
| VoxCPM | OpenBMB | Standard | Fast | 2 | 4GB | Apache 2.0 | 2x |
| Kani TTS 2 | NineNineSix | Free | Fast | 1 | 3GB | Apache 2.0 | Free |
| OuteTTS | OuteAI | Free | Slow | 1 | 2GB | Apache 2.0 | Free |
| VibeVoice | Microsoft | Standard | Fast | 2 | 4GB | MIT | 2x |
| Pocket TTS | Kyutai | Free | Fast | 2 | 1GB | MIT | Free |
| Kitten TTS | KittenML | Free | Fast | 1 | 0GB | Apache 2.0 | Free |
| CosyVoice3 | Alibaba (FunAudioLLM) | Standard | Fast | 9 | 4GB | Apache 2.0 | 2x |
| NAMAA Saudi TTS | NAMAA Space | Standard | Medium | 1 | 6GB | MIT | 2x |
| Darwin TTS | FINAL-Bench | Standard | Medium | 4 | 7GB | Apache 2.0 | 2x |
| MOSS-TTSD | OpenMOSS | Standard | Medium | 2 | 12GB | Apache 2.0 | 2x |
| Ming-Omni TTS | inclusionAI | Free | Medium | 2 | 3GB | Apache 2.0 | Free |
| MOSS-TTS Nano | OpenMOSS | Free | Fast | 11 | 2GB | Apache 2.0 | Free |
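Each tier in the table carries a character-cost multiplier, which the pricing table earlier on this page gives as 1:1 for Free, 2x for Standard, and 4x for Premium. The sketch below shows one way to estimate what a request would consume. It assumes billing is simply the character count (whitespace included) times the tier multiplier; the function and dictionary names are hypothetical and not part of any TTS.ai SDK.

```python
# Multipliers as listed in the page's pricing tiers (Free 1:1, Standard 2x, Premium 4x).
# Names here are illustrative only, not an official API.
TIER_MULTIPLIER = {"free": 1, "standard": 2, "premium": 4}


def characters_charged(text: str, tier: str) -> int:
    """Estimate billed characters, assuming cost = len(text) * tier multiplier."""
    multiplier = TIER_MULTIPLIER[tier.lower()]
    return len(text) * multiplier


if __name__ == "__main__":
    sample = "Hello, world!"  # 13 characters
    for tier in ("Free", "Standard", "Premium"):
        print(tier, characters_charged(sample, tier))
```

Under this assumption, the same 13-character sample costs 13, 26, or 52 characters depending on the tier of the chosen model.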
Why Choose TTS.ai for Text to Speech?
TTS.ai brings together text-to-speech models from around the world in one place.
Most models are released under MIT, Apache 2.0, or similarly permissive licenses that allow commercial use of the generated audio. A few carry other terms, for example Spark TTS (CC BY-NC-SA 4.0, non-commercial), IndexTTS-2 (Bilibili Model License), and Orpheus (Llama 3.2 Community), so check the license before using those in a commercial project. Whether you need fast, lightweight synthesis for real-time applications or premium studio-quality output for audiobooks and podcasts, TTS.ai has a model for the use case.
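Because the comparison table lists a license for every model, a quick check can flag the ones whose terms deserve a closer read before commercial use. The sketch below uses a subset of license strings copied from the table. Treating only MIT and Apache 2.0 as permissive is an assumption made for illustration, and the helper name is hypothetical.

```python
# License strings copied from the comparison table (subset shown).
MODEL_LICENSES = {
    "Kokoro": "Apache 2.0",
    "Bark": "MIT",
    "IndexTTS-2": "Bilibili Model License",
    "Spark TTS": "CC BY-NC-SA 4.0",
    "Orpheus": "Llama 3.2 Community",
}

# Assumption for this example: only these two count as permissive.
PERMISSIVE = {"MIT", "Apache 2.0"}


def needs_license_review(licenses: dict[str, str]) -> list[str]:
    """Return model names whose license is not in the permissive set, sorted."""
    return sorted(name for name, lic in licenses.items() if lic not in PERMISSIVE)


if __name__ == "__main__":
    print(needs_license_review(MODEL_LICENSES))
```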
Free Models, No Account Needed
Get started right away with three free TTS models: Piper (ultra-fast and lightweight), VITS (high-quality neural synthesis), and MeloTTS (multilingual coverage). No credit card and no sign-up are required. All three models handle English along with many other languages.
GPU Acceleration
TTS models are served on NVIDIA GPUs to keep generation times short. Free models typically generate audio in under 2 seconds. Standard models such as CosyVoice 2 and Bark take 3-5 seconds. Premium models such as Tortoise and Chatterbox take 5-15 seconds, depending on the length of the text.
30+ Languages
The platform covers more than 30 languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, and Russian. Many models support cross-lingual synthesis, so a voice captured in one language can speak another. CosyVoice 2 and GPT-SoVITS are the strongest options for cross-lingual use.
API Ready
TTS.ai offers an OpenAI-compatible REST API that covers all 20+ models. Client examples are available for Python, JavaScript, cURL, and Go. Streaming is supported for real-time use, batch processing for large workloads, and webhooks for asynchronous jobs.
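Since the API is described as OpenAI-compatible, a request would presumably follow the shape of OpenAI's speech endpoint (`POST /v1/audio/speech` with `model`, `input`, and `voice` fields). The sketch below only builds such a request and does not send it. The base URL, API key, model identifier, and voice name are placeholders and assumptions, not values confirmed by this page, so take the real ones from the official API documentation.

```python
import json
import urllib.request

# Placeholders (assumptions): replace with the values from the official API docs.
BASE_URL = "https://api.example.com/v1"
API_KEY = "YOUR_API_KEY"


def build_speech_request(text: str, model: str = "kokoro", voice: str = "default"):
    """Build an OpenAI-style text-to-speech request without sending it."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speech_request("Hello from TTS.ai")
# To actually send it (requires a real endpoint and key):
# with urllib.request.urlopen(request) as response:
#     audio_bytes = response.read()
```

Building the request separately from sending it keeps the example runnable offline and makes the payload easy to inspect before any characters are consumed.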
Frequently Asked Questions
Get Started
Millions of people already use TTS.ai. Every account includes 15,000 free characters, and no payment is required.