Free AI Text to Speech
33+ open-source models, 273+ voices, 33+. No account required.
Everything you need for AI voice
30+ tools powered by open-source AI models
33+ AI voice models
The most complete collection of open-source TTS models available on a single platform
Kokoro · Free
Kokoro is an 82-million-parameter text-to-speech model that performs well above its weight class. Despite its small size, it produces clear, natural-sounding speech. Kokoro supports multiple languages, including English, Japanese, Chinese, and Korean, with a variety of distinct voices. It is very fast, generating audio roughly 100x faster than real time on a GPU.
Best for: High-quality, low-latency TTS and streaming applications
Piper · Free
Piper is a lightweight text-to-speech engine developed by Rhasspy that uses the VITS and Larynx architectures. It runs entirely on CPU, which makes it ideal for edge devices, home automation, and applications that need offline TTS. With more than 100 voices across more than 30 languages, Piper delivers natural-sounding speech in real time, even on a Raspberry Pi 4.
Best for: Offline use, accessibility, and embedded applications
VITS · Free
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that produces more natural-sounding audio than current two-stage models. It uses variational inference augmented with normalizing flows, together with an adversarial training process, which substantially improves the expressive power of the generative model.
Best for: General-purpose text-to-speech with natural prosody
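To make that description concrete, here is the training objective as published in the VITS paper (ICML 2021), restated rather than taken from this page. Here x is the speech, c the text condition, z the latent variable, and f_θ the normalizing flow that makes the text-conditioned prior more expressive; the adversarial and feature-matching terms come from the waveform discriminator.

```latex
% Conditional VAE lower bound on the log-likelihood of speech x given text c
\log p_\theta(x \mid c) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid c)\big)

% Flow-augmented prior: a text-conditioned Gaussian pushed through the flow f_\theta
p_\theta(z \mid c) = \mathcal{N}\big(f_\theta(z);\, \mu_\theta(c),\, \sigma_\theta(c)\big)
  \left|\det \frac{\partial f_\theta(z)}{\partial z}\right|

% Total generator loss: reconstruction, KL, duration, adversarial, feature matching
\mathcal{L}_{vae} = \mathcal{L}_{recon} + \mathcal{L}_{kl} + \mathcal{L}_{dur}
  + \mathcal{L}_{adv}(G) + \mathcal{L}_{fm}(G)
```

Training everything under this single objective is what makes the model "end-to-end": there is no separately trained acoustic model and vocoder, as in two-stage systems.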
MeloTTS · Free
MeloTTS by MyShell.ai is a multilingual TTS library that supports English (American, British, Indian, Australian), Chinese, Japanese, and Korean. It is very fast, achieving real-time synthesis on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.
Best for: Production applications that need fast, multilingual TTS
Kani TTS 2 · Free
Kani-TTS-2 by NineNineSix is a compact 400M-parameter model built on a Liquid AI LFM2 backbone with NVIDIA NanoCodec. It runs in just 3 GB of VRAM and generates 10 seconds of speech in 2 seconds on an A100 (RTF 0.2). The current public release ships only the English `kani-tts-2-en` checkpoint and does not expose the speaker-embedding hook needed for voice cloning. For cloning, use Chatterbox, IndexTTS2, or F5-TTS; for languages other than English, use Kokoro or MeloTTS.
Best for: English-only generation on low VRAM, offline use
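The real-time factor (RTF) quoted for Kani-TTS-2 is simply generation time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. A minimal sketch of that arithmetic, using only the figures stated on this page; the helper names are illustrative, not part of any library.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def speedup(rtf: float) -> float:
    """How many times faster than real time a given RTF is (1 / RTF)."""
    return 1.0 / rtf


# Kani-TTS-2 on an A100, as stated above: 10 s of speech generated in 2 s.
kani_rtf = real_time_factor(generation_seconds=2.0, audio_seconds=10.0)
print(f"Kani-TTS-2 RTF: {kani_rtf:.2f} ({speedup(kani_rtf):.0f}x real time)")

# Kokoro's "roughly 100x faster than real time" corresponds to an RTF of about 1 / 100.
print(f"Kokoro RTF at 100x: {1 / 100:.2f}")
```

An RTF of 0.2 is a 5x speedup, so Kokoro's claimed 100x on GPU is about twenty times faster again, though the two figures were measured on different hardware.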
OuteTTS · Free
OuteTTS extends large language models with text-to-speech capabilities while preserving their original architecture. It supports multiple backends, including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, vLLM, and even in-browser inference via Transformers.js. It includes zero-shot voice cloning, with speaker profiles stored as JSON.
Best for: Edge deployment, browser-based TTS, low-resource environments
Pocket TTS · Free
Pocket TTS by Kyutai (the makers of Moshi) is a compact 100M-parameter text-to-speech model that performs well above its weight. It runs efficiently on CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. Its small size makes it ideal for edge deployment and low-resource environments.
Best for: Lightweight deployment, CPU-only environments, fast voice cloning
Kitten TTS · Free
Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants ranging from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality speech on CPU without requiring a GPU. It includes 8 built-in voices, adjustable speaking speed, and built-in text preprocessing for numbers, currencies, and units. It is well suited to edge deployment and low-latency applications.
Best for: Lightweight offline TTS, embedded deployment, low-latency applications
Ming-Omni TTS · Free
Ming-omni-tts-0.5B by inclusionAI is a compact omni-modal speech model built on an autoregressive BailingMM backbone with a patch-by-patch streaming audio decoder. It outputs 44.1 kHz audio (CD quality), supports zero-shot voice cloning from a reference clip of 3 seconds or more, and offers built-in emotion, dialect, and background-music (BGM) control through JSON instructions. It is also very stable, with a 0.83% WER on Chinese benchmarks.
Best for: High-fidelity bilingual narration, controllable audio, Chinese video and audiobook content
MOSS-TTS Nano · Free
MOSS-TTS-Nano-100M is OpenMOSS's compact 100M-parameter variant of the MOSS-TTS family, built on the delay-transformer architecture. It trades the 8B model's peak quality for roughly 80x smaller weights and dramatically lower per-request VRAM, which makes it well suited to free-tier and high-throughput deployments. It covers the same 20 languages.
Best for: Free-tier TTS, high-volume generation, latency-sensitive workloads
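The "roughly 80x smaller" figure follows directly from the parameter counts (8B vs. 100M). The sketch below checks that ratio and gives a back-of-envelope size for the weights alone, assuming 2 bytes per parameter (fp16/bf16 storage); that precision is an assumption, and real VRAM use also includes activations, caches, and the audio codec.

```python
def weight_bytes(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate size of the model weights; 2 bytes/param assumes fp16/bf16."""
    return num_params * bytes_per_param


full = 8e9    # MOSS-TTS flagship: 8B parameters, as stated above
nano = 100e6  # MOSS-TTS-Nano: 100M parameters

ratio = full / nano
print(f"Parameter ratio: {ratio:.0f}x")
print(f"8B weights at 2 bytes/param:   {weight_bytes(full) / 1e9:.1f} GB")
print(f"100M weights at 2 bytes/param: {weight_bytes(nano) / 1e6:.0f} MB")
```

Under that assumption the flagship's weights alone need about 16 GB, while Nano's fit in about 200 MB, which is why the smaller model suits CPU and free-tier serving.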
Bark
A transformer-based text-to-audio model that generates realistic speech, music, and sound effects.
Developer: Suno · License: MIT
Bark Small
A smaller version of Bark with faster inference and lower memory usage.
Developer: Suno · License: MIT
CosyVoice 2
Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.
Developer: Alibaba (Tongyi Lab) · License: Apache 2.0
Dia TTS
A multi-speaker audio generation model that produces natural-sounding dialogue between speakers.
Developer: Nari Labs · License: Apache 2.0
Parler TTS
Describe the voice you want in natural language, and Parler generates matching speech.
Developer: Hugging Face · License: Apache 2.0
IndexTTS-2
Zero-shot TTS with fine-grained emotional control and high expressiveness.
Developer: Index Team · License: Bilibili Model License
Spark TTS
Voice-cloning TTS with controllable emotion and speaking style through prompts.
Developer: SparkAudio · License: CC BY-NC-SA 4.0
GPT-SoVITS
A few-shot voice-cloning TTS that replicates any voice from just 5 seconds of audio.
Developer: RVC-Boss · License: MIT
Orpheus
A TTS model with human-level expressiveness, trained on 100K hours of speech data.
Developer: Canopy Labs · License: Llama 3.2 Community
Qwen3 TTS
Alibaba's multilingual TTS with preset voices and voice design from text descriptions.
Developer: Alibaba (Qwen) · License: Apache 2.0
VieNeu-TTS-v2
Vietnamese + English code-switching TTS with 7 preset voices and zero-shot voice cloning. CPU-only; no GPU required.
Developer: Phạm Nguyễn Ngọc Bảo · License: Apache 2.0
Chatterbox Turbo
A fast Chatterbox variant with sub-200 ms latency and lightweight paralinguistic tags for laughter and other non-speech sounds.
Developer: Resemble AI · License: MIT
VoxCPM
Tokenizer-free TTS that produces 44.1 kHz audio with context-aware paragraph consistency.
Developer: OpenBMB · License: Apache 2.0
VibeVoice
Microsoft's model for long-form, multi-speaker audio such as podcasts and audiobooks.
Developer: Microsoft · License: MIT
CosyVoice3
Next-generation multilingual TTS with bi-streaming, emotion control, and zero-shot voice cloning.
Developer: Alibaba (FunAudioLLM) · License: Apache 2.0
NAMAA Saudi TTS
The first open Saudi Arabic TTS: a natural Saudi dialect with Chatterbox-quality voice cloning.
Developer: NAMAA Space · License: MIT
Darwin TTS
A cross-modal Qwen3-TTS variant whose FFN weights are blended from the Qwen3-1.7B language model, combining capabilities from multiple sources.
Developer: FINAL-Bench · License: Apache 2.0
MOSS-TTSD
A multi-speaker dialogue generation model that produces podcast-style conversations with up to 5 speakers and up to 60 minutes of consistent audio.
Developer: OpenMOSS · License: Apache 2.0
CosyVoice 2
Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.
Languages: en, zh, ja, ko, fr, de, it, es
IndexTTS-2
Zero-shot TTS with fine-grained emotional control and high expressiveness.
Languages: en, zh
Spark TTS
Voice-cloning TTS with controllable emotion and speaking style through prompts.
Languages: en, zh
GPT-SoVITS
A few-shot voice-cloning TTS that replicates any voice from just 5 seconds of audio.
Languages: en, zh, ja, ko
Chatterbox
A new zero-shot voice-cloning system with emotion control from Resemble AI.
Languages: en
Tortoise TTS
A multi-voice, quality-focused text-to-speech model with an autoregressive architecture.
Languages: en
OpenVoice
Instant voice cloning with granular control over style, emotion, and accent.
Languages: en, zh, ja, ko, fr, es
VieNeu-TTS-v2
Vietnamese + English code-switching TTS with 7 preset voices and zero-shot voice cloning. CPU-only; no GPU required.
Languages: vi, en
Chatterbox Turbo
A fast Chatterbox variant with sub-200 ms latency and lightweight paralinguistic tags for laughter and other non-speech sounds.
Languages: en
VoxCPM
Tokenizer-free TTS that produces 44.1 kHz audio with context-aware paragraph consistency.
Languages: en, zh
OuteTTS
LLM-based TTS that runs on CPU, GPU, or in the browser via llama.cpp and Transformers.js.
Languages: en
Pocket TTS
A lightweight 100M-parameter model from Kyutai with voice cloning from a single sample.
Languages: en, fr
CosyVoice3
Next-generation multilingual TTS with bi-streaming, emotion control, and zero-shot voice cloning.
Languages: en, zh, ja, ko, de, es, fr, it, ru
NAMAA Saudi TTS
The first open Saudi Arabic TTS: a natural Saudi dialect with Chatterbox-quality voice cloning.
Languages: ar
Darwin TTS
A cross-modal Qwen3-TTS variant whose FFN weights are blended from the Qwen3-1.7B language model, combining capabilities from multiple sources.
Languages: en, ko, ja, zh
MOSS-TTSD
A multi-speaker dialogue generation model that produces podcast-style conversations with up to 5 speakers and up to 60 minutes of consistent audio.
Languages: en, zh
Ming-Omni TTS
A lightweight 0.5B omni-modal speech model from inclusionAI with high-fidelity 44.1 kHz output and zero-shot voice cloning.
Languages: en, zh
MOSS-TTS Nano
Tiny 100M MOSS-TTS variant: same architecture, 80x smaller, free-tier latency.
Languages: en, zh, de, es, fr, ja, it, ko, ru, ar, pt
Developer-first API
OpenAI-compatible REST API. One endpoint, 22+ models, real-time streaming.
- OpenAI-compatible request format
- Streaming TTS for real-time applications
- Batch processing for large jobs
- Webhook notifications
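Because the API is described as OpenAI-compatible, a request should look like OpenAI's text-to-speech call. The sketch below only builds such a request with the standard library and does not send it. The base URL is a placeholder and the `/v1/audio/speech` path and field names (`model`, `input`, `voice`, `response_format`) are assumed from OpenAI's speech endpoint, not confirmed for TTS.ai; `build_speech_request` is a hypothetical helper. The model and voice values are the ones used in the SDK example on this page.

```python
import json
import urllib.request


def build_speech_request(base_url: str, api_key: str, text: str,
                         model: str = "kokoro", voice: str = "af_bella",
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST request shaped like OpenAI's /v1/audio/speech call."""
    payload = {
        "model": model,
        "input": text,  # OpenAI's speech API names the text field "input"
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder base URL: substitute the endpoint from the official API documentation.
req = build_speech_request("https://api.example.com", "sk-tts-xxx", "Hello from TTS.ai!")
print(req.get_method(), req.full_url)
# Sending it would return the raw audio bytes:
# audio_bytes = urllib.request.urlopen(req).read()
```

The practical benefit of OpenAI compatibility is that existing OpenAI client code can usually be pointed at a different base URL; the official `ttsai` SDK shown next remains the documented route.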
pip install ttsai
npm install @ttsainpm/ttsai
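from tts_ai import TTSClient
# Authenticate with your API key
client = TTSClient(api_key="sk-tts-xxx")
# Generate speech with the free Kokoro model and the af_bella voice
audio = client.generate(
    text="Hello from TTS.ai!",
    model="kokoro",
    voice="af_bella",
)
# Save the generated audio as an MP3 file
client.save(audio, "output.mp3")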
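Long documents have to fit the per-generation character limits listed in the pricing tiers (5,000 characters on the free plan, 100,000 on Starter) before being passed to a call like `client.generate`. A minimal sketch of splitting text at sentence boundaries, assuming sentences end in `.`, `!`, or `?`; `chunk_text` is a hypothetical helper, not part of the `ttsai` SDK, and any single sentence longer than the limit is hard-split.

```python
import re


def chunk_text(text: str, max_chars: int = 5000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit on its own is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_text = "First sentence. " * 1000
parts = chunk_text(long_text, max_chars=5000)
print(len(parts), max(len(p) for p in parts))
# Each part could then be sent as its own generation request.
```

Splitting at sentence ends rather than at a fixed character offset avoids cutting words in half, which would otherwise produce audible glitches at the seams when the clips are concatenated.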
Simple, transparent pricing
Start free. Scale as you grow.
Free
15,000 characters + 5,000/day
- 7 free models, including Kokoro
- 5,000 characters per generation
- API access included
Starter
500,000 characters/month
- All 22+ models
- 100,000 characters per generation
- Voice cloning
Pro
2,000,000 characters/month
- Everything in Starter
- API access
- Priority processing
Business
10,000,000 characters/month
- Everything in Pro
- Bulk API
- Ifolokhwe yesinqumo
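To see which tier a workload needs, compare expected monthly usage with the quotas listed above. A minimal sketch using those quotas; the free tier is left out because its allowance is expressed as a base amount plus a daily amount rather than a monthly quota, and `smallest_plan` is a hypothetical helper.

```python
from typing import Optional

# Monthly character quotas as listed in the pricing tiers, smallest first.
PLANS = [
    ("Starter", 500_000),
    ("Pro", 2_000_000),
    ("Business", 10_000_000),
]


def smallest_plan(monthly_chars: int) -> Optional[str]:
    """Return the smallest listed plan whose monthly quota covers the usage, or None."""
    for name, quota in PLANS:
        if monthly_chars <= quota:
            return name
    return None


# Example: about 40,000 characters of narration per day for 30 days.
usage = 40_000 * 30
print(usage, smallest_plan(usage))
```

Usage above the largest listed quota returns `None`, signalling a case for the full plan list rather than one of the tiers shown here.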
See all plans, including character packs →
Frequently asked questions
Start using AI voice today
Join the creators, developers, and businesses using TTS.ai