Free AI Text to Speech
33+ open-source models, 273+ voices, 33+ tools
Everything you need for Voice AI
30+ tools powered by open-source AI models
33+ AI voice models
A complete range of open-source TTS models on one platform
Kokoro Voice options
Kokoro is an 82-million-parameter text-to-speech model that punches well above its weight class. Despite its small size, it produces clear, natural-sounding speech. Kokoro supports multiple languages, including English, Japanese, Chinese, and Korean, with a range of distinct voice styles. It runs very fast, generating audio at roughly 100x real time on a GPU.
Best for: High-quality, low-latency TTS and streaming applications
Try for free
Piper Voice options
Piper is a lightweight text-to-speech engine developed by Rhasspy that uses the VITS and Larynx architectures. It runs entirely on the CPU, which makes it a good fit for edge devices, home automation, and applications that need offline TTS. With more than 100 voices across more than 30 languages, Piper delivers natural-sounding speech at real-time speed even on a Raspberry Pi 4.
Best for: Fast local inference, accessibility, and embedded systems
Try for free
VITS Voice options
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural-sounding audio than two-stage systems. It uses variational inference augmented with normalizing flows and an adversarial training process, which gives a significant improvement in naturalness.
Best for: General-purpose text-to-speech with natural prosody
Try for free
MeloTTS Voice options
MeloTTS, from MyShell.ai, is a multilingual TTS library that supports English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is very fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.
Best for: Production applications that need fast, multilingual TTS
Try for free
Kani TTS 2 Voice options
Kani-TTS-2, from NineNineSix, is a very compact 400M-parameter model built on the Liquid AI LFM2 backbone with the NVIDIA NanoCodec. It runs in just 3 GB of VRAM and generates ~10 seconds of speech in ~2 seconds on an A100 (RTF 0.2). The current release ships the English-only `kani-tts-2-en` checkpoint and does not expose the speaker-embedding hook required for voice cloning; use Chatterbox, IndexTTS2, or F5-TTS for cloning, or Kokoro or MeloTTS for non-English speech.
Best for: Fast English speech on low-VRAM hardware, quick demos
Try for free
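The real-time factor (RTF) quoted for Kani-TTS-2 can be checked with a few lines of arithmetic. The sketch below assumes the common definition, RTF = synthesis time divided by audio duration; the function names are illustrative and not part of any TTS.ai library.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup(rtf: float) -> float:
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


# Figures quoted above for Kani-TTS-2: ~10 s of speech in ~2 s on an A100.
kani_rtf = real_time_factor(synthesis_seconds=2.0, audio_seconds=10.0)
print(f"RTF = {kani_rtf:.2f}, about {speedup(kani_rtf):.0f}x real time")
```

Under the same definition, a model described as running at roughly 100x real time has an RTF of about 0.01.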
OuteTTS Voice options
OuteTTS extends large language models with text-to-speech capabilities while preserving the original architecture. It supports multiple backends, including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, vLLM, and in-browser inference via Transformers.js. It features zero-shot voice cloning with speaker profiles saved as JSON.
Best for: Edge deployment, browser-based TTS, low-resource environments
Try for free
Pocket TTS Iinketho zelizwe
I Pocket TTS ngu Kyutai (abavelisi be Moshi) yimodeli yombhalo- ukuya- ku- kuthetha encinci eneparameter ye 100M eyenza ubunzima bayo. Isebenza kakuhle kwi CPU, ixhasa ukuklona kwesandi esingenanto ukusuka kwisampuli yesandi, kwaye ivelisa ulwimi oluzimeleyo. Ubungakanani bemodeli encinci yenza ukuba ibe yindawo efanelekileyo yokubekwa kwesiphelo kunye nemeko- bume ephantsi yecebo.
Elungileyo ku: Unikezelo olusezantsi, i CPU- kuphela iimeko- bume, ukuclona kwelizwi ngokukhawuleza
Zama simahla
Kitten TTS Voice options
Kitten TTS, from KittenML, is an ultra-lightweight text-to-speech model built on ONNX. With variants ranging from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality speech on CPU without requiring a GPU. It features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. It is suited to edge deployment and low-latency applications.
Best for: Fast, lightweight TTS, edge deployment, low-latency applications
Try for free
Ming-Omni TTS Voice options
Ming-omni-tts-0.5B, from inclusionAI, is a lightweight omni-modal speech model built on the BailingMM dense backbone with a Patch-by-Patch flow-matching audio decoder. It provides 44.1 kHz output (near CD quality), supports zero-shot voice cloning from 3+ seconds of reference audio, and includes built-in control of emotion, language, and BGM via JSON instructions. It is also stable: 0.83% WER on Chinese benchmarks.
Best for: High-fidelity bilingual speech, instruction-controlled voice acting, Chinese audiobook content
Try for free
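For readers unfamiliar with the WER figure quoted above, here is a minimal sketch of how a word error rate is typically computed: the word-level edit distance between a reference transcript and the recognized output, divided by the number of reference words. It is an illustration only, not the benchmark's actual scoring script; Chinese benchmarks often score at the character level instead, which this whitespace-based version does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
wer = word_error_rate("the quick brown fox jumps", "the quick brown box jumps")
print(f"WER = {wer:.1%}")
```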
MOSS-TTS Nano Voice options
MOSS-TTS-Nano-100M is OpenMOSS's compact 100M-parameter variant of the MOSS-TTS family, using the delay-transformer architecture. It trades the top-end quality of the 8B model for ~80x fewer weights and a much lower VRAM footprint per request, which makes it suited to free-tier and high-volume deployment. It keeps the same 20-language reach.
Best for: Free-tier TTS, high-volume generation, low-latency interactive use
Try for free
Bark Standard
A transformer-based text-to-audio model that generates realistic speech, music, and sound effects.
Developer: Suno · License: MIT
Try it
Bark Small Standard
A smaller version of Bark with faster inference and lower memory use.
Developer: Suno · License: MIT
Try it
CosyVoice 2 Standard
Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.
Developer: Alibaba (Tongyi Lab) · License: Apache 2.0
Try it
Dia TTS Standard
A multi-speaker dialogue generation model that produces natural conversation between speakers.
Developer: Nari Labs · License: Apache 2.0
Try it
Parler TTS Standard
Describe the voice you want in natural language and Parler generates matching speech.
Developer: Hugging Face · License: Apache 2.0
Try it
IndexTTS-2 Standard
Zero-shot TTS with fine-grained emotional control and high expressiveness.
Developer: Index Team · License: Bilibili Model License
Try it
Spark TTS Standard
Voice-cloning TTS with controllable emotion and speaking style via prompts.
Developer: SparkAudio · License: CC BY-NC-SA 4.0
Try it
GPT-SoVITS Standard
Few-shot voice-cloning TTS that reproduces any voice from just five minutes of audio.
Developer: RVC-Boss · License: MIT
Try it
Orpheus Standard
A TTS model with human-level emotional expressiveness, trained on 100K hours of speech data.
Developer: Canopy Labs · License: Llama 3.2 Community
Try it
Qwen3 TTS Standard
Alibaba's multilingual TTS with preset voices and voice design from text.
Developer: Alibaba (Qwen) · License: Apache 2.0
Try it
VieNeu-TTS-v2 Standard
Vietnamese + English code-switching TTS with 7 preset voices and voice cloning. CPU only, no GPU required.
Developer: Phạm Nguyễn Ngọc Bảo · License: Apache 2.0
Try it
Chatterbox Turbo Standard
A fast Chatterbox variant with sub-200 ms latency and paralinguistic tags for emotion, breathing, and more.
Developer: Resemble AI · License: MIT
Try it
VoxCPM Standard
Tokenizer-free TTS that generates 44.1 kHz audio with context-aware paragraph consistency.
Developer: OpenBMB · License: Apache 2.0
Try it
VibeVoice Standard
Microsoft's model for multi-speaker content such as podcasts and audiobooks.
Developer: Microsoft · License: MIT
Try it
CosyVoice3 Standard
Next-generation multilingual TTS with streaming, emotion control, and zero-shot voice cloning.
Developer: Alibaba (FunAudioLLM) · License: Apache 2.0
Try it
NAMAA Saudi TTS Standard
The first open-source Saudi-Arabic TTS. Native Saudi dialect with Chatterbox-quality voice cloning.
Developer: NAMAA Space · License: MIT
Try it
Darwin TTS Standard
A Qwen3-TTS variant with cross-modal FFN weights merged with the Qwen3-1.7B language model for clear multilingual cloning.
Developer: FINAL-Bench · License: Apache 2.0
Try it
MOSS-TTSD Standard
A multi-speaker dialogue generation model that creates podcast-style conversations with 5 speakers and 60 minutes of coherent audio.
Developer: OpenMOSS · License: Apache 2.0
Try it
CosyVoice 2
Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.
Languages: en, zh, ja, ko, fr, de, it, es
Clone voice
IndexTTS-2
Zero-shot TTS with fine-grained emotional control and high expressiveness.
Languages: en, zh
Clone voice
Spark TTS
Voice-cloning TTS with controllable emotion and speaking style via prompts.
Languages: en, zh
Clone voice
GPT-SoVITS
Few-shot voice-cloning TTS that reproduces any voice from just five minutes of audio.
Languages: en, zh, ja, ko
Clone voice
Chatterbox
State-of-the-art zero-shot voice cloning with emotion control, from Resemble AI.
Languages: en
Clone voice
Tortoise TTS
Multi-voice text-to-speech focused on quality, with an autoregressive architecture.
Languages: en
Clone voice
OpenVoice
Instant voice cloning with granular control over style, emotion, and accent.
Languages: en, zh, ja, ko, fr, es
Clone voice
VieNeu-TTS-v2
Vietnamese + English code-switching TTS with 7 preset voices and voice cloning. CPU only, no GPU required.
Languages: vi, en
Clone voice
Chatterbox Turbo
A fast Chatterbox variant with sub-200 ms latency and paralinguistic tags for emotion, breathing, and more.
Languages: en
Clone voice
VoxCPM
Tokenizer-free TTS that generates 44.1 kHz audio with context-aware paragraph consistency.
Languages: en, zh
Clone voice
OuteTTS
LLM-based TTS that runs on CPU, GPU, or in the browser via llama.cpp and Transformers.js.
Languages: en
Clone voice
Pocket TTS
Kyutai's lightweight 100M-parameter model with voice cloning from a reference sample.
Languages: en, fr
Clone voice
CosyVoice3
Next-generation multilingual TTS with streaming, emotion control, and zero-shot voice cloning.
Languages: en, zh, ja, ko, de, es, fr, it, ru
Clone voice
NAMAA Saudi TTS
The first open-source Saudi-Arabic TTS. Native Saudi dialect with Chatterbox-quality voice cloning.
Languages: ar
Clone voice
Darwin TTS
A Qwen3-TTS variant with cross-modal FFN weights merged with the Qwen3-1.7B language model for clear multilingual cloning.
Languages: en, ko, ja, zh
Clone voice
MOSS-TTSD
A multi-speaker dialogue generation model that creates podcast-style conversations with 5 speakers and 60 minutes of coherent audio.
Languages: en, zh
Clone voice
Ming-Omni TTS
A lightweight 0.5B omni-modal speech model from inclusionAI with high-fidelity 44.1 kHz output and zero-shot voice cloning.
Languages: en, zh
Clone voice
MOSS-TTS Nano
A compact 100M-parameter MOSS-TTS variant: same architecture, 80x smaller, free-tier latency.
Languages: en, zh, de, es, fr, ja, it, ko, ru, ar, pt
Clone voice
Developer-First API
An OpenAI-compatible REST API. One endpoint, 22+ models. Streaming support for real-time applications.
- OpenAI-compatible format
- Streaming TTS for real-time applications
- Batch processing for large workloads
- Webhook notifications
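pip install ttsai
npm install @ttsainpm/ttsai

from tts_ai import TTSClient

# Create a client with your API key.
client = TTSClient(api_key="sk-tts-xxx")

# Generate speech with the Kokoro model and the af_bella voice.
audio = client.generate(
    text="Hello from TTS.ai!",
    model="kokoro",
    voice="af_bella",
)

# Write the result to an MP3 file.
client.save(audio, "output.mp3")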
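For readers who prefer raw HTTP over the client library, the sketch below shows what a request in the OpenAI speech format could look like. It only builds the request and does not send it. The base URL is a placeholder, and the assumption that the service mirrors OpenAI's `/audio/speech` path and its `model`, `input`, and `voice` fields is ours; this page does not confirm either, so check the API documentation before relying on it.

```python
import json
import urllib.request

# Placeholder values: the real base URL is not given on this page.
BASE_URL = "https://api.example.invalid/v1"
API_KEY = "sk-tts-xxx"


def build_speech_request(text: str, model: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style speech request."""
    # Field names follow OpenAI's speech endpoint; assumed, not confirmed.
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speech_request("Hello from TTS.ai!", model="kokoro", voice="af_bella")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the built request to `urllib.request.urlopen` would send it once the placeholder URL and key are replaced with real values.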
Simple, transparent pricing
Start free. Scale as you grow.
Free
15,000 characters + 5,000/day
- 7 free models including Kokoro
- 5,000 characters per request
- API access included
Starter
500,000 characters/month
- All 22+ models
- 100,000 characters per request
- Voice cloning
Pro
2,000 credits/month
- Everything in Starter
- API access
- Priority processing
Business
10,000 credits/month
- Everything in Pro
- Bulk API
- Priority queue
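The plans above cap the number of characters per request (5,000 on Free, 100,000 on Starter), so longer texts need to be split before synthesis. The helper below is a hypothetical sketch, not part of the TTS.ai client: it breaks text at sentence boundaries where possible and falls back to a hard cut only when a single sentence exceeds the limit.

```python
import re

FREE_LIMIT = 5_000  # characters per request on the Free plan, as listed above


def split_for_limit(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Treat '.', '!' or '?' followed by whitespace as a sentence boundary.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_text = "This is a sentence. " * 600
chunks = split_for_limit(long_text, FREE_LIMIT)
print(len(chunks), "chunks, longest is", max(len(c) for c in chunks), "characters")
```

Each chunk can then be sent as its own request and the resulting audio concatenated in order.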
Frequently asked questions
Start using AI voice today
Join the creators, developers, and companies using TTS.ai