Free AI Text to Speech

31+ open-source models, 231+ voices, 34+. No account required.

8K+ creators
32K+ generations
31+ AI models
231+ voices

Everything you need for voice AI

30+ tools powered by open-source AI models

31+ AI voice models

The most complete collection of open-source TTS models available on a single platform

Kokoro (Free)

Kokoro is an 82-million-parameter text-to-speech model that performs well above its weight class. Despite its very small size, it produces clear, natural-sounding speech. Kokoro supports multiple languages, including English, Japanese, Chinese, and Korean, with a range of expressive voices. It is also very fast, generating audio roughly 100x faster than real time on a GPU.

Best for: High-quality TTS with low latency, streaming applications

Try for free

Piper (Free)

Piper is a lightweight text-to-speech engine developed by Rhasspy that uses the VITS and Larynx architectures. It runs entirely on the CPU, which makes it a strong fit for edge devices, home automation, and tools that need offline TTS. With more than 100 voices across more than 30 languages, Piper delivers natural-sounding speech at real-time speed, even on a Raspberry Pi 4.

Best for: Resource-constrained deployments, accessibility, and embedded applications

Try for free

VITS (Free)

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural-sounding audio than two-stage systems. It uses variational inference augmented with normalizing flows and an adversarial training process, which yields a notable gain in expressiveness.

Best for: General-purpose text-to-speech with natural prosody

Try for free

MeloTTS (Free)

MeloTTS by MyShell.ai is a multilingual TTS library that supports English (American, British, Indian, Australian), Chinese, Japanese, and Korean. It is very fast, processing text at real-time speed on a CPU alone. MeloTTS is built for production use and supports both CPU and GPU inference.

Best for: Production applications that need fast, multilingual TTS

Try for free

OuteTTS (Free)

OuteTTS extends large language models with text-to-speech capabilities while preserving their original architecture. It supports multiple backends, including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, vLLM, and even in-browser inference via Transformers.js. It also offers zero-shot voice cloning, with speaker profiles stored as JSON.

Best for: Edge deployment, browser-based TTS, low-resource environments

Try for free

Pocket TTS (Free)

Pocket TTS by Kyutai (the creators of Moshi) is a compact 100M-parameter text-to-speech model that performs well above its size. It runs efficiently on a CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. Its small size makes it well suited to edge deployment and low-resource environments.

Best for: Lightweight deployment, CPU-only environments, fast voice cloning

Try for free

Kitten TTS (Free)

Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on a CPU without requiring a GPU. It features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. It is well suited to edge deployment and low-latency applications.

Best for: Fast lightweight TTS, edge deployment, low-latency applications

Try for free

Bark (Standard)

A transformer-based text-to-audio model that generates realistic speech, music, and sound effects.

Developer: Suno · License: MIT

Try it

Bark Small (Standard)

A smaller variant of Bark with faster inference and lower memory usage.

Developer: Suno · License: MIT

Try it

CosyVoice 2 (Standard)

Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.

Developer: Alibaba (Tongyi Lab) · License: Apache 2.0

Try it

Dia TTS (Standard)

A dialogue generation model that produces natural conversations between multiple speakers.

Developer: Nari Labs · License: Apache 2.0

Try it

Parler TTS (Standard)

Describe the voice you want in natural language, and Parler generates matching audio.

Developer: Hugging Face · License: Apache 2.0

Try it

GLM-TTS (Standard)

Achieves the lowest character error rate among open-source TTS models.

Developer: Zhipu AI · License: GLM-4 License

Try it

IndexTTS-2 (Standard)

Zero-shot TTS with fine-grained emotional control and high expressiveness.

Developer: Index Team · License: Bilibili Model License

Try it

Spark TTS (Standard)

Voice cloning TTS with controllable emotion and speaking style via prompts.

Developer: SparkAudio · License: CC BY-NC-SA 4.0

Try it

GPT-SoVITS (Standard)

Few-shot voice cloning TTS that replicates any voice from just 5 seconds of audio.

Developer: RVC-Boss · License: MIT

Try it

Orpheus (Standard)

A TTS model with human-level expressiveness, trained on 100K hours of speech data.

Developer: Canopy Labs · License: Llama 3.2 Community

Try it

Qwen3 TTS (Standard)

Alibaba's multilingual TTS with voice cloning, preset voices, and voice design from text.

Developer: Alibaba (Qwen) · License: Apache 2.0

Try it

Chatterbox Turbo (Standard)

The fastest Chatterbox, with sub-200ms latency and paralinguistic tags for laughs, coughs, and more.

Developer: Resemble AI · License: MIT

Try it

Dia 2 (Standard)

Streaming-first conversational TTS with multi-speaker dialogue and paralinguistic cues.

Developer: Nari Labs · License: Apache 2.0

Try it

VoxCPM (Standard)

Tokenizer-free TTS that generates 44.1 kHz audio with context-aware paragraph consistency.

Developer: OpenBMB · License: Apache 2.0

Try it

TADA (Standard)

Zero-hallucination TTS with text-acoustic dual alignment, running 5x faster than comparable LLM-based TTS.

Developer: Hume AI · License: MIT

Try it

VibeVoice (Standard)

Microsoft's model for long-form, multi-speaker content such as podcasts and audiobooks.

Developer: Microsoft · License: MIT

Try it

CosyVoice3 (Standard)

Next-generation multilingual TTS with bi-streaming, emotion control, and zero-shot voice cloning.

Developer: Alibaba (FunAudioLLM) · License: Apache 2.0

Try it

Chatterbox (Premium)

State-of-the-art zero-shot voice cloning with emotion control from Resemble AI.

Try it

Tortoise TTS (Premium)

Multi-voice text-to-speech focused on quality, built on an autoregressive architecture.

Try it

StyleTTS 2 (Premium)

Human-level text-to-speech through style diffusion and adversarial training.

Try it

OpenVoice (Premium)

Instant voice cloning with fine-grained control over style, emotion, and accent.

Try it

Sesame CSM (Premium)

A conversational speech model that generates natural dialogue with appropriate timing and emotion.

Try it

MOSS-TTS (Premium)

Ultra-long 20-language TTS supporting up to 1 hour of continuous generation with phoneme-level control.

Try it

MegaTTS3 (Premium)

ByteDance's sparse alignment TTS with adjustable intelligibility vs. speaker similarity.

Try it

CosyVoice 2

Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.

Languages: en, zh, ja, ko, fr, de, it, es

Audio

GLM-TTS

Achieves the lowest character error rate among open-source TTS models.

Languages: en, zh

Audio

IndexTTS-2

Zero-shot TTS with fine-grained emotional control and high expressiveness.

Languages: en, zh

Audio

Spark TTS

Voice cloning TTS with controllable emotion and speaking style via prompts.

Languages: en, zh

Audio

GPT-SoVITS

Few-shot voice cloning TTS that replicates any voice from just 5 seconds of audio.

Languages: en, zh, ja, ko

Audio

Chatterbox

State-of-the-art zero-shot voice cloning with emotion control from Resemble AI.

Languages: en

Audio

Tortoise TTS

Multi-voice text-to-speech focused on quality, built on an autoregressive architecture.

Languages: en

Audio

OpenVoice

Instant voice cloning with fine-grained control over style, emotion, and accent.

Languages: en, zh, ja, ko, fr, de, es, it

Audio

Qwen3 TTS

Alibaba's multilingual TTS with voice cloning, preset voices, and voice design from text.

Languages: en, zh, ja, ko, de, fr, ru, pt, es, it

Audio

Chatterbox Turbo

The fastest Chatterbox, with sub-200ms latency and paralinguistic tags for laughs, coughs, and more.

Languages: en

Audio

VoxCPM

Tokenizer-free TTS that generates 44.1 kHz audio with context-aware paragraph consistency.

Languages: en, zh

Audio

OuteTTS

LLM-based TTS that runs on CPU, GPU, or in the browser via llama.cpp and Transformers.js.

Languages: en

Audio

Pocket TTS

A lightweight 100M-parameter model from Kyutai with voice cloning from a single sample.

Languages: en, fr

Audio

CosyVoice3

Next-generation multilingual TTS with bi-streaming, emotion control, and zero-shot voice cloning.

Languages: en, zh, ja, ko, de, es, fr, it, ru

Audio

MOSS-TTS

Ultra-long 20-language TTS supporting up to 1 hour of continuous generation with phoneme-level control.

Languages: en, zh, de, es, fr, ja, it, hu, ko, ru, fa, ar, pl, pt, cs, da, sv, el, tr

Audio

MegaTTS3

ByteDance's sparse alignment TTS with adjustable intelligibility vs. speaker similarity.

Languages: en, zh
 
Audio
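
The per-model language lists above lend themselves to a simple lookup. The sketch below copies the codes from the listing into a dictionary and filters it by language code. The dictionary and the `models_for_language` helper are illustrative names chosen here, not part of any TTS.ai SDK.

```python
# Language codes copied from the listing above (one entry per model card).
MODEL_LANGUAGES = {
    "CosyVoice 2": ["en", "zh", "ja", "ko", "fr", "de", "it", "es"],
    "GLM-TTS": ["en", "zh"],
    "IndexTTS-2": ["en", "zh"],
    "Spark TTS": ["en", "zh"],
    "GPT-SoVITS": ["en", "zh", "ja", "ko"],
    "Chatterbox": ["en"],
    "Tortoise TTS": ["en"],
    "OpenVoice": ["en", "zh", "ja", "ko", "fr", "de", "es", "it"],
    "Qwen3 TTS": ["en", "zh", "ja", "ko", "de", "fr", "ru", "pt", "es", "it"],
    "Chatterbox Turbo": ["en"],
    "VoxCPM": ["en", "zh"],
    "OuteTTS": ["en"],
    "Pocket TTS": ["en", "fr"],
    "CosyVoice3": ["en", "zh", "ja", "ko", "de", "es", "fr", "it", "ru"],
    "MOSS-TTS": ["en", "zh", "de", "es", "fr", "ja", "it", "hu", "ko", "ru",
                 "fa", "ar", "pl", "pt", "cs", "da", "sv", "el", "tr"],
    "MegaTTS3": ["en", "zh"],
}


def models_for_language(code: str) -> list[str]:
    """Return the names of all listed models that support `code`, sorted by name."""
    return sorted(name for name, langs in MODEL_LANGUAGES.items() if code in langs)


if __name__ == "__main__":
    print(models_for_language("ja"))
```

Keeping the data as a plain dictionary makes it easy to update when a model card changes.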

Developer-first API

An OpenAI-compatible REST API. One endpoint, 22+ models, real-time speech streaming.

  • OpenAI-compatible format
  • Streaming TTS for real-time applications
  • Batch processing for large jobs
  • Webhook notifications
See the API documentation
pip install ttsai
npm install @ttsainpm/ttsai
Python
from tts_ai import TTSClient

client = TTSClient(api_key="sk-tts-xxx")
audio = client.generate(
    text="Hello from TTS.ai!",
    model="kokoro",
    voice="af_bella",
)
client.save(audio, "output.mp3")
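
The page describes the API as OpenAI-compatible but does not show the raw request format. Assuming it mirrors the body of OpenAI's speech endpoint (`model`, `input`, `voice`, `response_format`), a request body could be assembled roughly as follows. The `build_speech_request` helper and the set of accepted formats are assumptions made for illustration; the API documentation is the authority on the real field names.

```python
import json

# Assumption: the formats named elsewhere on this page are also valid request values.
ALLOWED_FORMATS = {"wav", "mp3", "flac", "ogg", "m4a"}


def build_speech_request(text: str, model: str = "kokoro",
                         voice: str = "af_bella",
                         response_format: str = "wav") -> str:
    """Build a JSON body shaped like an OpenAI speech request (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    return json.dumps({
        "model": model,
        "input": text,  # OpenAI's speech endpoint calls the text field "input"
        "voice": voice,
        "response_format": response_format,
    })


if __name__ == "__main__":
    print(build_speech_request("Hello from TTS.ai!", response_format="mp3"))
```

The sketch only builds the body; sending it requires the endpoint URL and API key from the documentation.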

Simple, transparent pricing

Start for free. Scale as you grow.

Free

$0

15,000 characters

  • Kokoro, Piper, VITS, MeloTTS
  • 500-character limit
  • 3 generations/hour (no account)
Sign up

Starter

$9/month

500,000 characters/month

  • All 22+ models
  • 100,000 characters per generation
  • Voice cloning
Get started
Most popular

Pro

$29/month

2,000,000 characters/month

  • Everything in Starter
  • API access
  • Priority processing
Get Pro

Enterprise

$99/month

10,000,000 characters/month

  • Everything in Pro
  • Bulk API
  • Priority queue
Get Enterprise

See all plans, including character packs →
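
To make the tiers concrete, here is a minimal sketch that picks the cheapest plan for a given monthly character volume. The quotas and prices come from the pricing cards above. Treating the Free plan's 15,000 characters as a monthly allowance, and the `cheapest_plan` helper itself, are assumptions made for illustration.

```python
from typing import Optional

# Ordered cheapest first. API access starts at Pro, per the plan cards above.
PLANS = [
    {"name": "Free", "usd_per_month": 0, "characters": 15_000, "api_access": False},
    {"name": "Starter", "usd_per_month": 9, "characters": 500_000, "api_access": False},
    {"name": "Pro", "usd_per_month": 29, "characters": 2_000_000, "api_access": True},
    {"name": "Enterprise", "usd_per_month": 99, "characters": 10_000_000, "api_access": True},
]


def cheapest_plan(characters_needed: int, needs_api: bool = False) -> Optional[str]:
    """Return the cheapest plan covering the volume, or None if none does."""
    if characters_needed < 0:
        raise ValueError("characters_needed must be non-negative")
    for plan in PLANS:
        has_quota = plan["characters"] >= characters_needed
        has_api = plan["api_access"] or not needs_api
        if has_quota and has_api:
            return plan["name"]
    return None


if __name__ == "__main__":
    print(cheapest_plan(400_000))                # Starter
    print(cheapest_plan(10_000, needs_api=True))  # Pro
```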

Frequently asked questions

TTS.ai is a comprehensive AI voice platform offering 22+ text-to-speech models, voice cloning, speech-to-text, and audio tools. All models are open source, with no vendor lock-in.

Yes! TTS.ai offers free text-to-speech with the Kokoro, Piper, VITS, and MeloTTS models. No account is required. Sign up to get 15,000 free characters and access to all models. Paid plans start at $9/month.

For speed, use Kokoro or Piper. For quality, use CosyVoice 2 or StyleTTS 2. For voice cloning, use Chatterbox or GPT-SoVITS. For dialogue, use Dia TTS. Try several models on the same text to compare them.

Yes. An OpenAI-compatible REST API covers TTS, STT, voice cloning, and audio tools. It is available on the Pro ($29/mo) and Enterprise ($99/mo) plans. See the documentation at tts.ai/api/.

Audio quality varies by model. Premium models such as CosyVoice 2, StyleTTS 2, and Chatterbox produce near-human speech with natural intonation and emotion. Free models such as Kokoro offer excellent quality for most use cases.

TTS.ai supports more than 30 languages across its model library. English has the widest model coverage, but models such as CosyVoice 2 also cover Chinese, Japanese, and Korean; GPT-SoVITS handles Chinese, Japanese, Korean, and English; and MeloTTS supports English, Spanish, French, Chinese, Japanese, and Korean.

Yes. All processing happens on our dedicated GPU servers. We do not store your input text or generated audio after delivery. Audio samples uploaded for voice cloning are used only for the current session and are not retained. We do not share your data with third parties or use it for model training.

Yes. All audio generated on TTS.ai is yours to use commercially, including in YouTube videos, podcasts, audiobooks, apps, advertisements, and products. Our models are open source under permissive licenses (MIT, Apache 2.0). No royalties or attribution are required.

TTS.ai generates audio in WAV format by default for the highest quality. You can convert it to MP3, FLAC, OGG, or M4A using our free Audio Converter tool. The API also supports specifying your preferred output format directly in the request.
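
Before converting a downloaded WAV file, it can help to check its basic properties. The sketch below uses only Python's standard `wave` module. The 24 kHz mono test clip is generated locally as a stand-in and says nothing about the sample rate TTS.ai actually returns; `describe_wav` and `make_silent_wav` are hypothetical helper names.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read channel count, sample rate, sample width, and duration from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }


def make_silent_wav(seconds: float = 1.0, sample_rate: int = 24_000) -> bytes:
    """Create a mono 16-bit silent clip in memory (stand-in for a real download)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silent_wav()))
```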

Upload a short audio sample (as little as five minutes) of the voice you want to clone, then type any text to generate speech in that voice. Models such as Chatterbox, GPT-SoVITS, and CosyVoice 2 support voice cloning. The cloned voice captures the tone, timbre, and speaking style of the original.

Free models (Kokoro, Piper, VITS, MeloTTS) require no account and cost zero characters. Standard models (2,000/1K characters) include Bark, CosyVoice 2, F5-TTS, and Dia. Premium models (4,000/1K characters) include OpenVoice, Chatterbox, StyleTTS 2, and Tortoise. Paid models generally offer higher quality, more voices, and additional features such as voice cloning.

Yes. The API supports batch processing for converting large volumes of text to speech. Submit multiple requests and receive the results asynchronously using job UUIDs. Enterprise plans ($99/mo) include priority queue access for faster batch processing. This is well suited to audiobook production, course content, and large voiceover projects.
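
Preparing a long text for batch submission usually means splitting it into requests that fit the per-generation character limit. The sketch below packs whole sentences into chunks and tags each chunk with a locally generated UUID. Those UUIDs are placeholders for the job UUIDs the service would issue. The helper names and the default limit of 500 (the Free plan's character limit) are assumptions, and no network call is made.

```python
import re
import uuid


def split_into_chunks(text: str, limit: int = 500) -> list[str]:
    """Greedily pack sentences into chunks of at most `limit` characters."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: list[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def plan_batch(text: str, limit: int = 500) -> dict[str, str]:
    """Map a placeholder job ID to each chunk, preserving chunk order."""
    return {str(uuid.uuid4()): chunk for chunk in split_into_chunks(text, limit)}


if __name__ == "__main__":
    for job_id, chunk in plan_batch("One. Two. Three.", limit=9).items():
        print(job_id, repr(chunk))
```

Splitting on sentence boundaries avoids cutting words mid-request, which tends to matter for prosody at chunk edges.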

Start using AI voice today

Join the creators, developers, and businesses using TTS.ai