AI Text to Speech

Convert text into natural-sounding speech with open-source AI models. Free to use, no account required.


Precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
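
Markup like the line above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_ssml` is hypothetical, the accepted rate keywords are the ones defined by the SSML specification rather than anything this page documents, and individual models may honor only some tags.

```python
from xml.sax.saxutils import escape, quoteattr

# Rate keywords defined by the SSML specification for <prosody rate="...">.
SSML_RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def build_ssml(text: str, rate: str = "medium") -> str:
    """Wrap plain text in <speak><prosody> markup (hypothetical helper)."""
    if rate not in SSML_RATES:
        raise ValueError(f"unsupported rate keyword: {rate!r}")
    # escape() protects &, < and > so user text cannot break the markup.
    body = escape(text)
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"


print(build_ssml("Slow speech", rate="slow"))
```

Escaping the text first matters because a stray `<` or `&` in user input would otherwise produce invalid markup.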

Add emotion markers to influence delivery (model support varies).

Free with Piper, VITS, and MeloTTS.
Download links expire after 24 hours.


Tips for better results

  • Use proper punctuation for natural pauses and intonation
  • Spell out numbers and abbreviations for clearer pronunciation
  • Add commas to create short pauses between phrases
  • Use an ellipsis (...) for longer, dramatic pauses
  • Try Kokoro or CosyVoice 2 for the most natural-sounding results
  • Use Dia for multi-speaker dialogue and podcast content
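
Two of the tips above (spelling out numbers and abbreviations) can be automated before text is submitted. This is a minimal sketch, not a feature of the site: the abbreviation table and both function names are hypothetical, and only whole numbers from 0 to 99 are handled.

```python
import re

# Hypothetical abbreviation table; extend it for your own content.
ABBREVIATIONS = {"Dr.": "Doctor", "approx.": "approximately", "vs.": "versus"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# A one- or two-digit number that is not part of a longer number or a decimal.
SMALL_NUMBER = re.compile(r"(?<![\d.])\b\d{1,2}\b(?![.,]?\d)")


def spell_number(n: int) -> str:
    """Spell out an integer from 0 to 99 in English words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def prepare_text(text: str) -> str:
    """Expand known abbreviations and spell out standalone numbers below 100."""
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    return SMALL_NUMBER.sub(lambda m: spell_number(int(m.group())), text)


print(prepare_text("Dr. Smith has 3 cats and 42 fish."))
```

Longer numbers, decimals, and years are deliberately left untouched, since reading them correctly depends on context.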

Token usage

Tier       Cost per 1K characters
Free       1:1 (free)
Standard   2x tokens
Premium    4x tokens
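
The tier multipliers above can be turned into a rough usage estimate. This sketch assumes a base rate of one token per character, scaled by the tier multiplier; that base rate and the function name `estimate_tokens` are assumptions for illustration, and actual accounting (rounding, free allowances) may differ.

```python
# Multipliers come from the tier table; the one-token-per-character base
# rate is an assumption, not something the page states.
TIER_MULTIPLIER = {"free": 1, "standard": 2, "premium": 4}


def estimate_tokens(text: str, tier: str) -> int:
    """Estimate tokens consumed for a piece of text on a given tier."""
    try:
        multiplier = TIER_MULTIPLIER[tier.lower()]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
    return len(text) * multiplier


sample = "x" * 1000  # 1K characters
for tier in TIER_MULTIPLIER:
    print(tier, estimate_tokens(sample, tier))
```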

How AI Text to Speech Works

Generate high-quality audio in three simple steps. No technical knowledge required.

Step 1

Enter your text

Type, paste, or upload the text you want to convert to speech. Signed-in users can submit up to 5,000 characters per generation. Use plain text, or add SSML tags for advanced control over pronunciation, pauses, and emphasis.
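
Text longer than the 5,000-character limit has to be submitted in pieces. The following is a minimal sketch of one way to split it, preferring sentence boundaries so that pauses stay natural; the function name `split_for_tts` is hypothetical and the sentence detection is deliberately simple.

```python
import re

MAX_CHARS = 5000  # per-generation limit quoted above for signed-in users


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_for_tts("One. Two. Three.", limit=10))
```

Each chunk can then be generated separately and the resulting audio files joined in order.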

Step 2

Choose a Model & Voice

Choose from 20+ AI models across three tiers. Pick a voice that matches your content, select the target language, adjust the playback speed from 0.5x to 2.0x, and choose your preferred output format (MP3, WAV, OGG, or FLAC).
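
The choices in this step can be checked before a request is made. This is a minimal sketch: the speed range and format list come from the paragraph above, while the `GenerationSettings` class, its field names, and the `"default"` voice are hypothetical and do not describe the site's actual API.

```python
from dataclasses import dataclass

SUPPORTED_FORMATS = {"mp3", "wav", "ogg", "flac"}  # formats listed above
MIN_SPEED, MAX_SPEED = 0.5, 2.0                    # speed range listed above


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the choices made in step 2."""
    model: str
    voice: str
    language: str
    speed: float = 1.0
    output_format: str = "mp3"

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges as early as possible.
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED}x and {MAX_SPEED}x")
        if self.output_format.lower() not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {self.output_format!r}")


settings = GenerationSettings(
    model="kokoro", voice="default", language="en", speed=1.25, output_format="wav"
)
print(settings)
```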

Step 3

Generate & Download

Click Generate and your audio is ready in seconds. Preview it with the built-in player, download it in your chosen format, or copy the shareable link. Use the API for batch processing and integration into your workflow.

Text to Speech

AI-powered text-to-speech is changing how people create, consume, and interact with audio content across many industries.

Text-to-Speech Models

Detailed information on each AI model available on TTS.ai. Compare quality, speed, language support, and features to find the best model for your project.
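
Comparing models on these attributes is easy to script. The following is a minimal sketch that filters a small catalogue by language, VRAM budget, and voice-cloning support; the figures are copied from a few of the model cards on this page, while the `ModelInfo` class and `find_models` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    tier: str
    vram_gb: float
    voice_cloning: bool
    languages: frozenset


# A handful of entries copied from the model cards on this page.
CATALOG = [
    ModelInfo("Kokoro", "free", 1.5, False,
              frozenset({"en", "ja", "zh", "ko", "fr", "de", "it", "pt", "es", "hi", "ru"})),
    ModelInfo("MeloTTS", "free", 0.5, False,
              frozenset({"en", "es", "fr", "zh", "ja", "ko"})),
    ModelInfo("CosyVoice 2", "standard", 4, True,
              frozenset({"en", "zh", "ja", "ko", "fr", "de", "it", "es"})),
    ModelInfo("GPT-SoVITS", "standard", 6, True,
              frozenset({"en", "zh", "ja", "ko"})),
    ModelInfo("Chatterbox", "premium", 4, True, frozenset({"en"})),
]


def find_models(language: str, max_vram_gb: float, need_cloning: bool = False) -> list:
    """Return names of catalogue models that satisfy all three constraints."""
    return [
        m.name
        for m in CATALOG
        if language in m.languages
        and m.vram_gb <= max_vram_gb
        and (m.voice_cloning or not need_cloning)
    ]


print(find_models("ja", max_vram_gb=4, need_cloning=True))
```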

Kokoro

Free

Kokoro is an 82-million-parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages, including English, Japanese, Chinese, and Korean, with a variety of expressive voices. It runs very fast, generating audio nearly 100x faster than real time on a GPU.

Developer: Hexgrad
License: Apache 2.0
Speed: Fast
Languages: en, ja, zh, ko, fr, de, it, pt, es, hi, ru
VRAM: 1.5GB
Voice cloning: No
Cost per 1K characters: Free
Features: 82M parameters, very fast, expressive voices, multilingual, streaming support
Best for: High-quality TTS with minimal latency, streaming applications

Piper

Free

Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and Larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications that need offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds, even on a Raspberry Pi 4.

Developer: Rhasspy
License: MIT
Speed: Fast
Languages: en, de, fr, es, it, pt, nl, pl, ru, zh, ja, ko, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi
VRAM: 0 (CPU only)
Voice cloning: No
Cost per 1K characters: Free
Features: CPU-friendly, offline capable, 100+ voices, 30+ languages, SSML support
Best for: Quick previews, accessibility, and embedded applications

VITS

Free

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural-sounding audio than two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.

Developer: Jaehyeon Kim et al.
License: MIT
Speed: Fast
Languages: en, zh, ja, ko
VRAM: 1GB
Voice cloning: No
Cost per 1K characters: Free
Features: End-to-end synthesis, natural prosody, fast inference, multi-speaker
Best for: General-purpose text-to-speech with natural prosody

MeloTTS

Free

MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is very fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.

Developer: MyShell.ai
License: MIT
Speed: Fast
Languages: en, es, fr, zh, ja, ko
VRAM: 0.5GB (GPU optional)
Voice cloning: No
Cost per 1K characters: Free
Features: CPU-optimized, multilingual, multiple English accents, production-ready, low latency
Best for: Production applications needing fast, multilingual TTS

Bark

Standard

Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio such as music, background noise, and sound effects. It can also produce nonverbal sounds like laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.

Developer: Suno
License: MIT
Speed: Slow
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 5GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Sound effects, laughing/sighing, music generation, 100+ speakers, multilingual
Best for: Creative audio content, audiobooks with emotion, sound effects

Bark Small

Standard

Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference and lower memory requirements. It retains Bark's ability to generate speech with emotions, laughter, and multiple languages.

Developer: Suno
License: MIT
Speed: Medium
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 2GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Lightweight, faster than full Bark, emotional speech, multilingual
Best for: Quick creative audio when full Bark is too slow

CosyVoice 2

Standard

CosyVoice 2 by Alibaba's Tongyi Lab achieves human-comparable speech quality with very low latency, making it well suited to real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.

Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, fr, de, it, es
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Streaming, zero-shot cloning, cross-lingual, emotion control, human parity
Best for: Real-time applications, streaming TTS, voice assistants

Dia TTS

Standard

Dia by Nari Labs is a 1.6B-parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression. Dia is well suited to podcast-style content, audiobook dialogues, and interactive conversational AI.

Developer: Nari Labs
License: Apache 2.0
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Multi-speaker, dialogue generation, natural turn-taking, emotional expression, 1.6B parameters
Best for: Podcasts, audiobook dialogues, conversational content

Parler TTS

Standard

Parler TTS is a text-to-speech model that uses natural-language voice descriptions to control the generated speech. Instead of selecting from preset voices, you describe the voice you want (e.g., "a warm female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech matching that description. This makes it unusually flexible for creative applications.

Developer: Hugging Face
License: Apache 2.0
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Voice description, natural-language control, flexible voice creation, no preset voices needed
Best for: Creative applications where you need custom voice characteristics

GLM-TTS

Standard

GLM-TTS by Zhipu AI is a text-to-speech system built on the Llama architecture with flow matching. It achieves the lowest character error rate among open-source TTS models, meaning it produces the most accurate pronunciation. GLM-TTS supports English and Chinese, with voice cloning from 3-10 second audio samples.

Developer: Zhipu AI
License: GLM-4 License
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Lowest error rate, voice cloning, flow matching, natural prosody
Best for: Applications requiring maximum pronunciation accuracy

IndexTTS-2

Standard

IndexTTS-2 is an advanced text-to-speech system that excels at zero-shot voice synthesis with fine-grained emotion control. It can generate speech with specific emotional tones such as happy, sad, angry, or fearful without requiring emotion-specific training data. The model uses emotion vectors to precisely control the emotional expression of the generated speech.

Developer: Index Team
License: Bilibili Model License
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Emotion control, zero-shot, emotion vectors, expressive speech, fine-grained control
Best for: Emotionally expressive content, audiobooks, virtual assistants

Spark TTS

Standard

Spark TTS by SparkAudio is a text-to-speech model that combines voice cloning with controllable emotion and speaking style. Using just 5 seconds of reference audio, it can clone a voice and then generate speech with different emotions, speeds, and styles while maintaining the cloned voice identity. Spark TTS uses a prompt-based control system.

Developer: SparkAudio
License: CC BY-NC-SA 4.0
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Voice cloning, emotion control, style control, prompt-based, 5-second cloning
Best for: Content creation with cloned voices and emotional control

GPT-SoVITS

Standard

GPT-SoVITS combines GPT-style language modeling with SoVITS (SoftVC VITS, originally a singing voice conversion model) for powerful few-shot voice cloning. With as little as 5 seconds of reference audio, it can accurately clone a voice and generate new speech while preserving the speaker's unique characteristics. It excels at both speaking and singing voice synthesis.

Developer: RVC-Boss
License: MIT
Speed: Slow
Languages: en, zh, ja, ko
VRAM: 6GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: 5-second cloning, singing voice, few-shot learning, high fidelity, cross-lingual
Best for: Voice cloning, singing synthesis, content creator voice replication

Orpheus

Standard

Orpheus is a large-scale text-to-speech model that achieves human-level emotional expression. Trained on over 100,000 hours of diverse speech data, it excels at generating speech with natural emotions, emphasis, and speaking styles. Orpheus can produce speech that is virtually indistinguishable from human recordings.

Developer: Canopy Labs
License: Llama 3.2 Community
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Human-level emotion, 100K hours of training data, natural emphasis, expressive speech
Best for: High-quality emotional speech, audiobooks, voice acting

Chatterbox

Premium

Chatterbox by Resemble AI is a cutting-edge zero-shot voice cloning model. It can replicate any voice from a single audio sample with remarkable accuracy, capturing not just the timbre but also the speaking style and emotional nuances. Chatterbox also features fine-grained emotion control, letting you adjust the emotional tone of the generated speech independently of the voice identity.

Developer: Resemble AI
License: MIT
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 4x
Features: Zero-shot cloning, emotion control, high fidelity, style transfer, single-sample cloning
Best for: Professional voice cloning with emotional control, content creation

Tortoise TTS

Premium

Tortoise TTS is an autoregressive multi-voice text-to-speech system that prioritizes audio quality over speed. It uses a DALL-E-inspired architecture to generate highly natural speech with excellent prosody and speaker similarity. While slower than many alternatives, Tortoise produces some of the most realistic synthetic speech available in the open-source ecosystem.

Developer: James Betker
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 8GB
Voice cloning: Yes
Cost per 1K characters: 4x
Features: Highest quality, multi-voice, DALL-E architecture, voice cloning, autoregressive
Best for: Audiobooks, premium content, quality-first applications

StyleTTS 2

Premium

StyleTTS 2 achieves human-level TTS synthesis by combining style diffusion with adversarial training using large speech language models. It generates the most natural-sounding speech among single-speaker models, rivaling human recordings. StyleTTS 2 uses diffusion-based style modeling to capture the full range of human speech variation.

Developer: Columbia University
License: MIT
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 4x
Features: Human-level, style diffusion, adversarial training, natural variation, high fidelity
Best for: Studio-quality single-speaker synthesis, professional narration

OpenVoice

Premium

OpenVoice by MyShell.ai enables instant voice cloning with granular control over voice style, emotion, accent, rhythm, pauses, and intonation. It can clone a voice from a short audio clip and generate speech in multiple languages while maintaining the speaker identity. OpenVoice also functions as a voice converter, allowing real-time voice transformation.

Developer: MyShell.ai / MIT
License: MIT
Speed: Medium
Languages: en, zh, ja, ko, fr, de, es, it
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 4x
Features: Instant cloning, voice conversion, emotion control, accent control, multilingual
Best for: Voice cloning with fine-grained style control, voice conversion

Qwen3 TTS

Standard

Qwen3-TTS is a 1.7-billion-parameter text-to-speech model from Alibaba's Qwen team. It supports three modes: preset voices with emotion control (9 speakers), voice cloning from just 3 seconds of audio, and a voice design mode in which you describe the voice you want in natural language. It covers 10 languages with high expressiveness and natural prosody.

Developer: Alibaba (Qwen)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
VRAM: 7GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Voice cloning, 9 preset voices, voice design from text, emotion control, 10 languages
Best for: Multilingual content with voice cloning or custom voice design

Sesame CSM

Premium

Sesame CSM (Conversational Speech Model) is a 1-billion-parameter model designed specifically for generating conversational speech. It models the natural patterns of human conversation, including turn-taking timing, backchannel responses, emotional reactions, and conversational flow. CSM generates audio that sounds like a natural human conversation rather than synthetic speech.

Developer: Sesame
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 8GB
Voice cloning: No
Cost per 1K characters: 4x
Features: Conversational, natural timing, turn-taking, backchannel responses, 1B parameters
Best for: AI assistants, chatbots, conversational AI applications

Kitten TTS

Free

Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. It features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.

Developer: KittenML
License: Apache 2.0
Speed: Fast
Languages: en
VRAM: 0GB
Voice cloning: No
Cost per 1K characters: Free
Features: CPU-only inference, under 80MB model size, 8 built-in voices, speed control, ONNX-based, 24kHz output
Best for: Fast lightweight TTS, edge deployment, low-latency applications

KokoroKokoro

Ekhululekileyo

Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast — generating audio nearly 100x faster than real-time on a GPU.

Umbhekisi phambili::
Hexgrad
Ilayisensi::
Apache 2.0
Isantya:
Fast
Ubunjani::
Iilwimi: en, ja, zh, ko, fr, de, it, pt, es, hi, ru
Elungileyo ku:: High-quality TTS with minimal latency, streaming applications

PiperPiper

Ekhululekileyo

Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.

Umbhekisi phambili::
Rhasspy
Ilayisensi::
MIT
Isantya:
Fast
Ubunjani::
Iilwimi: en, de, fr, es, it, pt, nl, pl, ru, zh, ja, ko, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi
Elungileyo ku:: Quick previews, accessibility, and embedded applications

VITSVITS

Ekhululekileyo

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.

Umbhekisi phambili::
Jaehyeon Kim et al.
Ilayisensi::
MIT
Isantya:
Fast
Ubunjani::
Iilwimi: en, zh, ja, ko
Elungileyo ku:: General-purpose text-to-speech with natural prosody

MeloTTSMeloTTS

Ekhululekileyo

MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.

Umbhekisi phambili::
MyShell.ai
Ilayisensi::
MIT
Isantya:
Fast
Ubunjani::
Iilwimi: en, es, fr, zh, ja, ko
Elungileyo ku:: Production applications needing fast, multilingual TTS

Kitten TTSKitten TTS

Ekhululekileyo

Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.

Umbhekisi phambili::
KittenML
Ilayisensi::
Apache 2.0
Isantya:
Fast
Ubunjani::
Iilwimi: en
Elungileyo ku:: Fast lightweight TTS, edge deployment, low-latency applications

BarkBark

Emiselweyo

Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio like music, background noise, and sound effects. It can produce nonverbal communications like laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.

Umbhekisi phambili::
Suno
Ilayisensi::
MIT
Isantya:
Slow
Ubunjani::
Iilwimi:
en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
I-Voice Cloning:
Akukho nanye
Sound effectsLaughing/sighingMusic generation100+ speakersMultilingual
Elungileyo ku:: Creative audio content, audiobooks with emotion, sound effects

Bark SmallBark Small

Emiselweyo

Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference speeds and lower memory requirements. It retains Bark's ability to generate speech with emotions, laughter, and multiple languages.

Umbhekisi phambili::
Suno
Ilayisensi::
MIT
Isantya:
Medium
Ubunjani::
Iilwimi:
en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
I-Voice Cloning:
Akukho nanye
LightweightFaster than full BarkEmotional speechMultilingual
Elungileyo ku:: Quick creative audio when full Bark is too slow

CosyVoice 2CosyVoice 2

Emiselweyo

CosyVoice 2 by Alibaba's Tongyi Lab achieves human-comparable speech quality with extremely low latency, making it ideal for real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.

Umbhekisi phambili::
Alibaba (Tongyi Lab)
Ilayisensi::
Apache 2.0
Isantya:
Medium
Ubunjani::
Iilwimi:
en, zh, ja, ko, fr, de, it, es
I-Voice Cloning:
Ewe
StreamingZero-shot cloningCross-lingualEmotion controlHuman-parity
Elungileyo ku:: Real-time applications, streaming TTS, voice assistants

Dia TTSDia TTS

Emiselweyo

Dia by Nari Labs is a 1.6B parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression. Dia is perfect for creating podcast-style content, audiobook dialogues, and interactive conversational AI.

Umbhekisi phambili::
Nari Labs
Ilayisensi::
Apache 2.0
Isantya:
Medium
Ubunjani::
Iilwimi:
en
I-Voice Cloning:
Akukho nanye
Multi-speakerDialog generationNatural turn-takingEmotional expression1.6B parameters
Elungileyo ku:: Podcasts, audiobook dialogues, conversational content

Parler TTSParler TTS

Emiselweyo

Parler TTS is a text-to-speech model that uses natural language voice descriptions to control the generated speech. Instead of selecting from preset voices, you describe the voice you want (e.g., "a warm female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech matching that description. This makes it uniquely flexible for creative applications.

Umbhekisi phambili::
Hugging Face
Ilayisensi::
Apache 2.0
Isantya:
Medium
Ubunjani::
Iilwimi:
en
I-Voice Cloning:
Akukho nanye
Voice descriptionNatural language controlFlexible voice creationNo preset voices needed
Elungileyo ku:: Creative applications where you need custom voice characteristics

GLM-TTSGLM-TTS

Emiselweyo

GLM-TTS by Zhipu AI is a text-to-speech system built on the Llama architecture with flow matching. It achieves the lowest character error rate among open-source TTS models, meaning it produces the most accurate pronunciation. GLM-TTS supports English and Chinese with voice cloning from 3-10 second audio samples.

Umbhekisi phambili::
Zhipu AI
Ilayisensi::
GLM-4 License
Isantya:
Medium
Ubunjani::
Iilwimi:
en, zh
I-Voice Cloning:
Ewe
Lowest error rateVoice cloningFlow matchingNatural prosody
Elungileyo ku:: Applications requiring maximum pronunciation accuracy

IndexTTS-2IndexTTS-2

Emiselweyo

IndexTTS-2 is an advanced text-to-speech system that excels at zero-shot voice synthesis with fine-grained emotion control. It can generate speech with specific emotional tones like happy, sad, angry, or fearful without requiring emotion-specific training data. The model uses emotion vectors to precisely control the emotional expression of generated speech.

Umbhekisi phambili::
Index Team
Ilayisensi::
Bilibili Model License
Isantya:
Medium
Ubunjani::
Iilwimi:
en, zh
I-Voice Cloning:
Ewe
Emotion controlZero-shotEmotion vectorsExpressive speechFine-grained control
Elungileyo ku:: Emotionally expressive content, audiobooks, virtual assistants

Spark TTSSpark TTS

Emiselweyo

Spark TTS by SparkAudio is a text-to-speech model that combines voice cloning with controllable emotion and speaking style. Using just 5 seconds of reference audio, it can clone a voice and then generate speech with different emotions, speeds, and styles while maintaining the cloned voice identity. Spark TTS uses a prompt-based control system.

Umbhekisi phambili::
SparkAudio
Ilayisensi::
CC BY-NC-SA 4.0
Isantya:
Medium
Ubunjani::
Iilwimi:
en, zh
I-Voice Cloning:
Ewe
Voice cloningEmotion controlStyle controlPrompt-based5-second cloning
Elungileyo ku:: Content creation with cloned voices and emotional control

GPT-SoVITSGPT-SoVITS

Emiselweyo

GPT-SoVITS combines GPT-style language modeling with SoVITS (Singing Voice Inference via Translation and Synthesis) for powerful few-shot voice cloning. With as little as 5 seconds of reference audio, it can accurately clone a voice and generate new speech while preserving the speaker's unique characteristics. It excels at both speaking and singing voice synthesis.

Umbhekisi phambili::
RVC-Boss
Ilayisensi::
MIT
Isantya:
Slow
Ubunjani::
Iilwimi:
en, zh, ja, ko
I-Voice Cloning:
Ewe
5-second cloningSinging voiceFew-shot learningHigh fidelityCross-lingual
Elungileyo ku:: Voice cloning, singing synthesis, content creator voice replication

OrpheusOrpheus

Emiselweyo

Orpheus is a large-scale text-to-speech model that achieves human-level emotional expression. Trained on over 100,000 hours of diverse speech data, it excels at generating speech with natural emotions, emphasis, and speaking styles. Orpheus can produce speech that is virtually indistinguishable from human recordings.

Umbhekisi phambili::
Canopy Labs
Ilayisensi::
Llama 3.2 Community
Isantya:
Medium
Ubunjani::
Iilwimi:
en
I-Voice Cloning:
Akukho nanye
Human-level emotion100K hours trainingNatural emphasisExpressive speech
Elungileyo ku:: High-quality emotional speech, audiobooks, voice acting

Qwen3 TTSQwen3 TTS

Emiselweyo

Qwen3-TTS is a 1.7 billion parameter text-to-speech model from Alibaba's Qwen team. It supports three modes: preset voices with emotion control (9 speakers), voice cloning from just 3 seconds of audio, and a unique voice design mode where you describe the voice you want in natural language. It covers 10 languages with high expressiveness and natural prosody.

Umbhekisi phambili::
Alibaba (Qwen)
Ilayisensi::
Apache 2.0
Isantya:
Medium
Ubunjani::
Iilwimi:
en, zh, ja, ko, de, fr, ru, pt, es, it
I-Voice Cloning:
Ewe
Voice cloning9 preset voicesVoice design from textEmotion control10 languages
Elungileyo ku:: Multilingual content with voice cloning or custom voice design

ChatterboxChatterbox

Ixabiso eliphezulu

Chatterbox by Resemble AI is a cutting-edge zero-shot voice cloning model. It can replicate any voice from a single audio sample with remarkable accuracy, capturing not just the timbre but also the speaking style and emotional nuances. Chatterbox also features fine-grained emotion control, allowing you to adjust the emotional tone of the generated speech independently from the voice identity.

Umbhekisi phambili::
Resemble AI
Ilayisensi::
MIT
Isantya:
Medium
Ubunjani::
Iilwimi:
en
I-Voice Cloning:
Ewe
VRAM:
4GB
Ixabiso nge 1K uphawu:
4x
Zero-shot cloningEmotion controlHigh fidelityStyle transferSingle sample cloning
Elungileyo ku:: Professional voice cloning with emotional control, content creation

Tortoise TTSTortoise TTS

Ixabiso eliphezulu

Tortoise TTS is an autoregressive multi-voice text-to-speech system that prioritizes audio quality over speed. It uses DALL-E-inspired architecture to generate highly natural speech with excellent prosody and speaker similarity. While slower than many alternatives, Tortoise produces some of the most realistic synthetic speech available in the open-source ecosystem.

Developer: James Betker
License: Apache 2.0
Speed: Slow
Languages: en
Voice Cloning: Yes
VRAM: 8GB
Cost per 1K characters: 4x
Features: Highest quality, Multi-voice, DALL-E architecture, Voice cloning, Autoregressive
Best for: Audiobooks, premium content, quality-first applications

StyleTTS 2

Premium

StyleTTS 2 achieves human-level TTS synthesis by combining style diffusion with adversarial training using large speech language models. It generates the most natural-sounding speech among single-speaker models, rivaling human recordings. StyleTTS 2 uses diffusion-based style modeling to capture the full range of human speech variation.

Developer: Columbia University
License: MIT
Speed: Medium
Languages: en
Voice Cloning: No
VRAM: 4GB
Cost per 1K characters: 4x
Features: Human-level, Style diffusion, Adversarial training, Natural variation, High fidelity
Best for: Studio-quality single-speaker synthesis, professional narration

OpenVoice

Premium

OpenVoice by MyShell.ai enables instant voice cloning with granular control over voice style, emotion, accent, rhythm, pauses, and intonation. It can clone a voice from a short audio clip and generate speech in multiple languages while maintaining the speaker identity. OpenVoice also functions as a voice converter, allowing real-time voice transformation.

Developer: MyShell.ai / MIT
License: MIT
Speed: Medium
Languages: en, zh, ja, ko, fr, de, es, it
Voice Cloning: Yes
VRAM: 4GB
Cost per 1K characters: 4x
Features: Instant cloning, Voice conversion, Emotion control, Accent control, Multilingual
Best for: Voice cloning with fine-grained style control, voice conversion

Sesame CSM

Premium

Sesame CSM (Conversational Speech Model) is a 1 billion parameter model designed specifically for generating conversational speech. It models the natural patterns of human conversation including turn-taking timing, backchannel responses, emotional reactions, and conversational flow. CSM generates audio that sounds like a natural human conversation rather than synthetic speech.

Developer: Sesame
License: Apache 2.0
Speed: Slow
Languages: en
Voice Cloning: No
VRAM: 8GB
Cost per 1K characters: 4x
Features: Conversational, Natural timing, Turn-taking, Backchannel, 1B parameters
Best for: AI assistants, chatbots, conversational AI applications

Model Comparison Table

| Model | Developer | Tier | Speed | Languages | VRAM | License | Cost |
|---|---|---|---|---|---|---|---|
| Kokoro | Hexgrad | Free | Fast | 11 | 1.5GB | Apache 2.0 | Free |
| Piper | Rhasspy | Free | Fast | 31 | 0 (CPU only) | MIT | Free |
| VITS | Jaehyeon Kim et al. | Free | Fast | 4 | 1GB | MIT | Free |
| MeloTTS | MyShell.ai | Free | Fast | 6 | 0.5GB (GPU optional) | MIT | Free |
| Bark | Suno | Standard | Slow | 13 | 5GB | MIT | 2x |
| Bark Small | Suno | Standard | Medium | 13 | 2GB | MIT | 2x |
| CosyVoice 2 | Alibaba (Tongyi Lab) | Standard | Medium | 8 | 4GB | Apache 2.0 | 2x |
| Dia TTS | Nari Labs | Standard | Medium | 1 | 4GB | Apache 2.0 | 2x |
| Parler TTS | Hugging Face | Standard | Medium | 1 | 4GB | Apache 2.0 | 2x |
| GLM-TTS | Zhipu AI | Standard | Medium | 2 | 4GB | GLM-4 License | 2x |
| IndexTTS-2 | Index Team | Standard | Medium | 2 | 4GB | Bilibili Model License | 2x |
| Spark TTS | SparkAudio | Standard | Medium | 2 | 4GB | CC BY-NC-SA 4.0 | 2x |
| GPT-SoVITS | RVC-Boss | Standard | Slow | 4 | 6GB | MIT | 2x |
| Orpheus | Canopy Labs | Standard | Medium | 1 | 4GB | Llama 3.2 Community | 2x |
| Chatterbox | Resemble AI | Premium | Medium | 1 | 4GB | MIT | 4x |
| Tortoise TTS | James Betker | Premium | Slow | 1 | 8GB | Apache 2.0 | 4x |
| StyleTTS 2 | Columbia University | Premium | Medium | 1 | 4GB | MIT | 4x |
| OpenVoice | MyShell.ai / MIT | Premium | Medium | 8 | 4GB | MIT | 4x |
| Qwen3 TTS | Alibaba (Qwen) | Standard | Medium | 10 | 7GB | Apache 2.0 | 2x |
| Sesame CSM | Sesame | Premium | Slow | 1 | 8GB | Apache 2.0 | 4x |
| Kitten TTS | KittenML | Free | Fast | 1 | 0GB | Apache 2.0 | Free |
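
For readers who plan to run these models on their own hardware, the VRAM column is a natural filter. The following is a minimal sketch only: the `Model` tuple and the `models_within_vram` helper are illustrative names, not part of any TTS.ai tooling, and the figures are copied from a handful of rows in the comparison table.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    tier: str
    speed: str
    vram_gb: float  # 0 is listed as CPU-only in the comparison table


# A few rows copied from the comparison table (not the full list).
MODELS = [
    Model("Kokoro", "Free", "Fast", 1.5),
    Model("Piper", "Free", "Fast", 0.0),
    Model("Bark", "Standard", "Slow", 5.0),
    Model("Chatterbox", "Premium", "Medium", 4.0),
    Model("Tortoise TTS", "Premium", "Slow", 8.0),
    Model("Qwen3 TTS", "Standard", "Medium", 7.0),
]


def models_within_vram(models, budget_gb):
    """Return names of models whose listed VRAM fits the budget, smallest first."""
    fitting = [m for m in models if m.vram_gb <= budget_gb]
    return [m.name for m in sorted(fitting, key=lambda m: m.vram_gb)]


print(models_within_vram(MODELS, 4.0))
```

Listed VRAM is a rough guide rather than a guarantee, so real memory use may differ with batch size and text length.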

The Most Complete AI Text-to-Speech Platform

Why choose TTS.ai for Text to Speech?

TTS.ai brings together the world's best open-source text-to-speech models in a single, easy-to-use platform. Unlike commercial services that lock you into one voice engine, TTS.ai gives you access to more than 20 models from leading research labs, including Coqui, MyShell, Amphion, NVIDIA, Suno, HuggingFace, Tsinghua University, and others.

Each model is open source under MIT, Apache 2.0, or a similar license, ensuring you have full commercial rights to use the generated audio in your projects. Whether you need fast synthesis for real-time applications or studio-quality output for audiobooks and podcasts, TTS.ai has a model suited to each use case.

Free Models, No Account Required

Get started right away with three free TTS models: Piper (very fast, lightweight), VITS (high-quality neural synthesis), and MeloTTS (multilingual support). No sign-up, no credit card, no limit on generations. The free models support English and many other languages, with natural-sounding output suitable for most applications.

Fast GPU Processing

All TTS models run on dedicated NVIDIA GPUs for fast, consistent generation times. Free models generate audio in under 2 minutes. Standard models such as Kokoro, CosyVoice 2, and Bark take 3-5 minutes. Premium models such as Tortoise and Chatterbox take 5-15 minutes, depending on text length.

30+ Supported Languages

Generate speech in more than 30 languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more. Many models support cross-lingual synthesis, meaning you can generate speech in multiple languages from one recorded voice. CosyVoice 2 and GPT-SoVITS excel at cross-lingual voice cloning.

Developer API

TTS.ai integrates into your application through an OpenAI-compatible REST API. One endpoint covers all 20+ models. SDKs are available for Python, JavaScript, cURL, and Go. Streaming is supported for real-time applications, batch processing for high-volume content production, and webhooks for async notifications. Available on Pro and Enterprise plans.

Frequently Asked Questions

Text to speech (TTS) is AI technology that converts written text into natural-sounding spoken audio. Modern neural TTS models such as Kokoro, Chatterbox, and CosyVoice 2 use deep learning to produce human-like speech with natural prosody, emotion, and rhythm.

It depends on your needs. For a quick preview, use Piper or MeloTTS (free, fast). For the highest quality, try Kokoro or CosyVoice 2 (standard tier). For voice cloning, use Chatterbox or GPT-SoVITS (premium). For dialogue or podcast content, try Dia TTS. Each model has different strengths, so experiment to find the best fit.

Yes! TTS.ai offers free text-to-speech with Kokoro, Piper, VITS, and MeloTTS. No account is required for up to 500 characters and 3 generations per hour. Sign up for a free account to get 15,000 characters and access to all models.

Our TTS models support more than 30 languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Russian, Hindi, and many more. Language availability depends on the model.

Yes, audio generated with TTS.ai can be used commercially. All of our models use open-source licenses (MIT, Apache 2.0). Specific conditions vary, so we recommend reviewing the license of the particular model you use in your project.

TTS.ai supports MP3, WAV, OGG, and FLAC output formats. MP3 is the default for web playback. WAV is recommended when the audio will be processed further. You can convert between formats using our audio converter tool.

Voice cloning uses AI to reproduce a specific voice from a short audio sample (typically 5-30 seconds). Upload a clear recording of the target voice, and models such as Chatterbox, GPT-SoVITS, or OpenVoice will generate new speech in that voice. Quality improves with cleaner, longer reference audio.

Free users can generate up to 500 characters per request. Registered users get up to 5,000 characters per request. For longer texts, the audio is generated in smaller chunks and joined automatically. API users can process up to 10,000 characters per request.
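
To illustrate the chunking idea, here is a minimal sketch that splits a long text into pieces under a per-request character limit, preferring sentence boundaries. It is not the platform's actual splitting logic, which is not documented here; the function name `split_into_chunks` and the regular expression used to detect sentence ends are assumptions made for illustration.

```python
import re


def split_into_chunks(text, limit=500):
    """Split text into chunks of at most `limit` characters,
    breaking at sentence boundaries where possible."""
    if limit <= 0:
        raise ValueError("limit must be positive")

    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two. Three.", limit=10))
```

Splitting at sentence ends keeps each chunk's prosody self-contained, which tends to make the joins less audible than cutting mid-sentence.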

SSML (Speech Synthesis Markup Language) support varies by model. Piper and some other models support basic SSML tags for pauses, emphasis, and pronunciation control. For models without SSML support, you can use natural punctuation and line breaks to influence prosody.
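
As a rough illustration of basic SSML, the sketch below wraps text in `<speak>` and `<prosody>` elements, optionally appends a `<break>` pause, and checks that the result is well-formed XML. The helper name `build_ssml` is hypothetical, and whether a given model honors these tags is an assumption to verify per model.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, rate="medium", break_ms=0):
    """Wrap plain text in <speak>/<prosody>, optionally followed by a pause."""
    body = escape(text)  # escape &, < and > so the markup stays valid
    if break_ms > 0:
        body += f'<break time="{int(break_ms)}ms"/>'
    ssml = f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    return ssml


print(build_ssml("Slow speech", rate="slow"))
```

Escaping the input matters because a stray `&` or `<` in user text would otherwise produce invalid markup.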

Yes, most models support speed adjustment from 0.5x to 2.0x. Some models, such as Bark and Parler, also allow control over pitch and style. You can set the speed in the advanced settings panel or through the API speed parameter.

Yes, batch processing is available through our API. You can send multiple text strings in a single API call or script, and each one is processed and returned as a separate audio file. This works well for audiobook chapters, e-learning courses, or game dialogue systems.

Generate an API key from your account dashboard, then send POST requests to our REST API endpoint with your text, model, and voice parameters. We provide code examples in Python, JavaScript, and cURL. The API is OpenAI-compatible, so existing integrations work with minimal changes.
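
A request to an OpenAI-compatible speech endpoint might be built as in the sketch below. Several details are assumptions rather than documented facts about TTS.ai: the base URL is a placeholder, the `/audio/speech` path and the field names (`model`, `input`, `voice`, `speed`, `response_format`) follow the OpenAI speech API convention, and the model identifier `"kokoro"` is a guess. The speed range and the format list are taken from the answers above.

```python
import json
import urllib.request

# Placeholder base URL (assumption); use the endpoint given in the API docs.
BASE_URL = "https://api.example.com/v1"

ALLOWED_FORMATS = {"mp3", "wav", "ogg", "flac"}


def build_speech_request(text, model, api_key, voice="default",
                         speed=1.0, response_format="mp3"):
    """Build (but do not send) a POST request for speech synthesis."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {response_format}")

    # Field names follow the OpenAI speech API convention (assumed to apply).
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speech_request("Hello, world.", "kokoro", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request).read() returns the audio bytes.
```

Validating speed and format on the client side avoids spending a request on parameters the service would reject.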

Start Converting Text to Speech Now

Join thousands of marketers using TTS.ai. Get 15,000 free characters with a new account. The free models are available without signing up.