Free AI Text to Speech

33+ open-source models, 273+ voices, 33+ tools

17K+
creators
70K+
types
33+
AI models
273+
voices

Everything you need in Voice AI

More than 30 tools powered by open-source AI models

33+ AI Voice Models

A complete catalog of open-source TTS models on one platform

Kokoro (Free)

Kokoro is an 82-million-parameter text-to-speech model that punches well above its weight class. Despite its small size, it produces clear, natural-sounding speech. Kokoro supports multiple languages, including English, Japanese, Chinese, and Korean, with a range of distinct voice styles. It is very fast, generating audio roughly 100x faster than real time on a GPU.

Best for: High-quality TTS with low latency, streaming applications

Piper (Free)

Piper is a lightweight text-to-speech engine developed by Rhasspy that uses the VITS and Larynx architectures. It runs entirely on CPU, which makes it a good fit for edge devices, home automation, and applications that need offline TTS. With more than 100 voices across more than 30 languages, Piper delivers natural-sounding speech at real-time speed even on a Raspberry Pi 4.

Best for: Fast local inference, accessibility, and embedded systems

VITS (Free)

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end TTS method that generates more natural-sounding audio than two-stage pipelines. It uses variational inference augmented with normalizing flows and an adversarial training process, which yields a significant improvement in naturalness.

Best for: General-purpose text-to-speech with natural prosody

MeloTTS (Free)

MeloTTS by MyShell.ai is a TTS library that supports English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is very fast, processing text at near real-time speed on CPU alone. MeloTTS is optimized for production use and supports both CPU and GPU inference.

Best for: Production applications that need fast, multilingual TTS

Kani TTS 2 (Free)

Kani-TTS-2 by NineNineSix is a very compact 400M-parameter model built on the Liquid AI LFM2 backbone with the NVIDIA NanoCodec. It runs in only 3 GB of VRAM and generates ~10 seconds of speech in ~2 seconds on an A100 (RTF 0.2). The current public release ships only the English-only `kani-tts-2-en` checkpoint and does not expose the speaker-embedding hook needed for voice cloning. Use Chatterbox, IndexTTS2, or F5-TTS for cloning, or Kokoro or MeloTTS for non-English speech.

Best for: Fast English synthesis on low-VRAM hardware, fast inference
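The speed figures quoted for these models can be compared on one scale using the real-time factor (RTF): seconds of compute per second of audio produced. The sketch below only restates the numbers given on this page (about 10 seconds of speech in about 2 seconds for Kani-TTS-2, roughly 100x real time for Kokoro). The function names are illustrative and not part of any TTS.ai SDK.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return RTF: seconds of compute spent per second of audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """Return how many times faster than real time a given RTF is."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Figures quoted on this page: ~10 s of speech in ~2 s on an A100.
kani_rtf = real_time_factor(synthesis_seconds=2.0, audio_seconds=10.0)
kani_speedup = speedup_over_real_time(kani_rtf)

# Kokoro is described as roughly 100x faster than real time on a GPU,
# which corresponds to an RTF of about 0.01.
kokoro_rtf = 1.0 / 100.0

if __name__ == "__main__":
    print(f"Kani-TTS-2 RTF: {kani_rtf:.2f} ({kani_speedup:.0f}x real time)")
    print(f"Kokoro RTF:     {kokoro_rtf:.2f}")
```

An RTF below 1.0 means the model produces audio faster than it plays back, which is the threshold that matters for streaming use.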

OuteTTS (Free)

OuteTTS extends large language models with text-to-speech capabilities while preserving the original architecture. It supports multiple backends, including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, vLLM, and even in-browser inference via Transformers.js. It features zero-shot voice cloning with speaker profiles saved as JSON.

Best for: Edge deployment, browser-based TTS, low-resource environments

Pocket TTS (Free)

Pocket TTS by Kyutai (the creators of Moshi) is a compact 100M-parameter text-to-speech model that punches above its weight. It runs well on CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. The small model size makes it a good fit for edge deployment and low-resource environments.

Best for: Edge deployment, CPU-only environments, fast voice cloning

Kitten TTS (Free)

Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality speech synthesis on CPU without requiring a GPU. It includes 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. It is suited to edge deployment and low-latency applications.

Best for: Lightweight fast TTS, edge deployment, low-latency applications

Ming-Omni TTS (Free)

Ming-omni-tts-0.5B by inclusionAI is a lightweight omni-modal speech model built on the BailingMM dense backbone with a Patch-by-Patch flow-matching audio decoder. It delivers 44.1 kHz output (near CD quality), supports zero-shot voice cloning from a reference of 3 seconds or more, and includes built-in emotion, language, and BGM control via JSON instructions. Stability is strong: 0.83% WER on Chinese benchmarks.

Best for: High-fidelity bilingual speech, emotion-controlled voice acting, Chinese audiobook content

MOSS-TTS Nano (Free)

MOSS-TTS-Nano-100M is OpenMOSS's compact 100M-parameter variant of the MOSS-TTS family, using a delay-based transformer architecture. It trades the top-end quality of the 8B model for ~80x less weight and much lower VRAM use per request, which makes it suited to free-tier and high-volume deployment. It keeps the same 20-language reach.

Best for: Free-tier TTS, high-volume generation, low-latency interactive use

Bark (Standard)

A transformer-based text-to-audio model that generates realistic speech, music, and sound effects.

Developer: Suno · License: MIT

Bark Small (Standard)

A smaller version of Bark with faster inference and lower memory usage.

Developer: Suno · License: MIT

CosyVoice 2 (Standard)

Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.

Developer: Alibaba (Tongyi Lab) · License: Apache 2.0

Dia TTS (Standard)

A multi-speaker dialogue generation model that produces natural conversations between speakers.

Developer: Nari Labs · License: Apache 2.0

Parler TTS (Standard)

Describe the voice you want in natural language and Parler generates matching speech.

Developer: Hugging Face · License: Apache 2.0

IndexTTS-2 (Standard)

Zero-shot TTS with fine-grained emotional control and high expressiveness.

Developer: Index Team · License: Bilibili Model License

Spark TTS (Standard)

Voice cloning TTS with controllable emotion and speaking style via prompts.

Developer: SparkAudio · License: CC BY-NC-SA 4.0

GPT-SoVITS (Standard)

Few-shot voice cloning TTS that replicates any voice from only five seconds of audio.

Developer: RVC-Boss · License: MIT

Orpheus (Standard)

A human-level, emotionally expressive TTS model trained on 100K hours of speech data.

Developer: Canopy Labs · License: Llama 3.2 Community

Qwen3 TTS (Standard)

Alibaba's multilingual TTS with preset voices and voice design from text.

Developer: Alibaba (Qwen) · License: Apache 2.0

VieNeu-TTS-v2 (Standard)

Vietnamese + English code-switching TTS with 7 preset voices and voice cloning. CPU only, no GPU required.

Developer: Phạm Nguyễn Ngọc Bảo · License: Apache 2.0

Chatterbox Turbo (Standard)

A fast Chatterbox variant with sub-200ms latency and paralinguistic tags for emotion, breathing, and more.

Developer: Resemble AI · License: MIT

VoxCPM (Standard)

Tokenizer-free TTS that generates 44.1 kHz audio with context-aware paragraph consistency.

Developer: OpenBMB · License: Apache 2.0

VibeVoice (Standard)

Microsoft's model for multi-speaker content such as podcasts and audiobooks.

Developer: Microsoft · License: MIT

CosyVoice3 (Standard)

Next-generation multilingual TTS with streaming, emotion control, and zero-shot voice cloning.

Developer: Alibaba (FunAudioLLM) · License: Apache 2.0

NAMAA Saudi TTS (Standard)

The first open-source Saudi Arabic TTS. Native Saudi dialect with Chatterbox-quality voice cloning.

Developer: NAMAA Space · License: MIT

Darwin TTS (Standard)

A cross-modal Qwen3-TTS variant with FFN weights merged with the Qwen3-1.7B language model for clearer multilingual cloning.

Developer: FINAL-Bench · License: Apache 2.0

MOSS-TTSD (Standard)

A multi-speaker dialogue generation model that creates podcast-style conversations with 5 speakers and 60 minutes of coherent audio.

Developer: OpenMOSS · License: Apache 2.0

Chatterbox (Premium)

State-of-the-art zero-shot voice cloning with emotion control from Resemble AI.

Tortoise TTS (Premium)

Multi-voice text-to-speech focused on quality, with an autoregressive design.

StyleTTS 2 (Premium)

Human-level text-to-speech through style diffusion and adversarial training.

OpenVoice (Premium)

Instant voice cloning with granular control over style, emotion, and accent.

Sesame CSM (Premium)

A conversational speech model that produces natural dialogue with appropriate timing and context.

CosyVoice 2

Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.

Languages: en, zh, ja, ko, fr, de, it, es

IndexTTS-2

Zero-shot TTS with fine-grained emotional control and high expressiveness.

Languages: en, zh

Spark TTS

Voice cloning TTS with controllable emotion and speaking style via prompts.

Languages: en, zh

GPT-SoVITS

Few-shot voice cloning TTS that replicates any voice from only five seconds of audio.

Languages: en, zh, ja, ko

Chatterbox

State-of-the-art zero-shot voice cloning with emotion control from Resemble AI.

Languages: en

Tortoise TTS

Multi-voice text-to-speech focused on quality, with an autoregressive design.

Languages: en

OpenVoice

Instant voice cloning with granular control over style, emotion, and accent.

Languages: en, zh, ja, ko, fr, es

VieNeu-TTS-v2

Vietnamese + English code-switching TTS with 7 preset voices and voice cloning. CPU only, no GPU required.

Languages: vi, en

Chatterbox Turbo

A fast Chatterbox variant with sub-200ms latency and paralinguistic tags for emotion, breathing, and more.

Languages: en

VoxCPM

Tokenizer-free TTS that generates 44.1 kHz audio with context-aware paragraph consistency.

Languages: en, zh

OuteTTS

LLM-based TTS that runs on CPU, GPU, or in the browser via llama.cpp and Transformers.js.

Languages: en

Pocket TTS

Kyutai's lightweight 100M-parameter model with voice cloning from a single reference sample.

Languages: en, fr

CosyVoice3

Next-generation multilingual TTS with streaming, emotion control, and zero-shot voice cloning.

Languages: en, zh, ja, ko, de, es, fr, it, ru

NAMAA Saudi TTS

The first open-source Saudi Arabic TTS. Native Saudi dialect with Chatterbox-quality voice cloning.

Languages: ar

Darwin TTS

A cross-modal Qwen3-TTS variant with FFN weights merged with the Qwen3-1.7B language model for clearer multilingual cloning.

Languages: en, ko, ja, zh

MOSS-TTSD

A multi-speaker dialogue generation model that creates podcast-style conversations with 5 speakers and 60 minutes of coherent audio.

Languages: en, zh

Ming-Omni TTS

A lightweight 0.5B omni-modal speech model from inclusionAI with high-fidelity 44.1 kHz output and zero-shot voice cloning.

Languages: en, zh

MOSS-TTS Nano

A compact 100M variant of MOSS-TTS: same architecture, 80x smaller, free-tier latency.

Languages: en, zh, de, es, fr, ja, it, ko, ru, ar, pt

Developer-First API

An OpenAI-compatible REST API. One endpoint, 22+ models. Streaming support for real-time applications.

  • OpenAI-compatible format
  • Streaming TTS for real-time applications
  • Batch processing for large jobs
  • Webhook notifications
View API Docs
pip install ttsai
npm install @ttsainpm/ttsai
Python
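from tts_ai import TTSClient

# Authenticate with your API key
client = TTSClient(api_key="sk-tts-xxx")

# Generate speech with the Kokoro model and the af_bella voice
audio = client.generate(
    text="Hello from TTS.ai!",
    model="kokoro",
    voice="af_bella",
)

# Write the generated audio to disk
client.save(audio, "output.mp3")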
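Because the API is described as OpenAI-compatible, the same call can be sketched as a plain HTTP request without the SDK. The example below only builds the request and does not send it. The base URL is a placeholder, and the path and JSON field names follow OpenAI's speech endpoint convention. Both are assumptions here, so check the API docs for the actual values.

```python
from dataclasses import dataclass

# ASSUMPTION: placeholder host; the real base URL is given in the API docs.
BASE_URL = "https://api.example.invalid"
# ASSUMPTION: path mirrors OpenAI's speech endpoint; not confirmed for TTS.ai.
SPEECH_PATH = "/v1/audio/speech"


@dataclass
class SpeechRequest:
    """Everything an HTTP client needs to POST one synthesis request."""
    url: str
    headers: dict
    body: dict


def build_speech_request(
    api_key: str,
    text: str,
    model: str = "kokoro",
    voice: str = "af_bella",
    response_format: str = "mp3",
) -> SpeechRequest:
    """Assemble an OpenAI-style text-to-speech request without sending it."""
    if not text:
        raise ValueError("text must not be empty")
    return SpeechRequest(
        url=BASE_URL + SPEECH_PATH,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        # Field names (model, input, voice, response_format) are the ones
        # OpenAI's speech endpoint uses.
        body={
            "model": model,
            "input": text,
            "voice": voice,
            "response_format": response_format,
        },
    )


if __name__ == "__main__":
    request = build_speech_request("sk-tts-xxx", "Hello from TTS.ai!")
    print(request.url)
    print(request.body)
```

Keeping request construction separate from sending makes it straightforward to pass the result to any HTTP client, or to point an existing OpenAI client at a different base URL.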

Simple, transparent pricing

Start free. Scale as you grow.

Free

$0

15,000 characters + 5,000/day

  • 7 free models including Kokoro
  • 5,000 characters per request
  • API access included
Sign up free

Starter

$9/month

500,000 characters/month

  • All 22+ models
  • 100,000 characters per request
  • Voice cloning
Get started
Most popular

Pro

$29/month

2,000 credits/month

  • Everything in Starter
  • API access
  • Priority processing
Get Pro

Business

$99/month

10,000 credits/month

  • Everything in Pro
  • Bulk API
  • Priority queue
Get Business

View all plans, including credit packs →

Frequently asked questions

TTS.ai is a complete AI voice platform offering 22+ models for text-to-speech, voice cloning, speech-to-text, and audio tools. All models are open source, with no vendor lock-in.

Yes! TTS.ai offers free text-to-speech with the Kokoro, Piper, VITS, and MeloTTS models. No account is required. Sign up to get 15,000 free characters and access to all models. Paid plans start at $9/month.

For speed, use Kokoro or Piper. For quality, try CosyVoice 2 or StyleTTS 2. For voice cloning, use Chatterbox or GPT-SoVITS. For dialogue, use Dia TTS. Try several models on the same text to compare them.

Yes. An OpenAI-compatible REST API covers TTS, STT, voice cloning, and audio tools. It is included in every plan, including Free, with rate limits that scale by tier (Free: 10 req/min, Lite: 20, Starter: 30, Pro: 60, Business: 300). See the documentation at tts.ai/api/.
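A client can stay under these per-minute limits by spacing its requests evenly. The sketch below is one minimal way to do that, using only the figures quoted above. The `Throttle` class is illustrative and not part of any TTS.ai SDK, and even spacing is stricter than the limit itself because it never allows bursts.

```python
import time

# Requests-per-minute limits as quoted in the FAQ answer above.
RATE_LIMITS_PER_MIN = {
    "free": 10,
    "lite": 20,
    "starter": 30,
    "pro": 60,
    "business": 300,
}


def min_interval_seconds(plan: str) -> float:
    """Smallest gap between requests that keeps a plan under its limit."""
    return 60.0 / RATE_LIMITS_PER_MIN[plan.lower()]


class Throttle:
    """Client-side pacing: call wait() before each request."""

    def __init__(self, plan: str, clock=time.monotonic, sleep=time.sleep):
        self._interval = min_interval_seconds(plan)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request sent yet

    def wait(self) -> float:
        """Sleep until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._interval
        return delay


if __name__ == "__main__":
    for plan in RATE_LIMITS_PER_MIN:
        print(f"{plan}: one request every {min_interval_seconds(plan):.1f} s")
```

The clock and sleep functions are injectable so the pacing logic can be exercised without real delays.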

Voice quality varies by model. Premium models such as CosyVoice 2, StyleTTS 2, and Chatterbox produce near-human speech with natural intonation and emotion. Free models such as Kokoro provide good quality for most use cases.

TTS.ai supports 30+ languages across its model library. English has the broadest model support, but models such as CosyVoice 2 cover Chinese, Japanese, and Korean; GPT-SoVITS handles Chinese, Japanese, Korean, and English; and MeloTTS supports English, Spanish, French, Chinese, Japanese, and Korean.

Yes. All processing runs on our dedicated GPU servers. We do not store your input text or generated audio after delivery. Audio samples uploaded for cloning are used only for the current session and are not retained. We do not share your data with any third party or use it to train models.

Yes. All audio generated on TTS.ai can be used commercially, including in YouTube videos, podcasts, audiobooks, apps, reminders, and products. Our models are open source under permissive licenses (MIT, Apache 2.0). No royalties or attribution are required.

TTS.ai generates audio in WAV format by default for the highest quality. You can convert to MP3, FLAC, OGG, or M4A using our free audio converter tool. The API supports specifying your preferred output format directly in the request.

Upload a short audio sample (as little as 5 seconds) of the voice you want to clone, then type any text to generate speech in that voice. Models such as Chatterbox, GPT-SoVITS, and CosyVoice 2 support voice cloning. The cloned voice captures the tone, accent, and speaking style.
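Before uploading, a reference clip can be checked locally against the 5-second figure mentioned above. The sketch below does this for WAV data with Python's standard `wave` module. The helper names are illustrative, and the silent clip it generates is only a stand-in for a real recording.

```python
import io
import wave

MIN_SAMPLE_SECONDS = 5.0  # "as little as 5 seconds", per the text above


def wav_duration_seconds(data: bytes) -> float:
    """Return the playing time of WAV data held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(data: bytes, minimum: float = MIN_SAMPLE_SECONDS) -> bool:
    """True if the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(data) >= minimum


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (demo input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


if __name__ == "__main__":
    for seconds in (3.0, 6.0):
        clip = make_silent_wav(seconds)
        print(f"{seconds:.0f} s clip long enough: {is_long_enough(clip)}")
```

The check only covers duration. It says nothing about noise or clarity, which also affect how well a cloned voice turns out.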

Free models (Kokoro, Piper, VITS, MeloTTS) require no account and are charged at the full character count. Standard models (2,000 credits per 1K characters) include Bark, CosyVoice 2, F5-TTS, and Dia. Premium models (4,000 credits per 1K characters) include OpenVoice, Chatterbox, StyleTTS 2, and Tortoise. Paid models generally offer higher quality, more voices, and extra features such as voice cloning.

Yes. The API supports batch processing for converting many texts to speech. Submit multiple requests and retrieve the results asynchronously using job UUIDs. The Business plan ($99/mo) and above include priority queue access for faster batch processing. It is suited to audiobook production, course content, and large audio projects.
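The submit-then-retrieve pattern described above can be illustrated with a small local stand-in. The sketch below does not call the TTS.ai API: `LocalBatchQueue` and `fake_synthesize` are hypothetical names, and the "synthesis" simply upper-cases the text so the flow of job UUIDs can be followed end to end.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class LocalBatchQueue:
    """In-memory stand-in for a batch API that tracks jobs by UUID."""

    def __init__(self, synthesize, workers: int = 4):
        self._synthesize = synthesize
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}  # job UUID -> Future holding the result

    def submit(self, text: str) -> str:
        """Queue one text and return its job UUID immediately."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._pool.submit(self._synthesize, text)
        return job_id

    def status(self, job_id: str) -> str:
        """Report whether a job has finished."""
        return "done" if self._jobs[job_id].done() else "pending"

    def result(self, job_id: str, timeout: float = None) -> bytes:
        """Block until the job finishes and return its output."""
        return self._jobs[job_id].result(timeout=timeout)

    def close(self) -> None:
        """Wait for outstanding jobs and release worker threads."""
        self._pool.shutdown(wait=True)


def fake_synthesize(text: str) -> bytes:
    """Placeholder for real synthesis; returns bytes derived from the text."""
    return text.upper().encode("utf-8")


if __name__ == "__main__":
    queue = LocalBatchQueue(fake_synthesize)
    job_ids = [queue.submit(t) for t in ["chapter one", "chapter two"]]
    for job_id in job_ids:
        print(job_id, queue.result(job_id, timeout=5))
    queue.close()
```

Against a real service the same shape applies: keep the returned IDs, then poll or wait for each one instead of blocking on a single long request.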

Start Using AI Voice Today

Join creators, developers, and companies using TTS.ai