AI Text to Speech

Convert text to natural-sounding speech with open-source AI models. Free to use, no account required.


Wrap your text in SSML tags for fine control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
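
The SSML snippet above can also be produced programmatically, which keeps special characters in the text escaped. The following is a minimal sketch using only the Python standard library; the function name `build_ssml` is hypothetical, and only the named rate values from the SSML specification are accepted (percentage rates are left out for brevity).

```python
import xml.etree.ElementTree as ET

# Named rate values defined by the SSML <prosody> element.
NAMED_RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def build_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...> and return the SSML string."""
    if rate not in NAMED_RATES:
        raise ValueError(f"unsupported rate: {rate!r}")
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", rate=rate)
    prosody.text = text  # ElementTree escapes &, < and > on serialization
    return ET.tostring(speak, encoding="unicode")


print(build_ssml("Slow speech"))
```

Calling `build_ssml("Tom & Jerry", rate="fast")` escapes the ampersand as `&amp;`, so the result stays well-formed XML.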

Add emotion tags to influence delivery (model support varies).

Set custom instructions (word = instruction).

Adjustment ranges: -12 to +12, and 0.5x to 2.0x for speed.
Free with Piper, VITS, and MeloTTS.
To generate audio, select a model, enter your text, and click Generate. Download links expire after 24 hours.


Tips for better results

  • Use proper punctuation for natural pauses and intonation.
  • Spell out numbers and abbreviations for clearer pronunciation.
  • Add commas to create short pauses between phrases.
  • Use ellipses (...) for longer pauses.
  • Try Kokoro or CosyVoice 2 for the most natural results.
  • Use Dia for multi-speaker dialogue and podcast content.
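
The tip about spelling out numbers and abbreviations can be automated before text is submitted. The sketch below is illustrative only: the abbreviation table and the function names are hypothetical, and it handles just whole numbers from 0 to 99, leaving decimals and longer numbers untouched.

```python
import re

# Hypothetical abbreviation table; extend it for your own content.
ABBREVIATIONS = {"Dr.": "Doctor", "approx.": "approximately", "km": "kilometers"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def spell_number(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def prepare_text(text: str) -> str:
    """Expand known abbreviations, then spell out standalone 1-2 digit numbers."""
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(rf"(?<!\w){re.escape(abbr)}(?!\w)", full, text)
    # Skip digits that are part of a longer number, a word, or a decimal.
    return re.sub(r"(?<![\w.])\d{1,2}(?!\w|\.\d)",
                  lambda m: spell_number(int(m.group())), text)


print(prepare_text("Dr. Lee ran 5 km in 42 minutes."))
# Doctor Lee ran five kilometers in forty-two minutes.
```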

Credit costs

Tier        Cost per 1K characters
Free        0 credits (no limit)
Standard    2 credits / 1K characters
Premium     4 credits / 1K characters
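
As a worked example of the rates above, the sketch below estimates the credits for a piece of text. The rates are the ones listed; the rounding rule (proportional billing, rounded up to a whole credit) is an assumption, since the page does not say how partial thousands are charged, and `estimate_credits` is a hypothetical helper name.

```python
# Credits per 1,000 characters, as listed in the table above.
CREDITS_PER_1K = {"free": 0, "standard": 2, "premium": 4}


def estimate_credits(num_chars: int, tier: str) -> int:
    """Estimate credits for `num_chars` characters on the given tier.

    Assumption: cost is proportional to length and rounded up to a whole credit.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    rate = CREDITS_PER_1K[tier.lower()]
    return -(-num_chars * rate // 1000)  # integer ceiling division


print(estimate_credits(2500, "standard"))  # 5
print(estimate_credits(1200, "premium"))   # 5
print(estimate_credits(5000, "free"))      # 0
```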

How AI Text-to-Speech Works

Three simple steps to create professional voiceovers. No technical knowledge required.

Step 1

Enter your text

Type, paste, or upload the text you want to convert to speech. Up to 5,000 characters per generation are supported for signed-in users. Use plain text, or add SSML tags for fine control over pronunciation, pauses, and emphasis.
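
Text longer than the 5,000-character limit has to be split into several generations. The following is a rough sketch of one way to do that, breaking at sentence ends where possible; `split_for_tts` is a hypothetical helper, and the sentence detection is a simple punctuation heuristic rather than a full tokenizer.

```python
import re

MAX_CHARS = 5000  # per-generation limit quoted above


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_for_tts("One. Two. Three.", limit=10))  # ['One. Two.', 'Three.']
```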

Step 2

Choose a model and voice

Choose from 20+ AI models across three tiers. Pick a voice that suits your content, select your target language, adjust the playback speed from 0.5x to 2.0x, and choose your preferred output format (MP3, WAV, OGG, or FLAC).
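
The ranges in this step can be checked before a request is made. This is a minimal sketch under stated assumptions: the `VoiceSettings` class and its field names are hypothetical and do not describe the TTS.ai API; only the speed range and the four output formats come from the text above.

```python
from dataclasses import dataclass

ALLOWED_FORMATS = {"mp3", "wav", "ogg", "flac"}  # formats listed in Step 2


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the choices made in Step 2."""
    model: str
    speed: float = 1.0
    output_format: str = "mp3"

    def __post_init__(self) -> None:
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        if self.output_format.lower() not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {self.output_format!r}")


settings = VoiceSettings(model="kokoro", speed=1.25, output_format="wav")
print(settings)
```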

Step 3

Generate and download

Preview with the built-in player, download in your chosen format, or copy a shareable link. Use the API for batch processing and integration into your workflow.

Text to Speech Use Cases

AI text-to-speech is changing how people create, consume, and interact with audio content across many industries.

All Text-to-Speech Models

Detailed specifications for every AI model available on TTS.ai. Compare quality, speed, language support, and features to find the right model for your project.
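
Choosing a model from these specifications amounts to filtering on a few fields. The sketch below shows the idea with six entries whose tier, VRAM, cloning, and language values are copied from the model cards in this list; the `ModelSpec` type and `pick_models` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class ModelSpec(NamedTuple):
    name: str
    tier: str
    vram_gb: float
    voice_cloning: bool
    languages: frozenset


# Values copied from the model cards in this list (subset only).
MODELS = [
    ModelSpec("Kokoro", "Free", 1.5, False,
              frozenset("en ja zh ko fr de it pt es hi ru".split())),
    ModelSpec("MeloTTS", "Free", 0.5, False, frozenset("en es fr zh ja ko".split())),
    ModelSpec("Pocket TTS", "Free", 1.0, True, frozenset("en fr".split())),
    ModelSpec("CosyVoice 2", "Standard", 4.0, True,
              frozenset("en zh ja ko fr de it es".split())),
    ModelSpec("GPT-SoVITS", "Standard", 6.0, True, frozenset("en zh ja ko".split())),
    ModelSpec("Tortoise TTS", "Premium", 8.0, True, frozenset({"en"})),
]


def pick_models(language: str, max_vram_gb: float, need_cloning: bool = False):
    """Return the names of models that support the language and fit the constraints."""
    return [m.name for m in MODELS
            if language in m.languages
            and m.vram_gb <= max_vram_gb
            and (m.voice_cloning or not need_cloning)]


print(pick_models("fr", max_vram_gb=4, need_cloning=True))
# ['Pocket TTS', 'CosyVoice 2']
```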

Kokoro

Free

Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its small size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages, including English, Japanese, Chinese, and Korean, with a variety of expressive voices. It is also fast, generating audio nearly 100x faster than real time on a GPU.

Developer: Hexgrad
License: Apache 2.0
Speed: Fast
Languages: en, ja, zh, ko, fr, de, it, pt, es, hi, ru
VRAM: 1.5GB
Voice cloning: No
Cost per 1K characters: Free
Features: 82M parameters, Fast inference, Expressive voices, Multilingual, Streaming support
Best for: High-quality TTS with minimal latency, streaming applications

Piper

Free

Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications that need offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.

Developer: Rhasspy
License: MIT
Speed: Fast
Languages: en, de, fr, es, it, pt, nl, pl, ru, zh, ja, ko, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi
VRAM: 0 (CPU only)
Voice cloning: No
Cost per 1K characters: Free
Features: CPU-friendly, Offline capable, 100+ voices, 30+ languages, SSML support
Best for: Quick previews, accessibility, and embedded applications

VITS

Free

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural-sounding audio than two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.

Developer: Jaehyeon Kim et al.
License: MIT
Speed: Fast
Languages: en, zh, ja, ko
VRAM: 1GB
Voice cloning: No
Cost per 1K characters: Free
Features: End-to-end synthesis, Natural prosody, Fast inference, Multi-speaker
Best for: General-purpose text-to-speech with natural prosody

MeloTTS

Free

MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.

Developer: MyShell.ai
License: MIT
Speed: Fast
Languages: en, es, fr, zh, ja, ko
VRAM: 0.5GB (GPU optional)
Voice cloning: No
Cost per 1K characters: Free
Features: CPU-optimized, Multilingual, Multiple accents, Production-ready, Low latency
Best for: Production applications needing fast, multilingual TTS

Bark

Standard

Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio such as music, background noise, and sound effects. It can produce nonverbal sounds such as laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.

Developer: Suno
License: MIT
Speed: Slow
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 5GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Sound effects, Laughing/sighing, Music generation, 100+ speakers, Multilingual
Best for: Creative audio content, audiobooks with emotion, sound effects

Bark Small

Standard

Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference and lower memory requirements. It retains Bark's ability to generate speech with emotions, laughter, and multiple languages.

Developer: Suno
License: MIT
Speed: Medium
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 2GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Lightweight, Faster than full Bark, Emotional speech, Multilingual
Best for: Quick creative audio when full Bark is too slow

CosyVoice 2

Standard

CosyVoice 2 by Alibaba's Tongyi Lab achieves human-comparable speech quality with extremely low latency, making it ideal for real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.

Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, fr, de, it, es
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Streaming, Zero-shot cloning, Cross-lingual, Emotion control, Human-parity
Best for: Real-time applications, streaming TTS, voice assistants

Dia TTS

Standard

Dia by Nari Labs is a 1.6B parameter text-to-speech model designed for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers, with appropriate turn-taking and emotional expression. Dia is particularly well suited to podcast-style content, audiobook dialogue, and conversational AI.

Developer: Nari Labs
License: Apache 2.0
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Multi-speaker, Dialogue generation, Natural turn-taking, Emotional expression, 1.6B parameters
Best for: Podcasts, audiobook dialogue, conversational content

Parler TTS

Standard

Parler TTS is a text-to-speech model that uses natural-language voice descriptions to control the generated speech. Instead of choosing from preset voices, you describe the voice you want (for example, "a warm female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech that matches the description. This makes it well suited to creative applications.

Developer: Hugging Face
License: Apache 2.0
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Voice description, Natural-language control, Flexible voice design, No reference audio needed
Best for: Creative applications where you need custom voice characteristics

GLM-TTS

Standard

GLM-TTS by Zhipu AI is a text-to-speech system built on a Llama architecture with flow matching. It achieves the lowest character error rate among open-source TTS models, meaning the speech it produces closely follows the input text. GLM-TTS supports English and Chinese, with voice cloning from 3-10 second audio samples.

Developer: Zhipu AI
License: GLM-4 License
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Lowest error rate, Voice cloning, Flow matching, Natural prosody
Best for: Applications that require the highest speech accuracy

IndexTTS-2

Standard

IndexTTS-2 is a deep text-to-speech system that excels at zero-shot voice synthesis and zero-shot emotion control. It can generate speech with a specified emotion without needing emotion-specific training data. The model uses emotion vectors for precise control over the emotional expression of the generated speech.

Developer: Index Team
License: Bilibili Model License
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Emotion control, Zero-shot, Emotion vectors, Expressive output, Fine-grained control
Best for: Expressive content, audiobooks, virtual assistants

Spark TTS

Standard

Spark TTS by SparkAudio is a text-to-speech model that combines voice cloning with emotion and style control.

Developer: SparkAudio
License: CC BY-NC-SA 4.0
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Voice cloning, Emotion control, Style control, Prompt-based, 5-second cloning
Best for: Content creation with cloned voices and emotional control

GPT-SoVITS

Standard

GPT-SoVITS combines a GPT-style language model with SoVITS (from SoftVC VITS Singing Voice Conversion) for few-shot voice cloning. With as little as 5 seconds of reference audio, it can accurately clone a voice and generate new speech while preserving the speaker's distinctive characteristics. It handles both speech and singing voice synthesis well.

Developer: RVC-Boss
License: MIT
Speed: Slow
Languages: en, zh, ja, ko
VRAM: 6GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: 5-second cloning, Singing voice, Few-shot learning, High fidelity, Cross-lingual
Best for: Voice cloning, singing synthesis, content-creator voice replication

Orpheus

Standard

Orpheus is a large text-to-speech model that achieves human-level emotional expression. Trained on more than 100,000 hours of speech data, it excels at generating speech with natural emotion, intonation, and speaking styles. Orpheus can produce speech that is hard to tell apart from human recordings.

Developer: Canopy Labs
License: Llama 3.2 Community
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Human-level emotion, 100K hours of training data, Natural emotion, Expressive output
Best for: Highly expressive speech, audiobooks, voice acting

Chatterbox

Premium

Chatterbox by Resemble AI is a zero-shot voice cloning model. It can clone a voice from a single audio sample with high accuracy, capturing not only the timbre but also the speaking style and emotional characteristics.

Developer: Resemble AI
License: MIT
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 4x
Features: Zero-shot cloning, Emotion control, High fidelity, Style transfer, Single-sample cloning
Best for: Professional voice cloning with emotional control, content creation

Tortoise TTS

Premium

Tortoise TTS is a multi-voice text-to-speech system that prioritizes voice quality over speed. It uses an architecture inspired by DALL-E to produce highly natural speech with strong prosody and speaker similarity. Although slower than many alternatives, Tortoise produces some of the most natural-sounding speech available in open source.

Developer: James Betker
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 8GB
Voice cloning: Yes
Cost per 1K characters: 4x
Features: Highest quality, Multi-voice, DALL-E-inspired architecture, Voice cloning, Autoregressive
Best for: Audiobooks, premium content, quality-critical applications

StyleTTS 2

Premium

StyleTTS 2 achieves human-level TTS synthesis by combining style diffusion with adversarial training using large speech language models. It produces highly natural speech on single-speaker datasets, rivaling human recordings. StyleTTS 2 uses a diffusion-based style model to capture the full range of variation in human speech.

Developer: Columbia University
License: MIT
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 4x
Features: Human-level, Style diffusion, Adversarial training, Natural variation, High fidelity
Best for: High-quality single-speaker synthesis, professional narration

OpenVoice

Premium

OpenVoice by MyShell.ai enables instant voice cloning with fine-grained control over voice style, including emotion, accent, rhythm, pauses, and intonation. It can clone a voice from a short audio clip and generate speech in multiple languages while preserving the speaker's identity. OpenVoice also works as a voice converter, allowing real-time voice conversion.

Developer: MyShell.ai / MIT
License: MIT
Speed: Medium
Languages: en, zh, ja, ko, fr, de, es, it
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 4x
Features: Instant cloning, Voice conversion, Emotion control, Style control, Multilingual
Best for: Voice cloning with flexible style control, voice conversion

Qwen3 TTS

Standard

Qwen3-TTS is a 1.7 billion parameter text-to-speech model from Alibaba's Qwen team. It supports three modes: preset voices with emotion control (9 speakers), voice cloning from as little as 3 seconds of audio, and a voice design mode in which you describe the voice you want in natural language.

Developer: Alibaba (Qwen)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
VRAM: 7GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Voice cloning, 9 preset voices, Voice design from text, Emotion control, Multilingual
Best for: Multilingual content with voice cloning or voice design

Sesame CSM

Premium

Sesame CSM (Conversational Speech Model) is a 1 billion parameter model designed to generate conversational speech. It models the natural patterns of human conversation, including turn-taking timing, backchannel responses, emotional reactions, and conversational flow.

Developer: Sesame
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 8GB
Voice cloning: No
Cost per 1K characters: 4x
Features: Conversational, Natural timing, Turn-taking, Backchannel, 1B parameters
Best for: AI assistants, chatbots, conversational AI applications

Chatterbox Turbo

Standard

Chatterbox Turbo by Resemble AI is a 350M parameter evolution of Chatterbox that delivers up to 6x real-time speed with sub-200ms latency. It supports paralinguistic tags such as [laugh], [cough], and [chuckle] in the text. All generated audio includes Perth watermarking for provenance tracking.

Developer: Resemble AI
License: MIT
Speed: Fast
Languages: en
VRAM: 2GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Sub-200ms latency, Paralinguistic tags, 6x real-time, Voice cloning, Watermarking
Best for: Real-time voice agents, expressive narration with natural sounds
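
Because unrecognized bracket tags may be read aloud or ignored, it can help to check a script before sending it. The sketch below flags any bracketed tag other than the three named in this card; the tag set is an assumption limited to those examples (the model may accept more), and `unknown_tags` is a hypothetical helper.

```python
import re

# Only the tags named in the card above; the full supported set is not listed there.
KNOWN_TAGS = {"laugh", "cough", "chuckle"}


def unknown_tags(text: str) -> list[str]:
    """Return bracketed tags in `text` that are not in KNOWN_TAGS, sorted."""
    found = re.findall(r"\[([a-z_]+)\]", text)
    return sorted({tag for tag in found if tag not in KNOWN_TAGS})


print(unknown_tags("Nice one [laugh] ... excuse me [sneeze]"))  # ['sneeze']
```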

Zonos

Standard

Zonos v0.1 by Zyphra is a 1.6B parameter model that offers fine-grained emotional control, with separate controls for individual emotions. It is available as a Transformer and as a novel SSM (state-space model) hybrid variant. Trained on 200K+ hours of multilingual speech, it supports zero-shot voice cloning from 10-30 seconds of reference audio.

Developer: Zyphra
License: Apache 2.0
Speed: Medium
Languages: en, ja, zh, fr, de
VRAM: 6GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Emotion control, Voice cloning, SSM architecture, Multilingual, Pitch/rate control
Best for: Expressive speech with emotion control, voice design studios

Dia 2

Standard

Dia2 by Nari Labs is a streaming evolution of Dia, available in 1B and 2B parameter variants. It starts generating audio from the first tokens, which makes it well suited to real-time voice agents and speech-to-speech pipelines. It supports multi-speaker dialogue with [S1] / [S2] tags and paralinguistic cues such as (laughs) and (coughs).

Developer: Nari Labs
License: Apache 2.0
Speed: Fast
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Streaming output, Multi-speaker, Low latency, Nonverbal tags, Up to 2 min output
Best for: Real-time voice agents, dialogue generation, conversational applications
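
The [S1] / [S2] convention can be generated from a list of speaker turns. The following is a small sketch, assuming turns are joined into one string separated by spaces and that only two speakers are allowed (the card names just these two tags); `format_dialogue` is a hypothetical helper, not part of Dia2.

```python
def format_dialogue(turns: list[tuple[str, str]]) -> str:
    """Turn (speaker, line) pairs into a single string tagged with [S1] / [S2]."""
    tags: dict[str, str] = {}
    parts = []
    for speaker, line in turns:
        if speaker not in tags:
            if len(tags) == 2:
                raise ValueError("only two speakers are supported in this sketch")
            tags[speaker] = f"[S{len(tags) + 1}]"  # first speaker seen becomes S1
        parts.append(f"{tags[speaker]} {line.strip()}")
    return " ".join(parts)


script = format_dialogue([
    ("Ana", "Hi there."),
    ("Ben", "Hello! (laughs)"),
    ("Ana", "Ready to start?"),
])
print(script)
# [S1] Hi there. [S2] Hello! (laughs) [S1] Ready to start?
```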

VoxCPM

Standard

VoxCPM 1.5 by OpenBMB is a novel tokenizer-free TTS model that works in a continuous space rather than with discrete tokens. It outputs 44.1kHz audio, supports zero-shot voice cloning from 3-10 seconds of audio, and keeps the voice consistent across segments. Cross-lingual cloning can give an English voice to Chinese speech, and vice versa.

Developer: OpenBMB
License: Apache 2.0
Speed: Fast
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: 44.1kHz audio, Tokenizer-free, Cross-lingual cloning, Context-aware, LoRA fine-tuning
Best for: Consistent audio, audiobooks, long-form content with voice consistency

OuteTTS

Free

OuteTTS extends large language models with text-to-speech capabilities while preserving the original architecture. It supports multiple backends, including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, VLLM, and browser inference via Transformers.js. It features zero-shot voice cloning through speaker profiles saved as JSON.

Developer: OuteAI
License: Apache 2.0
Speed: Fast
Languages: en
VRAM: 2GB
Voice cloning: Yes
Cost per 1K characters: Free
Features: CPU inference, Browser inference, Voice cloning, Multiple backends, Saved speaker profiles
Best for: Edge deployment, browser-based TTS, low-resource environments

TADA

Standard

TADA (Text-Acoustic Dual Alignment) by Hume AI is a TTS model that eliminates hallucinations through a novel dual-alignment design built on Llama 3.2. Available in 1B (English) and 3B (multilingual) variants, TADA reaches an RTF of 0.09, which is 5x faster than comparable LLM-based TTS models. It supports 700 seconds of audio context and produces emotionally expressive speech.

Developer: Hume AI
License: MIT
Speed: Fast
Languages: en
VRAM: 5GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Hallucination-free, 5x faster than LLM TTS, Emotional expression, 700s audio context, Dual alignment
Best for: High-quality hallucination-free speech, emotional expression, fast inference
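
The real-time factor (RTF) quoted in this card is synthesis time divided by the duration of the audio produced, so values below 1 mean faster than real time. The short sketch below works through the arithmetic for an RTF of 0.09; the function names are illustrative. By the same definition, a model described as roughly 100x faster than real time has an RTF of about 0.01.

```python
def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Wall-clock time needed to synthesize `audio_seconds` of audio at a given RTF."""
    return audio_seconds * rtf


def speedup_over_real_time(rtf: float) -> float:
    """How many seconds of audio are produced per second of compute."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


print(round(synthesis_seconds(60, 0.09), 2))    # 5.4
print(round(speedup_over_real_time(0.09), 1))   # 11.1
```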

VibeVoice

Standard

Microsoft's VibeVoice comes in two variants: a 1.5B model for long-form content (up to 90 minutes, 4 speakers) and a 0.5B real-time model for streaming with ~200ms first-audio latency. The 1.5B variant excels at podcasts and audiobooks, keeping speakers consistent across long conversations. Note: Microsoft removed the TTS code from the repository, and generated audio includes AI disclaimers.

Developer: Microsoft
License: MIT
Speed: Fast
Languages: en, zh
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Multi-speaker, Up to 90 min, Podcast generation, Speaker consistency, 200ms streaming
Best for: Podcasts, audiobooks, long-form multi-speaker content

Pocket TTS

Free

Pocket TTS by Kyutai (creators of Moshi) is a compact 100M parameter text-to-speech model that punches well above its weight. It runs efficiently on CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. The small model size makes it ideal for edge deployment and low-resource environments.

Developer: Kyutai
License: MIT
Speed: Fast
Languages: en, fr
VRAM: 1GB
Voice cloning: Yes
Cost per 1K characters: Free
Features: 100M parameters, CPU inference, Voice cloning, Single-sample cloning, Edge-friendly
Best for: Lightweight deployment, CPU-only environments, quick voice cloning

Kitten TTS

Free

Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.

Developer: KittenML
License: Apache 2.0
Speed: Fast
Languages: en
VRAM: 0GB
Voice cloning: No
Cost per 1K characters: Free
Features: CPU-only inference, Under 80MB model size, 8 built-in voices, Speed control, ONNX-based, 24kHz output
Best for: Fast lightweight TTS, edge deployment, low-latency applications

CosyVoice3

Standard

CosyVoice3 is the latest evolution from Alibaba's FunAudioLLM team. It features bi-streaming inference with ~150ms latency, instruction-based control for emotion/speed/volume, and improved speaker similarity for zero-shot cloning. Supports 9 languages plus 18 Chinese dialects. RL-tuned variant delivers state-of-the-art prosody.

Developer: Alibaba (FunAudioLLM)
License: Apache 2.0
Speed: Fast
Languages: en, zh, ja, ko, de, es, fr, it, ru
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Bi-streaming, Emotion control, Voice cloning, Speed/volume control, Instruction following
Best for: Multilingual production TTS, real-time applications, voice cloning

MOSS-TTS

Premium

MOSS-TTS from OpenMOSS supports generation of up to 1 hour of continuous speech across 20 languages. Features token-level duration control, phoneme-level pronunciation control via IPA/Pinyin, and code-switching between languages. The 8B production model delivers state-of-the-art quality with zero-shot voice cloning from reference audio.

Developer: OpenMOSS
License: Apache 2.0
Speed: Medium
Languages: en, zh, de, es, fr, ja, it, hu, ko, ru, fa, ar, pl, pt, cs, da, sv, el, tr
VRAM: 16GB
Voice cloning: Yes
Cost per 1K characters: 4x
Features: Ultra-long generation, 20 languages, Voice cloning, Duration control, Pronunciation control, Code-switching
Best for: Audiobooks, long-form content, multilingual production

MegaTTS3

Premium

MegaTTS3 from ByteDance uses a novel sparse alignment mechanism combined with a latent diffusion transformer. Features adjustable trade-off between speech intelligibility and speaker similarity for zero-shot voice cloning.

Developer: ByteDance
License: Apache 2.0
Speed: Slow
Languages: en, zh
VRAM: 8GB
Voice cloning: Yes
Cost per 1K characters: 4x
Features: Voice cloning, Adjustable similarity, Cross-lingual
Best for: High-fidelity voice cloning

KokoroKokoro

Waihoki

Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast — generating audio nearly 100x faster than real-time on a GPU.

kaiwhakawhanake::
Hexgrad
Whakawhiwhinga::
Apache 2.0
Āhuatanga:
Fast
Kāwai::
reo: en, ja, zh, ko, fr, de, it, pt, es, hi, ru
Ko te tino pai mo:: High-quality TTS with minimal latency, streaming applications

PiperPiper

Waihoki

Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.

kaiwhakawhanake::
Rhasspy
Whakawhiwhinga::
MIT
Āhuatanga:
Fast
Kāwai::
reo: en, de, fr, es, it, pt, nl, pl, ru, zh, ja, ko, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi
Ko te tino pai mo:: Quick previews, accessibility, and embedded applications

VITSVITS

Waihoki

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.

kaiwhakawhanake::
Jaehyeon Kim et al.
Whakawhiwhinga::
MIT
Āhuatanga:
Fast
Kāwai::
reo: en, zh, ja, ko
Ko te tino pai mo:: General-purpose text-to-speech with natural prosody

MeloTTSMeloTTS

Waihoki

MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.

kaiwhakawhanake::
MyShell.ai
Whakawhiwhinga::
MIT
Āhuatanga:
Fast
Kāwai::
reo: en, es, fr, zh, ja, ko
Ko te tino pai mo:: Production applications needing fast, multilingual TTS

OuteTTSOuteTTS

Waihoki

OuteTTS extends large language models with text-to-speech capabilities while preserving the original architecture. It supports multiple backends including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, VLLM, and even browser inference via Transformers.js. Features zero-shot voice cloning through speaker profiles saved as JSON.

kaiwhakawhanake::
OuteAI
Whakawhiwhinga::
Apache 2.0
Āhuatanga:
Fast
Kāwai::
reo: en
Ko te tino pai mo:: Edge deployment, browser-based TTS, low-resource environments

Pocket TTSPocket TTS

Waihoki

Pocket TTS by Kyutai (creators of Moshi) is a compact 100M parameter text-to-speech model that punches well above its weight. It runs efficiently on CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. The small model size makes it ideal for edge deployment and low-resource environments.

kaiwhakawhanake::
Kyutai
Whakawhiwhinga::
MIT
Āhuatanga:
Fast
Kāwai::
reo: en, fr
Ko te tino pai mo:: Lightweight deployment, CPU-only environments, quick voice cloning

Kitten TTSKitten TTS

Waihoki

Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.

Developer: KittenML
License: Apache 2.0
Speed: Fast
Languages: en
Best for: Fast lightweight TTS, edge deployment, low-latency applications

Bark

Standard

Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio like music, background noise, and sound effects. It can produce nonverbal communications like laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.

Developer: Suno
License: MIT
Speed: Slow
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
Voice cloning: No
Features: Sound effects, Laughing/sighing, Music generation, 100+ speakers, Multilingual
Best for: Creative audio content, audiobooks with emotion, sound effects

Bark Small

Standard

Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference speeds and lower memory requirements. It retains Bark's ability to generate speech with emotions, laughter, and multiple languages.

Developer: Suno
License: MIT
Speed: Medium
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
Voice cloning: No
Features: Lightweight, Faster than full Bark, Emotional speech, Multilingual
Best for: Quick creative audio when full Bark is too slow

CosyVoice 2

Standard

CosyVoice 2 by Alibaba's Tongyi Lab achieves human-comparable speech quality with extremely low latency, making it ideal for real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.

Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, fr, de, it, es
Voice cloning: Yes
Features: Streaming, Zero-shot cloning, Cross-lingual, Emotion control, Human-parity
Best for: Real-time applications, streaming TTS, voice assistants

Dia TTS

Standard

Dia by Nari Labs is a 1.6B parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression. Dia is perfect for creating podcast-style content, audiobook dialogues, and interactive conversational AI.

Developer: Nari Labs
License: Apache 2.0
Speed: Medium
Languages: en
Voice cloning: No
Features: Multi-speaker, Dialog generation, Natural turn-taking, Emotional expression, 1.6B parameters
Best for: Podcasts, audiobook dialogues, conversational content

Parler TTS

Standard

Parler TTS is a text-to-speech model that uses natural language voice descriptions to control the generated speech. Instead of selecting from preset voices, you describe the voice you want (e.g., "a warm female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech matching that description. This makes it uniquely flexible for creative applications.

Developer: Hugging Face
License: Apache 2.0
Speed: Medium
Languages: en
Voice cloning: No
Features: Voice description, Natural language control, Flexible voice creation, No preset voices needed
Best for: Creative applications where you need custom voice characteristics

GLM-TTS

Standard

GLM-TTS by Zhipu AI is a text-to-speech system built on the Llama architecture with flow matching. It achieves the lowest character error rate among open-source TTS models, meaning it produces the most accurate pronunciation. GLM-TTS supports English and Chinese with voice cloning from 3-10 second audio samples.

Developer: Zhipu AI
License: GLM-4 License
Speed: Medium
Languages: en, zh
Voice cloning: Yes
Features: Lowest error rate, Voice cloning, Flow matching, Natural prosody
Best for: Applications requiring maximum pronunciation accuracy

IndexTTS-2

Standard

IndexTTS-2 is an advanced text-to-speech system that excels at zero-shot voice synthesis with fine-grained emotion control. It can generate speech with specific emotional tones like happy, sad, angry, or fearful without requiring emotion-specific training data. The model uses emotion vectors to precisely control the emotional expression of generated speech.

Developer: Index Team
License: Bilibili Model License
Speed: Medium
Languages: en, zh
Voice cloning: Yes
Features: Emotion control, Zero-shot, Emotion vectors, Expressive speech, Fine-grained control
Best for: Emotionally expressive content, audiobooks, virtual assistants

Spark TTS

Standard

Spark TTS by SparkAudio is a text-to-speech model that combines voice cloning with controllable emotion and speaking style. Using just 5 seconds of reference audio, it can clone a voice and then generate speech with different emotions, speeds, and styles while maintaining the cloned voice identity. Spark TTS uses a prompt-based control system.

Developer: SparkAudio
License: CC BY-NC-SA 4.0
Speed: Medium
Languages: en, zh
Voice cloning: Yes
Features: Voice cloning, Emotion control, Style control, Prompt-based, 5-second cloning
Best for: Content creation with cloned voices and emotional control

GPT-SoVITS

Standard

GPT-SoVITS combines GPT-style language modeling with SoVITS (short for SoftVC VITS, a VITS-based voice conversion approach) for powerful few-shot voice cloning. With as little as 5 seconds of reference audio, it can accurately clone a voice and generate new speech while preserving the speaker's unique characteristics. It excels at both speaking and singing voice synthesis.

Developer: RVC-Boss
License: MIT
Speed: Slow
Languages: en, zh, ja, ko
Voice cloning: Yes
Features: 5-second cloning, Singing voice, Few-shot learning, High fidelity, Cross-lingual
Best for: Voice cloning, singing synthesis, content creator voice replication

Orpheus

Standard

Orpheus is a large-scale text-to-speech model that achieves human-level emotional expression. Trained on over 100,000 hours of diverse speech data, it excels at generating speech with natural emotions, emphasis, and speaking styles. Orpheus can produce speech that is virtually indistinguishable from human recordings.

Developer: Canopy Labs
License: Llama 3.2 Community
Speed: Medium
Languages: en
Voice cloning: No
Features: Human-level emotion, 100K hours training, Natural emphasis, Expressive speech
Best for: High-quality emotional speech, audiobooks, voice acting

Qwen3 TTS

Standard

Qwen3-TTS is a 1.7 billion parameter text-to-speech model from Alibaba's Qwen team. It supports three modes: preset voices with emotion control (9 speakers), voice cloning from just 3 seconds of audio, and a unique voice design mode where you describe the voice you want in natural language. It covers 10 languages with high expressiveness and natural prosody.

Developer: Alibaba (Qwen)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
Voice cloning: Yes
Features: Voice cloning, 9 preset voices, Voice design from text, Emotion control, 10 languages
Best for: Multilingual content with voice cloning or custom voice design

Chatterbox Turbo

Standard

Chatterbox Turbo by Resemble AI is a 350M parameter upgrade to Chatterbox, delivering up to 6x real-time speed with sub-200ms latency. It supports paralinguistic tags like [laugh], [cough], and [chuckle] directly in text. Includes Perth watermarking on all generated audio for provenance tracking.

Developer: Resemble AI
License: MIT
Speed: Fast
Languages: en
Voice cloning: Yes
Features: Sub-200ms latency, Paralinguistic tags, 6x real-time, Voice cloning, Watermarking
Best for: Real-time voice agents, expressive speech with natural sounds

Zonos

Standard

Zonos v0.1 by Zyphra is a 1.6B parameter model featuring fine-grained emotion control with sliders for happiness, anger, sadness, fear, and surprise. It offers both a Transformer and a novel SSM (state-space model) variant. Trained on 200K+ hours of multilingual speech with zero-shot voice cloning from 10-30 seconds of reference audio.

Developer: Zyphra
License: Apache 2.0
Speed: Medium
Languages: en, ja, zh, fr, de
Voice cloning: Yes
Features: Emotion control, Voice cloning, SSM architecture, Multilingual, Pitch/rate control
Best for: Expressive speech with emotion control, voice design studio

Dia 2

Standard

Dia2 by Nari Labs is a streaming-first upgrade to Dia, available in 1B and 2B parameter variants. It begins synthesizing audio from the first few tokens, making it ideal for real-time voice agents and speech-to-speech pipelines. Supports multi-speaker dialogue with [S1]/[S2] tags and paralinguistic cues like (laughs), (coughs).

Developer: Nari Labs
License: Apache 2.0
Speed: Fast
Languages: en
Voice cloning: No
Features: Streaming output, Multi-speaker, Low latency, Paralinguistic cues, Up to 2 min output
Best for: Real-time voice agents, dialogue generation, streaming applications

VoxCPM

Standard

VoxCPM 1.5 by OpenBMB is a novel tokenizer-free TTS model that operates in continuous space rather than discrete tokens. It produces high-fidelity 44.1kHz audio, supports zero-shot voice cloning from 3-10 seconds, and maintains consistency across paragraphs. Cross-language cloning lets you apply an English voice to Chinese speech and vice versa.

Developer: OpenBMB
License: Apache 2.0
Speed: Fast
Languages: en, zh
Voice cloning: Yes
Features: 44.1kHz audio, Tokenizer-free, Cross-lingual cloning, Context-aware, LoRA fine-tuning
Best for: High-fidelity audio, audiobooks, long-form content with voice consistency

TADA

Standard

TADA (Text-Acoustic Dual Alignment) by Hume AI is a groundbreaking TTS model that eliminates hallucinations through a novel dual alignment architecture built on Llama 3.2. Available in 1B (English) and 3B (multilingual) variants, TADA achieves an RTF of 0.09 — 5x faster than comparable LLM-based TTS models. It supports up to 700 seconds of audio context and produces emotionally expressive speech with zero hallucinations on standard benchmarks.

Developer: Hume AI
License: MIT
Speed: Fast
Languages: en
Voice cloning: No
Features: Zero hallucinations, 5x faster than LLM TTS, Emotional expression, 700s audio context, Dual alignment
Best for: High-quality hallucination-free speech, emotional expression, fast inference

VibeVoice

Standard

VibeVoice from Microsoft generates long-form speech up to 90 minutes with support for 4 simultaneous speakers, making it ideal for podcasts and dialogues. The Realtime 0.5B variant achieves ~300ms latency for interactive use. Supports speaker tags for multi-turn dialogue generation.

Developer: Microsoft
License: MIT
Speed: Fast
Languages: en, zh
Voice cloning: No
Features: Multi-speaker, Long-form (90 min), Podcast generation, Dialogue, Low latency
Best for: Podcasts, dialogues, long-form narration, multi-speaker content

CosyVoice3

Standard

CosyVoice3 is the latest evolution from Alibaba's FunAudioLLM team. It features bi-streaming inference with ~150ms latency, instruction-based control for emotion/speed/volume, and improved speaker similarity for zero-shot cloning. Supports 9 languages plus 18 Chinese dialects. RL-tuned variant delivers state-of-the-art prosody.

Developer: Alibaba (FunAudioLLM)
License: Apache 2.0
Speed: Fast
Languages: en, zh, ja, ko, de, es, fr, it, ru
Voice cloning: Yes
Features: Bi-streaming, Emotion control, Voice cloning, Speed/volume control, Instruction following
Best for: Multilingual production TTS, real-time applications, voice cloning

Chatterbox

Premium

Chatterbox by Resemble AI is a cutting-edge zero-shot voice cloning model. It can replicate any voice from a single audio sample with remarkable accuracy, capturing not just the timbre but also the speaking style and emotional nuances. Chatterbox also features fine-grained emotion control, allowing you to adjust the emotional tone of the generated speech independently from the voice identity.

Developer: Resemble AI
License: MIT
Speed: Medium
Languages: en
Voice cloning: Yes
VRAM: 4GB
Credits per 1K characters: 4x
Features: Zero-shot cloning, Emotion control, High fidelity, Style transfer, Single sample cloning
Best for: Professional voice cloning with emotional control, content creation

Tortoise TTS

Premium

Tortoise TTS is an autoregressive multi-voice text-to-speech system that prioritizes audio quality over speed. It uses a DALL-E-inspired architecture to generate highly natural speech with excellent prosody and speaker similarity. Although slower than many alternatives, Tortoise produces some of the most realistic synthetic speech available in the open-source ecosystem.

Developer: James Betker
License: Apache 2.0
Speed: Slow
Languages: en
Voice cloning: Yes
VRAM: 8GB
Credits per 1K characters: 4x
Features: Highest quality, Multi-voice, DALL-E architecture, Voice cloning, Autoregressive
Best for: Audiobooks, premium content, quality-first applications

StyleTTS 2

Premium

StyleTTS 2 achieves human-level TTS synthesis by combining style diffusion with adversarial training using large speech language models. It generates the most natural sounding speech among single-speaker models, rivaling human recordings. StyleTTS 2 uses diffusion-based style modeling to capture the full range of human speech variation.

Developer: Columbia University
License: MIT
Speed: Medium
Languages: en
Voice cloning: No
VRAM: 4GB
Credits per 1K characters: 4x
Features: Human-level, Style diffusion, Adversarial training, Natural variation, High fidelity
Best for: Studio-quality single-speaker synthesis, professional narration

OpenVoice

Premium

OpenVoice by MyShell.ai enables instant voice cloning with granular control over voice style, emotion, accent, rhythm, pauses, and intonation. It can clone a voice from a short audio clip and generate speech in multiple languages while maintaining the speaker identity. OpenVoice also functions as a voice converter, allowing real-time voice transformation.

Developer: MyShell.ai / MIT
License: MIT
Speed: Medium
Languages: en, zh, ja, ko, fr, de, es, it
Voice cloning: Yes
VRAM: 4GB
Credits per 1K characters: 4x
Features: Instant cloning, Voice conversion, Emotion control, Accent control, Multilingual
Best for: Voice cloning with fine-grained style control, voice conversion

Sesame CSM

Premium

Sesame CSM (Conversational Speech Model) is a 1 billion parameter model designed specifically for generating conversational speech. It models the natural patterns of human conversation including turn-taking timing, backchannel responses, emotional reactions, and conversational flow. CSM generates audio that sounds like a natural human conversation rather than synthetic speech.

Developer: Sesame
License: Apache 2.0
Speed: Slow
Languages: en
Voice cloning: No
VRAM: 8GB
Credits per 1K characters: 4x
Features: Conversational, Natural timing, Turn-taking, Backchannel, 1B parameters
Best for: AI assistants, chatbots, conversational AI applications

MOSS-TTS

Premium

MOSS-TTS from OpenMOSS supports generation of up to 1 hour of continuous speech across 20 languages. Features token-level duration control, phoneme-level pronunciation control via IPA/Pinyin, and code-switching between languages. The 8B production model delivers state-of-the-art quality with zero-shot voice cloning from reference audio.

Developer: OpenMOSS
License: Apache 2.0
Speed: Medium
Languages: en, zh, de, es, fr, ja, it, hu, ko, ru, fa, ar, pl, pt, cs, da, sv, el, tr
Voice cloning: Yes
VRAM: 16GB
Credits per 1K characters: 4x
Features: Ultra-long generation, 20 languages, Voice cloning, Duration control, Pronunciation control, Code-switching
Best for: Audiobooks, long-form content, multilingual production

MegaTTS3

Premium

MegaTTS3 from ByteDance uses a novel sparse alignment mechanism combined with a latent diffusion transformer. Features adjustable trade-off between speech intelligibility and speaker similarity for zero-shot voice cloning.

Developer: ByteDance
License: Apache 2.0
Speed: Slow
Languages: en, zh
Voice cloning: Yes
VRAM: 8GB
Credits per 1K characters: 4x
Features: Voice cloning, Adjustable similarity, Cross-lingual
Best for: High-fidelity voice cloning

Model comparison table

| Model | Developer | Tier | Speed | Languages | VRAM | License | Credits |
|---|---|---|---|---|---|---|---|
| Kokoro | Hexgrad | Free | Fast | 11 | 1.5GB | Apache 2.0 | Free |
| Piper | Rhasspy | Free | Fast | 31 | 0 (CPU only) | MIT | Free |
| VITS | Jaehyeon Kim et al. | Free | Fast | 4 | 1GB | MIT | Free |
| MeloTTS | MyShell.ai | Free | Fast | 6 | 0.5GB (GPU optional) | MIT | Free |
| Bark | Suno | Standard | Slow | 13 | 5GB | MIT | 2 |
| Bark Small | Suno | Standard | Medium | 13 | 2GB | MIT | 2 |
| CosyVoice 2 | Alibaba (Tongyi Lab) | Standard | Medium | 8 | 4GB | Apache 2.0 | 2 |
| Dia TTS | Nari Labs | Standard | Medium | 1 | 4GB | Apache 2.0 | 2 |
| Parler TTS | Hugging Face | Standard | Medium | 1 | 4GB | Apache 2.0 | 2 |
| GLM-TTS | Zhipu AI | Standard | Medium | 2 | 4GB | GLM-4 License | 2 |
| IndexTTS-2 | Index Team | Standard | Medium | 2 | 4GB | Bilibili Model License | 2 |
| Spark TTS | SparkAudio | Standard | Medium | 2 | 4GB | CC BY-NC-SA 4.0 | 2 |
| GPT-SoVITS | RVC-Boss | Standard | Slow | 4 | 6GB | MIT | 2 |
| Orpheus | Canopy Labs | Standard | Medium | 1 | 4GB | Llama 3.2 Community | 2 |
| Chatterbox | Resemble AI | Premium | Medium | 1 | 4GB | MIT | 4 |
| Tortoise TTS | James Betker | Premium | Slow | 1 | 8GB | Apache 2.0 | 4 |
| StyleTTS 2 | Columbia University | Premium | Medium | 1 | 4GB | MIT | 4 |
| OpenVoice | MyShell.ai / MIT | Premium | Medium | 8 | 4GB | MIT | 4 |
| Qwen3 TTS | Alibaba (Qwen) | Standard | Medium | 10 | 7GB | Apache 2.0 | 2 |
| Sesame CSM | Sesame | Premium | Slow | 1 | 8GB | Apache 2.0 | 4 |
| Chatterbox Turbo | Resemble AI | Standard | Fast | 1 | 2GB | MIT | 2 |
| Zonos | Zyphra | Standard | Medium | 5 | 6GB | Apache 2.0 | 2 |
| Dia 2 | Nari Labs | Standard | Fast | 1 | 4GB | Apache 2.0 | 2 |
| VoxCPM | OpenBMB | Standard | Fast | 2 | 4GB | Apache 2.0 | 2 |
| OuteTTS | OuteAI | Free | Fast | 1 | 2GB | Apache 2.0 | Free |
| TADA | Hume AI | Standard | Fast | 1 | 5GB | MIT | 2 |
| VibeVoice | Microsoft | Standard | Fast | 2 | 4GB | MIT | 2 |
| Pocket TTS | Kyutai | Free | Fast | 2 | 1GB | MIT | Free |
| Kitten TTS | KittenML | Free | Fast | 1 | 0GB | Apache 2.0 | Free |
| CosyVoice3 | Alibaba (FunAudioLLM) | Standard | Fast | 9 | 4GB | Apache 2.0 | 2 |
| MOSS-TTS | OpenMOSS | Premium | Medium | 19 | 16GB | Apache 2.0 | 4 |
| MegaTTS3 | ByteDance | Premium | Slow | 2 | 8GB | Apache 2.0 | 4 |
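
One way to read the comparison table above is as a filter: pick a VRAM budget, a minimum language count, and whether you need a permissive license, then see what remains. The sketch below copies a handful of rows from the table by hand; the `Model` class, the `PERMISSIVE` set, and the `shortlist` helper are illustrative names, not part of any TTS.ai tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: str
    speed: str
    languages: int
    vram_gb: float
    license: str


# A few rows copied by hand from the comparison table (not the full list).
MODELS = [
    Model("Piper", "Free", "Fast", 31, 0.0, "MIT"),
    Model("MeloTTS", "Free", "Fast", 6, 0.5, "MIT"),
    Model("CosyVoice 2", "Standard", "Medium", 8, 4.0, "Apache 2.0"),
    Model("Spark TTS", "Standard", "Medium", 2, 4.0, "CC BY-NC-SA 4.0"),
    Model("Chatterbox", "Premium", "Medium", 1, 4.0, "MIT"),
    Model("MOSS-TTS", "Premium", "Medium", 19, 16.0, "Apache 2.0"),
]

# Assumption: treat only these two licenses as permissive for this sketch.
PERMISSIVE = {"MIT", "Apache 2.0"}


def shortlist(models, max_vram_gb, min_languages=1, permissive_only=True):
    """Return the names of models that fit the VRAM budget and language count."""
    return [
        m.name
        for m in models
        if m.vram_gb <= max_vram_gb
        and m.languages >= min_languages
        and (not permissive_only or m.license in PERMISSIVE)
    ]


print(shortlist(MODELS, max_vram_gb=4, min_languages=6))
# ['Piper', 'MeloTTS', 'CosyVoice 2']
```

The license check is deliberately coarse; models under other terms, such as the GLM-4 License or the Llama 3.2 Community license, still need a manual read.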

The most comprehensive AI text-to-speech platform

Why choose TTS.ai for text to speech?

TTS.ai brings together open-source text-to-speech models from around the world.

Most models are released under MIT, Apache 2.0, or similarly permissive licenses, which let you use the generated audio commercially in your own projects; a few, such as Spark TTS (CC BY-NC-SA 4.0), carry extra restrictions. Whether you need fast, lightweight synthesis for real-time applications or high-quality output for audiobooks and podcasts, TTS.ai has a model suited to the use case.

Free models, no account required

Get started right away with three free TTS models: Piper (fast and lightweight), VITS (high-quality neural synthesis), and MeloTTS (multilingual synthesis). No registration, no credit card, and no limit on generations. The free models support English and many other languages, and produce natural-sounding audio that suits most applications.

GPU-accelerated processing

All TTS models run on dedicated NVIDIA GPUs for fast, consistent generation times. Free models generate speech in under 2 seconds. Standard models such as Kokoro, CosyVoice 2, and Bark typically take 3-5 seconds. Premium models with the highest quality, such as Tortoise and Chatterbox, take 5-15 seconds, depending on the length of the text.

30+ languages supported

Generate speech in more than 30 languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and more. Many models support cross-lingual synthesis, meaning you can generate speech in a language other than the one the source voice was recorded in. CosyVoice 2 and GPT-SoVITS are particularly strong at cross-lingual voice cloning.

Developer-ready

Integrate TTS.ai into your applications with our OpenAI-compatible REST API. A single endpoint serves all 20+ models. SDKs are available for Python, JavaScript, cURL, and Go. The API offers streaming support for real-time applications, batch processing for large-scale content generation, and webhooks for async notifications. Available on Pro and Enterprise plans.

Frequently asked questions

Text to speech (TTS) is an AI technology that converts written text into natural-sounding spoken audio. Modern neural TTS models such as Kokoro, Chatterbox, and CosyVoice 2 use deep learning to produce speech that sounds close to a real human voice.

It depends on your needs. For quick previews, use Piper or MeloTTS (free tier, fast). For higher quality, try Kokoro or CosyVoice 2 (standard tier). For voice cloning, use Chatterbox or GPT-SoVITS. For dialogue and podcast content, try Dia TTS. Each model has different strengths, so experiment to find the best fit.

Yes! TTS.ai offers free text-to-speech with the Kokoro, Piper, VITS, and MeloTTS models. No account is required for up to 500 characters and 3 generations at a time. Sign up for a free account to receive 50 credits and access to all models.

Our TTS models support 30+ languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Russian, Hindi, and more.

Yes, you can use audio generated with TTS.ai in your own projects. Most of the models we host use permissive open-source licenses such as MIT and Apache 2.0, but terms vary from model to model, so we recommend reviewing the license of the specific model you use for your project.

TTS.ai supports MP3, WAV, OGG, and FLAC output formats. MP3 is the default for web playback. WAV is recommended for audio processing. You can convert between formats using our Audio Converter tool.

Voice cloning uses AI to replicate a specific voice from a short audio sample (typically 5-30 seconds). Upload a clear recording of the target voice, and models such as Chatterbox, GPT-SoVITS, or OpenVoice will generate new speech in that voice. Quality improves with clean reference audio.

Free users can generate up to 500 characters per request. Registered users get 5,000 characters per request. For longer texts, the audio is generated in chunks and joined automatically. API users can process up to 10,000 characters per request.
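
If you prefer to split long text yourself before submitting it, the idea is to break on sentence boundaries and pack sentences into chunks that stay under the per-request limit. The following is a minimal sketch under that assumption; `chunk_text` is a hypothetical helper, and how the service chunks text internally is not described here.

```python
import re


def chunk_text(text: str, limit: int = 5000) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    preferring sentence boundaries."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", limit=9))
# ['One. Two.', 'Three.']
```

Pass `limit=500` or `limit=10000` to match the free and API limits quoted above.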

SSML (Speech Synthesis Markup Language) support varies by model. Piper and some other models support basic SSML tags for pauses, emphasis, and prosody control. For models without native SSML support, you can use natural punctuation and line breaks to influence how the speech is delivered.
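
When building SSML programmatically, the main pitfall is unescaped characters such as `&` or `<` in the input text. The sketch below wraps text in the `<speak><prosody rate="...">` form shown in the input box example on this page, and strips the markup again for models without SSML support. Both helper names are hypothetical, and which tags a given model honours is not confirmed here.

```python
import re
from xml.sax.saxutils import escape, quoteattr, unescape


def to_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...>, escaping XML characters."""
    return f"<speak><prosody rate={quoteattr(rate)}>{escape(text)}</prosody></speak>"


def strip_ssml(ssml: str) -> str:
    """Fallback for models without SSML support: drop tags, restore entities."""
    return unescape(re.sub(r"<[^>]+>", "", ssml))


print(to_ssml("Slow speech"))
# <speak><prosody rate="slow">Slow speech</prosody></speak>
print(strip_ssml(to_ssml("Tom & Jerry")))
# Tom & Jerry
```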

Yes, most models support speed adjustment from 0.5x to 2.0x. Some models, such as Bark and Parler, also accept additional controls over the style of the voice. You can set the speed in the advanced settings panel or through the API speed parameter.

Yes, batch processing is available through our API. You can submit multiple text segments in a single API call or script, and each one is processed and returned as a separate audio file. This works well for audiobook chapters, e-learning modules, or game dialogue scripts.

Generate an API key from your account dashboard, then send POST requests to our REST API endpoint with your text, model, and voice parameters. We provide code examples in Python, JavaScript, and cURL. The API is OpenAI-compatible, so existing integrations work with minimal changes.
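
As a rough illustration of such a request, the sketch below builds (but does not send) a POST request using only the Python standard library. Everything specific in it is an assumption: the base URL is a placeholder, the `/audio/speech` path and the `model`, `input`, `voice`, and `speed` fields mirror the OpenAI speech endpoint rather than documented TTS.ai values, and the `kokoro` model id and `default` voice are made up. Check the official API reference for the real names.

```python
import json
import urllib.request

# Placeholder base URL: the real TTS.ai endpoint is not given on this page.
BASE_URL = "https://api.example.com/v1"


def build_speech_request(api_key, text, model, voice, speed=1.0):
    """Build a POST request in the style of an OpenAI-compatible speech API."""
    # The page states a supported speed range of 0.5x to 2.0x.
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    # Field names are assumed from the OpenAI speech endpoint.
    payload = {"model": model, "input": text, "voice": voice, "speed": speed}
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speech_request(
    "YOUR_API_KEY", "Hello from TTS.ai", model="kokoro", voice="default"
)
print(req.get_method(), req.full_url)
# POST https://api.example.com/v1/audio/speech
```

Passing the request to `urllib.request.urlopen` would send it; against a real endpoint the response body would normally be the audio bytes.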

Start converting text to speech now

Join thousands of creators who use TTS.ai. Get 15,000 free characters with a new account. The free models are available without signing up.