AI Text to Speech
Convert text to natural speech with 24+ open-source AI models. Free to use, no account required.
Wrap your text in SSML tags for precise control:
<speak><prosody rate="slow">Slow speech</prosody></speak>
Add SSML tags as needed.
Set manual instructions (position = instruction):
Model details
Piper
Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.
| Property | Value |
|---|---|
| Developer | Rhasspy |
| License | MIT |
| Speed | Fast |
| Languages | 31 |
| VRAM | 0 (CPU only) |
| Voice cloning | Not supported |
Tips for better results
- Use proper punctuation for natural pauses and intonation.
- Spell out numbers and abbreviations for clearer speech.
- Use paragraph breaks to create pauses between sentences.
- Use ellipses (...) for longer pauses.
- Try Kokoro or CosyVoice 2 for the most natural results.
- Use Dia for multi-speaker dialogue and podcast content.
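The tip about spelling out numbers and abbreviations can be applied as a preprocessing step before text is sent to any model. The sketch below is a minimal English-only example and is not part of TTS.ai. The `ABBREVIATIONS` table and the `number_to_words` and `normalize` helpers are hypothetical. Only whole numbers from 0 to 999 are spelled out. Decimals, comma-grouped numbers, and longer numbers are left unchanged.

```python
import re

# Hypothetical abbreviation table: extend it to match your own content.
ABBREVIATIONS = {"Dr.": "Doctor", "e.g.": "for example", "approx.": "approximately"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Whole numbers of 1-3 digits that are not part of a decimal or a
# comma-grouped number such as 3.14 or 1,500.
NUMBER_RE = re.compile(r"(?<![\d,.])\d{1,3}(?!\d|[.,]\d)")


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("only 0-999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return words if rest == 0 else f"{words} {number_to_words(rest)}"


def normalize(text: str) -> str:
    """Expand known abbreviations, then spell out small whole numbers."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return NUMBER_RE.sub(lambda m: number_to_words(int(m.group())), text)


print(normalize("Dr. Lee has 3 cats and 42 dogs."))
# Doctor Lee has three cats and forty-two dogs.
```

A real pipeline would need locale-aware rules (dates, currencies, ordinals). This sketch only shows where such a step sits.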
Credit costs
| Tier | Cost per 1K characters |
|---|---|
| Free | 0 credits (unlimited) |
| Standard | 2 credits |
| Premium | 4 credits |
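The table implies a simple linear cost model. The sketch below estimates the credits for a given text length. It assumes usage is prorated per character. Whether TTS.ai rounds partial thousands up is not stated here, so treat the result as an estimate. `estimate_credits` is a hypothetical helper.

```python
# Credit rates per 1,000 characters, taken from the pricing table above.
CREDITS_PER_1K = {"free": 0, "standard": 2, "premium": 4}


def estimate_credits(num_chars: int, tier: str) -> float:
    """Estimate the credits for one generation.

    Assumes linear proration per character; the actual rounding policy
    is not specified in the pricing table.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    rate = CREDITS_PER_1K[tier.lower()]
    return num_chars * rate / 1000


print(estimate_credits(5000, "Standard"))  # 10.0
print(estimate_credits(1500, "Premium"))   # 6.0
```

For example, a full 5,000-character generation on a Standard model comes to 10 credits under this assumption.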
How AI Text-to-Speech works
Generate professional-quality speech in three simple steps. No technical knowledge required.
Enter your text
Type, paste, or upload the text you want to convert to speech. Each generation supports up to 5,000 characters for signed-in users. Use plain text, or add SSML tags for fine-grained control over speaking rate, pauses, and pronunciation.
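When you build SSML programmatically, characters such as `&` and `<` in your text must be escaped, or the markup becomes invalid XML. The sketch below wraps text segments in `<speak>` and an optional `<prosody rate=...>`, reproducing the `<speak><prosody rate="slow">` example above. The `<break time=.../>` element is standard W3C SSML. Whether every model on TTS.ai honors it is an assumption. `to_ssml` is a hypothetical helper.

```python
from typing import Optional, Sequence
from xml.sax.saxutils import escape, quoteattr


def to_ssml(segments: Sequence[str], rate: Optional[str] = None, pause_ms: int = 0) -> str:
    """Wrap plain-text segments in SSML, escaping XML special characters.

    Segments are joined with a <break> of pause_ms milliseconds, or with a
    single space when pause_ms is 0. Support for <break> is an assumption.
    """
    escaped = [escape(s) for s in segments]
    separator = f'<break time="{pause_ms}ms"/>' if pause_ms > 0 else " "
    body = separator.join(escaped)
    if rate is not None:
        # quoteattr escapes the value and adds the surrounding quotes.
        body = f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
    return f"<speak>{body}</speak>"


print(to_ssml(["Slow speech"], rate="slow"))
# <speak><prosody rate="slow">Slow speech</prosody></speak>
```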
Choose a model and voice
Choose from 24+ AI models across three tiers. Pick a voice that suits your content, select your target language, set the playback speed from 0.5x to 2.0x, and choose your preferred output format (MP3, WAV, OGG, or FLAC).
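The options in this step can be captured in a small settings object that rejects out-of-range values before a request is made. This is a hypothetical sketch. The `GenerationSettings` class, its field names, and the example model and voice identifiers are illustrative, not TTS.ai's API. Only the 0.5x-2.0x speed range and the four output formats come from the text above.

```python
from dataclasses import dataclass

# Constraints taken from the step description above.
MIN_SPEED, MAX_SPEED = 0.5, 2.0
OUTPUT_FORMATS = {"mp3", "wav", "ogg", "flac"}


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for per-generation options."""
    model: str
    voice: str
    language: str
    speed: float = 1.0
    output_format: str = "mp3"

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(
                f"speed must be between {MIN_SPEED}x and {MAX_SPEED}x, got {self.speed}")
        if self.output_format.lower() not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")


settings = GenerationSettings(model="kokoro", voice="example-voice",
                              language="en", speed=1.25, output_format="WAV")
```

Validating on the client side gives users an immediate error instead of a failed generation.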
Generate and download
Preview with the built-in player, download in your chosen format, or copy a shareable link. Use the API for batch processing and integration into your workflow.
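Batch processing a document longer than the 5,000-character per-generation limit requires splitting the text first. The sketch below greedily packs whole sentences into chunks of at most `max_chars` characters. It hard-splits any single sentence longer than the limit. The sentence-splitting regex is a simple heuristic that splits after `.`, `!`, or `?` followed by whitespace. `chunk_text` is a hypothetical helper.

```python
import re
from typing import List

MAX_CHARS = 5000  # per-generation limit stated above


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The current chunk is full: emit it and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", max_chars=9))
# ['One. Two.', 'Three.']
```

Each chunk can then be sent as a separate generation and the resulting audio files concatenated in order.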
Text-to-speech use cases
AI text-to-speech is changing how people create, consume, and interact with audio content across many industries.
All text-to-speech models
Detailed specifications for every AI model available on TTS.ai. Compare quality, speed, language support, and features to find the right model for your project.
Kokoro
Free
Kokoro is an 82-million-parameter text-to-speech model that punches well above its weight class. Despite its small size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages, including English, Japanese, Chinese, and Korean, with a variety of expressive voices. It is extremely fast, generating audio nearly 100x faster than real time on a GPU.
Developer: Hexgrad
License: Apache 2.0
Speed: Fast
Languages: en, ja, zh, ko, fr, de, it, pt, es, hi, ru
VRAM: 1.5GB
Voice cloning: No
Credits per 1K characters: 0 (free)
Piper
Free
Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and Larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications that need offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speed even on a Raspberry Pi 4.
Developer: Rhasspy
License: MIT
Speed: Fast
Languages: en, de, fr, es, it, pt, nl, pl, ru, zh, ja, ko, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi
VRAM: 0 (CPU only)
Voice cloning: No
Credits per 1K characters: 0 (free)
VITS
Free
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural-sounding audio than two-stage models. It combines variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.
Developer: Jaehyeon Kim et al.
License: MIT
Speed: Fast
Languages: en, zh, ja, ko
VRAM: 1GB
Voice cloning: No
Credits per 1K characters: 0 (free)
MeloTTS
Free
MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.
Developer: MyShell.ai
License: MIT
Speed: Fast
Languages: en, es, fr, zh, ja, ko
VRAM: 0.5GB (GPU optional)
Voice cloning: No
Credits per 1K characters: 0 (free)
Bark
Standard
Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio such as music, background noise, and sound effects. It can also produce nonverbal sounds like laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.
Developer: Suno
License: MIT
Speed: Slow
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 5GB
Voice cloning: No
Credits per 1K characters: 2
Bark Small
Standard
Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference and lower memory requirements. It retains Bark's ability to generate speech with emotion, laughter, and multiple languages.
Developer: Suno
License: MIT
Speed: Medium
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 2GB
Voice cloning: No
Credits per 1K characters: 2
CosyVoice 2
Standard
CosyVoice 2 by Alibaba's Tongyi Lab achieves speech quality comparable to a human speaker with very low latency, making it well suited to real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.
Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, fr, de, it, es
VRAM: 4GB
Voice cloning: Yes
Credits per 1K characters: 2
Dia TTS
Standard
Dia by Nari Labs is a 1.6B-parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression. Dia is well suited to podcast-style content, audiobook dialogue, and interactive conversational AI.
Developer: Nari Labs
License: Apache 2.0
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Credits per 1K characters: 2
Parler TTS
Standard
Parler TTS is a text-to-speech model that uses natural-language voice descriptions to control the generated speech. Instead of selecting from preset voices, you describe the voice you want (e.g., "a warm female voice with a slight British accent, speaking slowly and clearly"), and Parler generates speech matching that description. This makes it uniquely flexible for creative applications.
Developer: Hugging Face
License: Apache 2.0
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Credits per 1K characters: 2
IndexTTS-2
Standard
IndexTTS-2 is an advanced text-to-speech system that excels at zero-shot voice synthesis with fine-grained emotion control. It can generate speech with specific emotional tones, such as happy, sad, angry, or fearful, without emotion-specific training data. The model uses emotion vectors to precisely control the emotional expression of the generated speech.
Developer: Index Team
License: Apache 2.0
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Credits per 1K characters: 2
Spark TTS
Standard
Spark TTS by SparkAudio is a text-to-speech model that combines voice cloning with controllable emotion and speaking style. Using just 5 seconds of reference audio, it can clone a voice and then generate speech with different emotions, speeds, and styles while keeping the cloned voice's identity. Spark TTS uses a prompt-based control system.
Developer: SparkAudio
License: Apache 2.0
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Credits per 1K characters: 2
GPT-SoVITS
Standard
GPT-SoVITS combines GPT-style language modeling with SoVITS, a VITS-based acoustic model, for powerful few-shot voice cloning. With as little as 5 seconds of reference audio, it can accurately clone a voice and generate new speech while preserving the speaker's unique characteristics. It excels at both speaking and singing voice synthesis.
Developer: RVC-Boss
License: MIT
Speed: Slow
Languages: en, zh, ja, ko
VRAM: 6GB
Voice cloning: Yes
Credits per 1K characters: 2
Orpheus
Standard
Orpheus is a large-scale text-to-speech model that achieves human-level emotional expression. Trained on over 100,000 hours of diverse speech data, it excels at generating speech with natural emotion, emphasis, and speaking style. Orpheus can produce speech that is virtually indistinguishable from human recordings.
Developer: Canopy Labs
License: Llama 3.2 Community
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Credits per 1K characters: 2
Chatterbox
Premium
Chatterbox by Resemble AI is a zero-shot voice cloning model. It can clone a voice from a single audio sample with very high accuracy, capturing not only the timbre but also the speaking style and vocal characteristics.
Developer: Resemble AI
License: MIT
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: Yes
Credits per 1K characters: 4
Tortoise TTS
Premium
Tortoise TTS is a multi-voice text-to-speech system that prioritizes voice quality over speed. It uses an architecture inspired by DALL-E to generate highly natural speech with good prosody and speaker similarity. Although slower than many alternatives, Tortoise produces some of the most realistic speech available in open source.
Developer: James Betker
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 8GB
Voice cloning: Yes
Credits per 1K characters: 4
StyleTTS 2
Premium
StyleTTS 2 achieves human-level TTS synthesis through style diffusion and adversarial training with large speech language models. It produces highly natural speech on single-speaker datasets, rivaling human recordings. StyleTTS 2 uses a diffusion-based style model to capture the full range of variation in human speech.
Developer: Columbia University
License: MIT
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Credits per 1K characters: 4
OpenVoice
Premium
OpenVoice by MyShell.ai enables instant voice cloning with granular control over voice style, including emotion, accent, rhythm, pauses, and intonation. It can clone a voice from a short audio clip and generate speech in multiple languages while preserving the speaker's identity. OpenVoice also works as a voice converter, enabling real-time voice conversion.
Developer: MyShell.ai / MIT
License: MIT
Speed: Medium
Languages: en, zh, ja, ko, fr, de, es, it
VRAM: 4GB
Voice cloning: Yes
Credits per 1K characters: 4
Qwen3 TTS
Standard
Qwen3-TTS is a 1.7-billion-parameter text-to-speech model from Alibaba's Qwen team. It supports three modes: preset voices with emotion control (9 speakers), voice cloning from just 3 seconds of audio, and a unique voice design mode in which you describe the voice you want in natural language. It covers 10 languages with high expressiveness and natural prosody.
Developer: Alibaba (Qwen)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
VRAM: 7GB
Voice cloning: Yes
Credits per 1K characters: 2
Sesame CSM
Premium
Sesame CSM (Conversational Speech Model) is a 1-billion-parameter model designed to generate conversational speech. It models the natural patterns of human conversation, including timing, listener responses, emotional reactions, and conversational flow.
Developer: Sesame
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 8GB
Voice cloning: No
Credits per 1K characters: 4
Model comparison table
| Model | Developer | Tier | Speed | Languages | VRAM | Voice cloning | License | Credits / 1K chars |
|---|---|---|---|---|---|---|---|---|
| Kokoro | Hexgrad | Free | Fast | 11 | 1.5GB | No | Apache 2.0 | 0 |
| Piper | Rhasspy | Free | Fast | 31 | 0 (CPU only) | No | MIT | 0 |
| VITS | Jaehyeon Kim et al. | Free | Fast | 4 | 1GB | No | MIT | 0 |
| MeloTTS | MyShell.ai | Free | Fast | 6 | 0.5GB (GPU optional) | No | MIT | 0 |
| Bark | Suno | Standard | Slow | 13 | 5GB | No | MIT | 2 |
| Bark Small | Suno | Standard | Medium | 13 | 2GB | No | MIT | 2 |
| CosyVoice 2 | Alibaba (Tongyi Lab) | Standard | Medium | 8 | 4GB | Yes | Apache 2.0 | 2 |
| Dia TTS | Nari Labs | Standard | Medium | 1 | 4GB | No | Apache 2.0 | 2 |
| Parler TTS | Hugging Face | Standard | Medium | 1 | 4GB | No | Apache 2.0 | 2 |
| IndexTTS-2 | Index Team | Standard | Medium | 2 | 4GB | Yes | Apache 2.0 | 2 |
| Spark TTS | SparkAudio | Standard | Medium | 2 | 4GB | Yes | Apache 2.0 | 2 |
| GPT-SoVITS | RVC-Boss | Standard | Slow | 4 | 6GB | Yes | MIT | 2 |
| Orpheus | Canopy Labs | Standard | Medium | 1 | 4GB | No | Llama 3.2 Community | 2 |
| Chatterbox | Resemble AI | Premium | Medium | 1 | 4GB | Yes | MIT | 4 |
| Tortoise TTS | James Betker | Premium | Slow | 1 | 8GB | Yes | Apache 2.0 | 4 |
| StyleTTS 2 | Columbia University | Premium | Medium | 1 | 4GB | No | MIT | 4 |
| OpenVoice | MyShell.ai / MIT | Premium | Medium | 8 | 4GB | Yes | MIT | 4 |
| Qwen3 TTS | Alibaba (Qwen) | Standard | Medium | 10 | 7GB | Yes | Apache 2.0 | 2 |
| Sesame CSM | Sesame | Premium | Slow | 1 | 8GB | No | Apache 2.0 | 4 |
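The comparison table can also be treated as data. The sketch below transcribes a subset of its rows and filters them by voice-cloning support, VRAM budget, and credit cost. It sorts the shortlist by cost and then by VRAM. The `Model` class and `shortlist` function are hypothetical. Only the row values come from the table.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    tier: str
    vram_gb: float        # 0.0 for CPU-only models
    voice_cloning: bool
    credits_per_1k: int


# A subset of rows transcribed from the comparison table above.
MODELS = [
    Model("Kokoro", "Free", 1.5, False, 0),
    Model("Piper", "Free", 0.0, False, 0),
    Model("MeloTTS", "Free", 0.5, False, 0),
    Model("CosyVoice 2", "Standard", 4.0, True, 2),
    Model("GPT-SoVITS", "Standard", 6.0, True, 2),
    Model("Qwen3 TTS", "Standard", 7.0, True, 2),
    Model("Chatterbox", "Premium", 4.0, True, 4),
    Model("Tortoise TTS", "Premium", 8.0, True, 4),
]


def shortlist(models: Iterable[Model], need_cloning: bool = False,
              max_vram_gb: Optional[float] = None,
              max_credits: Optional[int] = None) -> List[Model]:
    """Return models meeting every constraint, cheapest and lightest first."""
    matches = [
        m for m in models
        if (not need_cloning or m.voice_cloning)
        and (max_vram_gb is None or m.vram_gb <= max_vram_gb)
        and (max_credits is None or m.credits_per_1k <= max_credits)
    ]
    return sorted(matches, key=lambda m: (m.credits_per_1k, m.vram_gb))


print([m.name for m in shortlist(MODELS, need_cloning=True, max_vram_gb=4)])
# ['CosyVoice 2', 'Chatterbox']
```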
The most comprehensive AI text-to-speech platform
Why choose TTS.ai for text to speech?
TTS.ai brings together the world's open-source TTS models
Every model is open source under MIT, Apache 2.0, or a similar license, giving you full commercial rights to use the generated audio in your projects. Whether you need fast, lightweight synthesis for real-time applications or high-fidelity output for audiobooks and podcasts, TTS.ai has the right model for every use case.
Free models, no account needed
Get started instantly with the free TTS models: Kokoro, Piper (fast and lightweight), VITS (high-quality neural synthesis), and MeloTTS (multilingual synthesis). No sign-up, no credit card, no limits on generations. The free models support English and many other languages, with natural-sounding output suitable for most applications.
GPU-accelerated processing
GPU-based TTS models run on dedicated NVIDIA GPUs for fast, consistent generation times. Free models generate speech in under 2 seconds. Standard models such as CosyVoice 2 and Bark typically take 3-5 seconds. Premium models with the highest quality, such as Tortoise and Chatterbox, take 5-15 seconds, depending on text length.
30+ languages supported
Generate speech in over 30 languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and more. Many models support cross-lingual synthesis, meaning you can generate speech in a language the original voice was never recorded in. CosyVoice 2 and GPT-SoVITS are particularly good at cross-lingual voice cloning.
Built for developers
Integrate TTS.ai into your applications with our OpenAI-compatible REST API. One endpoint for all 24+ models. SDKs and examples for Python, JavaScript, cURL, and Go. Streaming support for real-time applications. Batch generation for large-scale content production. Webhooks for async notifications. Available on Pro and Enterprise plans.
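Because the API is described as OpenAI-compatible, a request presumably follows the shape of OpenAI's `POST /v1/audio/speech` endpoint, with the fields `model`, `input`, `voice`, `response_format`, and `speed`. The sketch below only builds that request with the Python standard library. The base URL is a placeholder, and the model and voice identifiers are assumptions. Check TTS.ai's API documentation for the real values. `build_speech_request` is a hypothetical helper.

```python
import json
import urllib.request

# Placeholder base URL: replace with the endpoint from TTS.ai's API docs.
BASE_URL = "https://api.example.com/v1"


def build_speech_request(api_key: str, model: str, text: str, voice: str,
                         response_format: str = "mp3", speed: float = 1.0,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = build_speech_request("YOUR_API_KEY", "kokoro", "Hello from TTS.ai", "example-voice")
# Sending it requires network access and a valid key:
# with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as f:
#     f.write(resp.read())
```

If the service really is OpenAI-compatible, the official OpenAI client libraries should also work by pointing their base URL at TTS.ai. That is an assumption worth verifying.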
Frequently asked questions
Start converting text to speech now
Join thousands of creators using TTS.ai. Get 50 free credits with a new account. Free models are available without signing up.