What is text to speech (TTS)?
Text to speech is the technology that converts written text into spoken audio using artificial intelligence. It has evolved from the robotic voices of early systems to the natural-sounding speech of today.
Key concepts in text to speech
Understand the building blocks of modern speech synthesis.
What TTS stands for
TTS stands for text-to-speech: the technology that converts written text into spoken audio using computer-generated voices.
How neural TTS works
Modern TTS uses deep neural networks to analyze text, predict speech patterns, and generate highly human-like audio waveforms.
The history of speech synthesis
From the rule-based systems of the 1960s through the 1990s to today's neural models: how TTS evolved over six decades.
Modern AI models
Modern models such as Kokoro, Bark, and CosyVoice 2 use architectures such as transformers and diffusion.
Common applications
TTS powers screen readers, GPS navigation, voice assistants, audiobooks, customer service systems, e-learning platforms, and content creation.
Open source vs. commercial
Open-source models (MIT, Apache 2.0) provide free TTS, while commercial services provide managed APIs with SLAs and support.
TTS models available on TTS.ai
From fast and lightweight to high-quality voices.
Kokoro
Free
Lightweight 82M parameter model delivering studio-quality speech with blazing-fast inference.
Best for: A state-of-the-art small model that shows how far neural TTS has come.
Try Kokoro
Bark
Standard
Transformer-based text-to-audio model that generates realistic speech, music, and sound effects.
Best for: A transformer-based model that demonstrates audio generation beyond speech.
Try Bark
CosyVoice 2
Standard
Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.
Best for: Streaming TTS with human-level quality and zero-shot cloning.
Try CosyVoice 2
Chatterbox
Premium
State-of-the-art zero-shot voice cloning with emotion control from Resemble AI.
Best for: Zero-shot voice cloning that shows the frontier of voice synthesis.
Try Chatterbox
Tortoise TTS
Premium
Multi-voice text-to-speech focused on quality with autoregressive architecture.
Best for: An autoregressive architecture that prioritizes maximum audio quality.
Try Tortoise TTS
How neural TTS works
Four steps to understanding modern speech synthesis.
Understand the basics
TTS converts written text into spoken audio. Modern systems use neural networks trained on thousands of hours of human voice recordings.
Explore different models
Each TTS model uses a different architecture (for example, transformer or diffusion) and has distinct strengths in speed, quality, and features.
Try it yourself
The best way to understand TTS is to use it. Try our free models above: enter some text and listen within seconds.
Integrate into your projects
Once you find a model you like, use our API to integrate TTS into your apps, products, or content creation workflows.
A brief history of text to speech
From mechanical talking machines to neural networks.
The early days (1950s-1980s)
The first computer-generated speech dates back to 1961, on an IBM computer.
Notable systems: Votrax (1970s), DECtalk (1984, used by Stephen Hawking), Apple.
Concatenative synthesis (1990s-2000s)
Concatenative TTS records a real human voice speaking thousands of phonetic combinations, then stitches the appropriate segments together at runtime. This produced more natural speech, but it required large databases (typically 10-20 hours of recordings per voice).
Used by: AT&T Natural Voices, Nuance Vocalizer, early Google Translate TTS.
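The stitching step can be illustrated with a toy sketch. It assumes a tiny, made-up inventory of recorded units (the unit names and sample values are hypothetical) and joins them with a simple linear crossfade. Real concatenative systems also choose among many candidate units and smooth pitch and energy at the seams, which this sketch does not attempt.

```python
def crossfade_concat(units, overlap):
    """Join waveform units, blending up to `overlap` samples at each seam."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises across the seam
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])  # append the part of the unit that was not blended
    return out


# Hypothetical inventory: in a real system these would be recorded audio segments.
inventory = {
    "h-e": [0.0, 0.2, 0.4, 0.4],
    "e-l": [0.4, 0.5, 0.3, 0.1],
    "l-o": [0.1, -0.2, -0.1, 0.0],
}


def synthesize(unit_names, overlap=2):
    """Look up each requested unit and stitch the sequence together."""
    return crossfade_concat([inventory[name] for name in unit_names], overlap)


waveform = synthesize(["h-e", "e-l", "l-o"])
print(len(waveform))  # 4 + 2 + 2 = 8 samples, since each seam overlaps 2 samples
```

Because every output sample comes from a recording, quality depends on how much audio the inventory covers, which is why these systems needed such large databases.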
Statistical/parametric (2000s-2010s)
Hidden Markov models (HMMs), and later deep neural networks, generated speech parameters (pitch, duration, spectral features) that were then rendered by a vocoder. This allowed more compact systems and more flexible voice creation, but the vocoder stage often produced a noticeably artificial sound.
Key examples: HTS, Merlin, early DNN-based systems.
Neural TTS (2016-present)
The modern era began with WaveNet (DeepMind, 2016), which generated audio sample by sample using deep neural networks. It was followed by Tacotron (Google, 2017), which learned to map text directly to spectrograms.
Key milestones: WaveNet, Tacotron, FastSpeech, VITS, Bark, Kokoro.
How modern neural TTS works
The architecture behind natural-sounding AI voices.
Text analysis and normalization
Raw text is cleaned and normalized: numbers are expanded into words.
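As a minimal sketch of this step, the snippet below collapses stray whitespace and spells out standalone integers from 0 to 999 in English. The function names and the 0-999 limit are illustrative assumptions. Production normalizers also handle dates, currencies, abbreviations, decimals, and language-specific rules.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text):
    """Collapse whitespace, then replace standalone 1-3 digit numbers with words."""
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)


print(normalize("Gate  42 opens in 5 minutes"))
# Gate forty-two opens in five minutes
```

Longer digit runs such as years are left untouched here, because reading them correctly depends on context that a simple pattern cannot see.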
Acoustic model (text to spectrogram)
The acoustic model (typically a transformer or a similar neural network) takes the phoneme sequence and predicts a mel spectrogram, a visual representation of how the audio should sound.
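The "mel" in mel spectrogram refers to a frequency scale that spaces bands roughly the way human hearing perceives pitch: finely at low frequencies and coarsely at high ones. The sketch below uses one common conversion formula (the HTK-style variant; other conventions exist). The band count and frequency range are illustrative assumptions, not settings taken from any particular model.

```python
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


# Assumed example settings: 80 bands covering 0 Hz to 12 kHz.
centers = mel_band_centers(n_mels=80, f_min=0.0, f_max=12000.0)
print(round(centers[1] - centers[0], 1), round(centers[-1] - centers[-2], 1))
```

The printed gaps show the spacing widening with frequency: neighbouring bands are close together at the low end and far apart at the top.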
Vocoder (spectrogram to audio)
The vocoder converts the mel spectrogram into actual audio waveforms. Early vocoders such as Griffin-Lim produced robotic artifacts. Modern neural vocoders (HiFi-GAN, BigVGAN, Vocos) generate 24 kHz or 44.1 kHz audio that captures the fine details of natural speech, including breath sounds and subtle variations.
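To give a sense of scale, a vocoder has to expand each spectrogram frame into many audio samples. The arithmetic below is a sketch that assumes a hop length of 256 samples per frame, a common but by no means universal setting. The frame count is also just an example.

```python
def frames_to_samples(n_frames, hop_length):
    """Number of audio samples a vocoder produces for n_frames spectrogram frames."""
    return n_frames * hop_length


def duration_seconds(n_frames, hop_length, sample_rate):
    """Length of the resulting audio in seconds."""
    return frames_to_samples(n_frames, hop_length) / sample_rate


SAMPLE_RATE = 24_000  # 24 kHz output
HOP_LENGTH = 256      # assumed samples per frame

print(SAMPLE_RATE / HOP_LENGTH)                        # 93.75 frames per second
print(frames_to_samples(375, HOP_LENGTH))              # 96000 samples
print(duration_seconds(375, HOP_LENGTH, SAMPLE_RATE))  # 4.0 seconds
```

Under these assumptions the vocoder generates 256 waveform values for every frame it receives, so most of the fine detail in the output has to come from the vocoder itself.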
End-to-end models
Modern models such as VITS, Kokoro, and Bark collapse the two-stage pipeline. They go directly from text to audio in a single neural network, producing more natural results with fewer artifacts. Some models (such as Bark) can also generate non-speech sounds, laughter, and music alongside speech.
TTS approaches compared
How do the four generations of TTS technology compare?
| Approach | Description | Era | Data required |
|---|---|---|---|
| Formant synthesis | Rule-based frequency models | 1960s-1990s | None |
| Concatenative | Recorded audio segments | 1990s-2010s | 10-20+ hours |
| Parametric (HMM/DNN) | Statistical speech models | 2000s-2016 | 1-5 hours |
| Neural end-to-end | Deep learning (VITS, Kokoro, Bark) | 2016-present | Minutes to hours |
Common applications of TTS
Where text to speech is used today
Accessibility
Screen readers, assistive devices, and tools for people with visual impairments or learning difficulties rely on TTS to make digital content accessible to everyone.
Content creation
YouTubers, podcasters, and social media creators use TTS for voiceovers, narration, and automated content production at scale.
Voice assistants
Siri, Alexa, Google Assistant, and customer service bots use TTS to speak natural-sounding responses to users.
Frequently asked questions
Common questions about text-to-speech technology
Experience modern TTS
Try 24+ state-of-the-art AI voice models for free. See how far text to speech has come.