What Is Text-to-Speech (TTS)?

Text-to-speech is the technology that converts written text into spoken audio using artificial intelligence, from the robotic voices of early systems to the present day.

Key Concepts in Text-to-Speech

Understand the building blocks of modern speech synthesis.

What TTS Stands For

TTS stands for text-to-speech: the technology that converts written text into spoken audio using computer-generated voices.

How Neural TTS Works

Modern TTS uses deep neural networks to analyze text, predict speech patterns, and generate audio waveforms that sound remarkably human.

The History of Speech Synthesis

From the rule-based systems of the 1960s through the 1990s to today's neural models: how TTS evolved over six decades.

Modern AI Models

Modern models such as Kokoro, Bark, and CosyVoice 2 use transformer and diffusion architectures.

Common Applications

TTS powers screen readers, GPS navigation, voice assistants, audiobooks, customer service systems, e-learning platforms, and content creation.

Open Source vs. Commercial

Open-source models (MIT, Apache 2.0) provide free TTS, while commercial services provide managed APIs with SLAs and support.

TTS Models Available on TTS.ai

From fast and lightweight models to research-grade voices.

Kokoro

Free

Lightweight 82M parameter model delivering studio-quality speech with blazing-fast inference.

Fast 5/5

Best for: A state-of-the-art small model that shows how far neural TTS has come.

Try Kokoro

Bark

Standard

Transformer-based text-to-audio model that generates realistic speech, music, and sound effects.

Slow 4/5

Best for: A transformer-based model that demonstrates audio generation beyond speech.

Try Bark

CosyVoice 2

Standard

Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.

Medium 5/5 Voice cloning

Best for: Streaming TTS with human-level quality and zero-shot cloning.

Try CosyVoice 2

Chatterbox

Premium

State-of-the-art zero-shot voice cloning with emotion control from Resemble AI.

Medium 5/5 Voice cloning

Best for: Zero-shot voice cloning that shows the frontier of voice synthesis.

Try Chatterbox

Tortoise TTS

Premium

Multi-voice text-to-speech focused on quality with autoregressive architecture.

Slow 5/5 Voice cloning

Best for: An autoregressive architecture that pushes audio quality as high as possible.

Try Tortoise TTS

How Neural TTS Works

Modern speech synthesis in four steps.

1

Understand the Basics

TTS converts written text into spoken audio. Modern systems use neural networks trained on thousands of hours of human speech recordings.

2

Explore Different Models

Each TTS model uses a different architecture (for example, transformer-based or diffusion-based) with its own strengths in speed, quality, and features.

3

Try It Yourself

The best way to understand TTS is to use it. Try our free models above: enter some text and listen to the result within seconds.

4

Integrate Into Your Projects

Once you find a model you like, use our API to integrate TTS into your apps, products, or content creation workflow.

A Brief History of Text-to-Speech

From mechanical speaking machines to neural networks.

The Early Days (1950s–1980s)

The first computer-generated speech dates back to 1961, on an IBM machine.

Notable systems: Votrax (1970s), DECtalk (1984, used by Stephen Hawking), Apple.

Concatenative Synthesis (1990s–2000s)

Concatenative TTS records a real human voice speaking thousands of phonetic combinations, then stitches the right segments together at runtime. This produced more natural speech, but it required large databases (typically 10–20 hours of recordings per voice).
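The stitching step described above can be sketched in a few lines of Python. This is a toy illustration, not how any named product worked: the unit inventory `UNIT_DB`, its diphone labels, and its sample values are all hypothetical, and a real system would also choose among many candidate recordings per unit rather than storing exactly one.

```python
def crossfade_concat(units, overlap):
    """Join audio units, linearly crossfading `overlap` samples at each seam."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            w = (i + 1) / (n + 1)          # fade-in weight of the incoming unit
            j = len(out) - n + i           # matching position in the tail of `out`
            out[j] = (1 - w) * out[j] + w * unit[i]
        out.extend(unit[n:])               # append the part that does not overlap
    return out

# Hypothetical unit inventory: diphone label -> recorded samples (toy values).
UNIT_DB = {
    "h-e": [0.0, 0.2, 0.4, 0.4],
    "e-l": [0.4, 0.3, 0.1, 0.0],
    "l-o": [0.0, -0.2, -0.3, -0.1],
}

def synthesize(diphones, overlap=2):
    """Look up each requested unit and stitch the sequence together."""
    return crossfade_concat([UNIT_DB[d] for d in diphones], overlap)

audio = synthesize(["h-e", "e-l", "l-o"])
```

The crossfade is what hides the seams. Without it, each join would be an abrupt jump between two recordings.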

Used by: AT&T Natural Voices, Nuance Vocalizer, early Google Translate TTS.

Statistical/Parametric (2000s-2010s)

Hidden Markov Models (HMMs) and, later, deep neural networks generated speech parameters (pitch, duration, spectral features) that a vocoder then rendered as audio. This removed the need to store recorded speech units and made voices easier to modify, but the vocoder step often gave the output a noticeably artificial quality.

Key examples: HTS, Merlin, early DNN-based systems.

Neural TTS (2016-present)

The modern era began with WaveNet (DeepMind, 2016), which generated audio sample by sample using deep neural networks. It was followed by Tacotron (Google, 2017), which learned to map text directly to spectrograms.

Key models: WaveNet, Tacotron, FastSpeech, VITS, Bark, Kokoro.

How Modern Neural TTS Works

The architecture behind natural-sounding AI voices.

Text Analysis and Normalization

The raw text is cleaned and normalized: numbers are spelled out as words.
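As a rough sketch of what normalization involves, the Python below expands a few abbreviations and spells out small integers. The abbreviation table and the 0 to 999 limit are assumptions made for illustration. Production front ends also handle dates, currency, decimals, ordinals, and context-dependent readings, none of which are covered here.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Illustrative abbreviation table; a real system would use a much larger one.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}

def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")

def normalize(text):
    """Expand abbreviations, spell out 1-3 digit integers, collapse whitespace."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    text = re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group())), text)
    return re.sub(r"\s+", " ", text).strip()

print(normalize("Dr. Smith lives at 42  Elm St. with 3 cats."))
# -> Doctor Smith lives at forty-two Elm Street with three cats.
```

Longer digit runs such as `1234` are deliberately left untouched by the pattern, since reading them correctly (a year, a quantity, a PIN) depends on context.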

Acoustic Model (Text to Spectrogram)

The acoustic model (typically a transformer or a similar neural network) takes the phoneme sequence and predicts a mel spectrogram: a picture of how the sound's energy is spread across frequency bands over time.
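The "mel" in mel spectrogram refers to a perceptual frequency scale that packs bands closely at low frequencies and spreads them out at high frequencies. The sketch below uses one common convention for that scale (the HTK-style formula). The band count of 80 and the 12 kHz upper limit are assumed values for illustration, not settings taken from any model named here.

```python
import math

def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]

# Assumed settings: 80 bands covering 0 Hz to 12 kHz (Nyquist for 24 kHz audio).
centers = mel_band_centers(n_mels=80, f_min=0.0, f_max=12000.0)
```

Each column of a mel spectrogram holds one energy value per band, so with these settings the acoustic model would predict 80 numbers per frame.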

Vocoder (Spectrogram to Audio)

The vocoder converts the mel spectrogram into an actual audio waveform. Older vocoders such as Griffin-Lim produce robotic artifacts. Modern neural vocoders (HiFi-GAN, BigVGAN, Vocos) generate 24kHz or 44.1kHz audio that captures the fine details of natural speech, including breath sounds and subtle inflections.
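One way to see the vocoder's job is as upsampling: each spectrogram frame has to be expanded into many audio samples. The arithmetic below assumes a hop length of 256 samples per frame, which is an illustrative value rather than a setting quoted from any of the vocoders above.

```python
def frames_to_samples(n_frames, hop_length):
    """Number of audio samples a vocoder must produce for n_frames frames."""
    return n_frames * hop_length

def duration_seconds(n_frames, hop_length, sample_rate):
    """Length of the generated audio in seconds."""
    return frames_to_samples(n_frames, hop_length) / sample_rate

# Assumed values: 375 frames, hop length 256, 24 kHz output.
samples = frames_to_samples(375, 256)        # 96000 samples
seconds = duration_seconds(375, 256, 24000)  # 4.0 seconds
```

Under these assumptions every frame covers about 10.7 ms of audio, and the vocoder generates 256 waveform values for each one.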

End-to-End Models

Modern models such as VITS, Kokoro, and Bark collapse the two-stage pipeline. They go directly from text to audio in a single neural network, producing more natural results with fewer artifacts. Some models (such as Bark) can also generate non-speech sounds, emotional cues, and music alongside speech.

TTS Approaches Compared

How do the four generations of TTS technology compare?

Approach              Technique                            Era           Data required
Formant synthesis     Rule-based frequency models          1960s-1990s   None
Concatenative         Stitched recorded audio segments     1990s-2010s   10-20+ hours
Parametric (HMM/DNN)  Statistical speech models            2000s-2016    1-5 hours
Neural end-to-end     Deep learning (VITS, Kokoro, Bark)   2016-present  Minutes to hours

Common Applications of TTS

Where text-to-speech is used today

Accessibility

Screen readers, assistive devices, and tools for people with visual impairments or learning difficulties rely on TTS to make digital content accessible to everyone.

Content Creation

YouTubers, podcasters, and social media creators use TTS for narration, voiceovers, and automated content production at scale.

Voice Assistants

Siri, Alexa, Google Assistant, and customer service chatbots use TTS to speak natural-sounding responses to users.

Frequently Asked Questions

Common questions about text-to-speech technology

TTS stands for text-to-speech. It refers to technology that converts written text into spoken words using synthesized or AI-generated voices. The term "speech synthesis" is also used in technical literature.

Modern TTS systems work in three stages: text analysis (tokenization, normalization, phonetic conversion), prosody prediction (determining pitch, duration, stress, and pauses), and audio synthesis (generating the actual audio waveform).

Concatenative TTS stitches together pre-recorded speech segments, which can produce audible transitions at the joins. Neural TTS generates speech from scratch using deep learning, producing smoother, more natural audio with better prosody.

SSML (Speech Synthesis Markup Language) is an XML-based markup language that lets you control how TTS systems speak text. You can specify pauses, pronunciation, emphasis, pitch changes, and speaking rate using SSML tags in your text input.
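As an illustration, the Python below assembles a small SSML document with the standard library's `xml.etree.ElementTree`. The element names (`speak`, `break`, `prosody`, `emphasis`, `say-as`) come from the W3C SSML specification, but support varies by engine, and the `interpret-as="digits"` value in particular is a widely used convention rather than something every TTS system accepts. The sentence itself is made up for the example.

```python
import xml.etree.ElementTree as ET

def build_ssml():
    """Return a small SSML document as a string."""
    speak = ET.Element("speak", {
        "version": "1.1",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "en-US",
    })
    speak.text = "Your order number is "

    # Read the number digit by digit instead of as "four thousand...".
    say_as = ET.SubElement(speak, "say-as", {"interpret-as": "digits"})
    say_as.text = "4821"
    say_as.tail = "."

    # Half-second pause before the next sentence.
    pause = ET.SubElement(speak, "break", {"time": "500ms"})
    pause.tail = " "

    # Slower, slightly higher-pitched delivery with one stressed word.
    prosody = ET.SubElement(speak, "prosody", {"rate": "slow", "pitch": "+2st"})
    prosody.text = "Please keep it "
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = "safe"
    emphasis.tail = "."

    return ET.tostring(speak, encoding="unicode")

ssml = build_ssml()
print(ssml)
```

Building the markup with an XML library rather than string concatenation keeps the tags balanced and escapes any special characters in the text.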

TTS is used for accessibility (screen readers for visually impaired users), voice assistants (Siri, Alexa, Google Assistant), audiobook production, e-learning, GPS navigation, IVR customer service systems, content creation, and language learning apps.

TTS evolved from rule-based formant systems in the 1960s, to concatenative synthesis in the 1990s, to statistical parametric synthesis in the 2000s, to neural TTS with WaveNet in 2016, and on to today's transformer and diffusion models that achieve human-level quality.

Natural-sounding TTS requires appropriate prosody (rhythm, stress, intonation), accurate pacing, smooth transitions between sounds, and a consistent voice identity. Neural models learn these patterns from large datasets of natural human speech recordings.

Voice cloning models such as Chatterbox and CosyVoice 2 can clone a specific voice from 5–30 seconds of audio. The cloned voice captures timbre, accent, and speaking style, although legal and ethical considerations apply when cloning other people's voices.
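Before sending a reference clip to a cloning model, it can help to check that its length falls inside that 5 to 30 second window. The sketch below does this for uncompressed WAV files using Python's standard `wave` module. The function names and the idea of rejecting clips outside the window are assumptions for illustration, and individual models may accept other lengths or formats.

```python
import wave

MIN_SECONDS = 5.0   # lower bound quoted in the text above
MAX_SECONDS = 30.0  # upper bound quoted in the text above

def clip_duration_seconds(path):
    """Duration of an uncompressed PCM WAV file, in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def is_usable_reference(path):
    """True if the clip length falls inside the assumed 5-30 second window."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

For example, `is_usable_reference("my_voice.wav")` (a hypothetical file name) would return `False` for a two-second clip.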

Some models are specific to particular languages, while others are multilingual. English has the most models and voices available, but Chinese, Japanese, Korean, Spanish, and other European languages are also widely supported.

TTS is a subset of AI voice generation. TTS converts text input into speech output. AI voice generation is a broader term that includes voice cloning, voice conversion, speech-to-speech, and sound effect generation.

It depends on your needs. Kokoro offers the best balance of speed and quality for general use. Chatterbox leads in voice cloning. Orpheus excels at emotional expression. StyleTTS 2 produces the most natural single-speaker speech. There is no single "best" model for every use case.

Yes. All models on TTS.ai are open source and can be self-hosted. CPU-only models such as Piper run on any computer. GPU models such as Kokoro and Bark require an NVIDIA GPU with 2-8GB of VRAM. Our platform provides hosted access so that you do not need to manage the infrastructure yourself.

Try Modern TTS Yourself

Try 24+ state-of-the-art AI voice models for free. See how far text-to-speech has come.