Text to Speech

Convert text to natural-sounding speech with more than 24 open-source AI models. Free to use, no account required.

0/500 characters (limit without an account)

Sign up for a 5,000-character limit

SSML Mode (Speech Synthesis Markup Language, for fine-grained control)

Wrap your text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
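The page does not say how SSML mode processes its input. As a minimal sketch, assuming you build the markup yourself before pasting it in, the hypothetical helper `wrap_ssml` below produces the same `<speak>`/`<prosody>` structure as the example and escapes XML special characters in the text. The named rate values come from the W3C SSML 1.1 `prosody` element; which tags and values a given model honours may vary.

```python
from xml.sax.saxutils import escape, quoteattr

# Named values for <prosody rate="..."> in SSML 1.1; percentages such as "80%" are also valid
NAMED_RATES = {"x-slow", "slow", "medium", "fast", "x-fast", "default"}


def wrap_ssml(text: str, rate: str = "medium") -> str:
    """Wrap plain text in <speak>/<prosody> tags (hypothetical helper, not a TTS.ai API)."""
    if rate not in NAMED_RATES and not rate.endswith("%"):
        raise ValueError(f"unsupported prosody rate: {rate!r}")
    # SSML is XML: escape &, < and > so user text cannot break the markup
    return f"<speak><prosody rate={quoteattr(rate)}>{escape(text)}</prosody></speak>"


print(wrap_ssml("Slow speech", rate="slow"))
# <speak><prosody rate="slow">Slow speech</prosody></speak>
```

Escaping matters because a stray `&` or `<` in ordinary text (for example "R&D") would otherwise make the whole SSML document invalid.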

Add emotion markers to influence the delivery (model support varies):

Pronunciation Dictionary

Define custom pronunciations (word = pronunciation):
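The page gives only the entry format, not how entries are applied. As a minimal sketch, assuming the dictionary works as whole-word text substitution before synthesis (an assumption, not documented behaviour), the hypothetical helpers `parse_dictionary` and `apply_dictionary` below read `word = pronunciation` lines and respell matching words.

```python
import re


def parse_dictionary(spec: str) -> dict[str, str]:
    """Parse 'word = pronunciation' lines into a dict, skipping blank or malformed lines."""
    entries = {}
    for line in spec.splitlines():
        word, sep, pron = line.partition("=")
        if sep and word.strip() and pron.strip():
            entries[word.strip()] = pron.strip()
    return entries


def apply_dictionary(text: str, entries: dict[str, str]) -> str:
    """Replace each dictionary word (whole words only, case-insensitive) with its respelling."""
    if not entries:
        return text
    # Try longer words first so that, e.g., "SQLite" wins over "SQL"
    words = sorted(entries, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    lookup = {w.lower(): p for w, p in entries.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


entries = parse_dictionary("GIF = jif\nnginx = engine x\nthis line is ignored")
print(apply_dictionary("Save the gif behind nginx.", entries))
# Save the jif behind engine x.
```

Because matching uses word boundaries, "gifted" is left alone; the trade-off is that entries starting or ending with punctuation (such as "C++") will not match.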

Pitch: 0 (range -12 to +12)

Dia dialogue format: use the [S1] and [S2] tags to mark different speakers. Example:

[S1] Hello! [S2] Hello, how are you?
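For longer scripts it can help to check the speaker turns before generating. A minimal sketch, assuming only the tag format shown in the example (`[S1]`, `[S2]`, and so on): the hypothetical `split_turns` helper breaks a script into ordered (speaker, text) pairs. It is a client-side check, not Dia's own parser.

```python
import re

# Matches a speaker tag such as [S1] or [S2] and captures the speaker number
TAG = re.compile(r"\[S(\d+)\]")


def split_turns(script: str) -> list[tuple[str, str]]:
    """Split a Dia-style script into (speaker, utterance) pairs, in order."""
    # With one capture group, re.split yields [prefix, n1, text1, n2, text2, ...];
    # any text before the first tag (the prefix) is discarded
    parts = TAG.split(script)
    turns = []
    for number, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:  # skip tags with no speech after them
            turns.append((f"S{number}", text))
    return turns


print(split_turns("[S1] Hello! [S2] Hello, how are you?"))
# [('S1', 'Hello!'), ('S2', 'Hello, how are you?')]
```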



                
                
                    
                    
AI Model
Voice
Language
Output Format
Speed: 1.0x (range 0.5x to 2.0x)

Free with Piper, VITS, MeloTTS



        
        
            
Your generated audio will appear here. Select a model, enter your text, and click Generate.

Download Audio (the link expires in 24 hours)
                            
                                
                                    
                                    
                                    
                                    
                                    
                                
                            
                        
                    
                
            
        

        

    
        
            
                



    
    
        
        
            
Model Details

MegaTTS3 (Premium)
MegaTTS3 from ByteDance uses a novel sparse alignment mechanism combined with a latent diffusion transformer. It offers an adjustable trade-off between speech intelligibility and speaker similarity for zero-shot voice cloning.

Developer: ByteDance
License: Apache 2.0
Speed: Slow
Languages: 2
VRAM: 8GB
Voice cloning: Supported

Features: Voice cloning, Adjustable similarity, Cross-lingual

Best for: High-fidelity voice cloning
                
                
            
        

        
        
            
Tips for Better Results

Use correct punctuation for natural pauses and inflection
Spell out numbers and abbreviations for clearer pronunciation
Add commas to create short pauses between phrases
Use ellipses (...) for longer, dramatic pauses
Try Kokoro or CosyVoice 2 for the most natural-sounding results
Use Dia for multi-speaker dialogue and podcast content
                
            
        

        
        
            
Credit Costs

Tier	Cost per 1K characters
Free	0 credits (unlimited)
Standard	2 credits / 1K characters
Premium	4 credits / 1K characters

Get More Credits
No ads
Unlimited usage
Priority support
Early access to new features
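The credit-cost table prices each tier per 1,000 characters. A minimal sketch of the arithmetic for a single request, assuming credits are charged pro rata on the exact character count (the page does not say whether partial thousands are rounded up); `estimate_credits` and the tier keys are hypothetical names, not part of any TTS.ai API.

```python
from decimal import Decimal

# Credits per 1,000 characters for each tier, taken from the credit-cost table
CREDITS_PER_1K = {"Free": Decimal(0), "Standard": Decimal(2), "Premium": Decimal(4)}


def estimate_credits(characters: int, tier: str) -> Decimal:
    """Estimate credits for one generation, assuming pro-rata (unrounded) billing."""
    if characters < 0:
        raise ValueError("character count cannot be negative")
    # Decimal avoids binary floating-point error in fractional credit amounts
    return CREDITS_PER_1K[tier] * characters / 1000


print(estimate_credits(5000, "Standard"))  # 10
print(estimate_credits(1200, "Premium"))  # 4.8
```

Under this assumption, a full 5,000-character request costs 10 credits on a Standard model and 20 on a Premium one. If billing instead rounds up to whole 1K blocks, replace `characters / 1000` with `math.ceil(characters / 1000)`.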






    
How AI Text to Speech Works
Create professional-quality voiceovers in three simple steps. No technical knowledge required.

Step 1
Enter your text
Type, paste, or upload the text you want to convert to speech. Signed-in users can submit up to 5,000 characters per generation. Use plain text, or add SSML tags for finer control over pronunciation, pauses, and emphasis.

Step 2
Choose a Model and Voice
Choose from 20+ AI models across three tiers. Pick a voice that suits your content, select your target language, adjust the playback speed from 0.5x to 2.0x, and choose your preferred output format (MP3, WAV, OGG, or FLAC).

Step 3
Download
Click Generate and your audio will be ready in seconds. Preview it in the built-in player, download it in the format you chose, or copy a shareable link. Use the API for batch processing and to integrate speech generation into your workflow.
                    
                
            
        
    






    
Text to Speech Use Cases
AI-powered text to speech is transforming how people create, consume, and interact with audio content across a wide range of industries.

Audiobooks
Convert entire books into natural-sounding audiobooks with studio-quality narration. Multi-speaker support with Dia for character dialogue.

Voiceovers
Create professional voiceovers for YouTube, TikTok, Instagram Reels, and Shorts. 100+ voices, or clone your own.

Podcasts
Create podcast episodes from scripts with multiple AI voices. Use Dia for natural two-speaker conversations.

Games
Create AI voices for indie games, visual novels, and interactive fiction. NPC dialogue, cutscene voiceovers, 30+ languages.

E-learning
Convert course materials, lectures, and training content to audio. Multilingual support for global platforms.

Accessibility
Make websites, documents, and apps accessible. Screen-reader API integration and article-to-audio conversion.

IVR & Phone Systems
Power IVR systems, phone menus, and customer service with natural AI voices. Low-latency streaming for call centers.

Social Media
TikTok voiceovers, Instagram Reels, Twitter/X posts, YouTube Shorts. Fast generation with free models.

Streaming
Twitch TTS alerts, chat-to-speech, AI co-hosts, and Discord bots. Low latency, 100+ voices, StreamElements-compatible.

Marketing
Ad voiceovers, explainer videos, product demos, and sales presentations. Scale audio content production across campaigns.

Dubbing & Localization
Translate and dub video into 30+ languages with voice-matching AI. Automatic transcription and speaker detection.

Meditation & Wellness
Guided meditations, sleep stories, breathing exercises, and affirmations with calm, soothing AI voices.
        
    






    
Text-to-Speech Models
Detailed specifications for every AI model available on TTS.ai. Compare quality, speed, language support, and features to find the right model for your project.

All (32)
Free (7)
Standard (18)
Premium (7)
        

        
            
            
                
                    
                    
                        
                            
Kokoro (Free)
Kokoro is an 82-million-parameter text-to-speech model that performs well above its size class. Despite its small size, it produces natural, clear speech. It supports several languages, including English, Japanese, Chinese, and Korean, with a wide variety of voices. It is very fast, generating audio almost 100 times faster than real time on a GPU.

Developer: Hexgrad
License: Apache 2.0
Speed: Fast
Languages: en, ja, zh, ko, fr, de, it, pt, es, hi, ru
VRAM: 1.5GB
Voice cloning: No
Cost per 1K characters: Free

Features: 82M parameters, Ultra-fast, Expressive, Multilingual, Streaming support

Best for: High-quality TTS with minimal resource use, streaming applications
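Speed claims such as "almost 100 times faster than real time" are usually expressed through the real-time factor (RTF): synthesis time divided by the duration of the audio produced, so an RTF below 1 means faster than real time. The sketch below computes RTF and the equivalent speed-up; the function names and the timings in the usage lines are illustrative assumptions, not Kokoro measurements.

```python
def _check(synthesis_seconds: float, audio_seconds: float) -> None:
    """Reject zero or negative durations, for which the ratios are meaningless."""
    if synthesis_seconds <= 0 or audio_seconds <= 0:
        raise ValueError("both durations must be positive")


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Synthesis time / audio duration; values below 1.0 are faster than real time."""
    _check(synthesis_seconds, audio_seconds)
    return synthesis_seconds / audio_seconds


def speedup(synthesis_seconds: float, audio_seconds: float) -> float:
    """How many times faster than real time synthesis ran (the inverse of RTF)."""
    _check(synthesis_seconds, audio_seconds)
    return audio_seconds / synthesis_seconds


# Illustrative numbers only: 120 s of audio produced in 1.5 s of compute
print(real_time_factor(1.5, 120.0))  # 0.0125
print(speedup(1.5, 120.0))  # 80.0
```

On this scale, "almost 100 times faster than real time" corresponds to an RTF of roughly 0.01.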
                                
                            
                        
                    
                    
                    
                        
                            
Piper (Free)
Piper is a lightweight text-to-speech engine developed by Rhasspy that uses the VITS and Larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications that require offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds, even on a Raspberry Pi 4.

Developer: Rhasspy
License: MIT
Speed: Fast
Languages: en, de, fr, es, it, pt, nl, pl, ru, zh, ja, ko, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi
VRAM: 0 (CPU only)
Voice cloning: No
Cost per 1K characters: Free

Features: CPU-friendly, Offline-capable, 100+ voices, 30+ languages, SSML support

Best for: Quick previews, accessibility, and embedded applications
                                
                            
                        
                    
                    
                    
                        
                            
VITS (Free)
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural-sounding audio than two-stage models. It uses variational inference augmented with normalizing flows and an adversarial training process, achieving significant improvements in naturalness.

Developer: Jaehyeon Kim et al.
License: MIT
Speed: Fast
Languages: en, zh, ja, ko
VRAM: 1GB
Voice cloning: No
Cost per 1K characters: Free

Features: End-to-end synthesis, Natural prosody, Fast inference, Multi-speaker

Best for: General-purpose text-to-speech with natural prosody
                                
                            
                        
                    
                    
                    
                        
                            
MeloTTS (Free)
MeloTTS by MyShell.ai is a multilingual TTS library that supports English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is very fast, reaching roughly real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.

Developer: MyShell.ai
License: MIT
Speed: Fast
Languages: en, es, fr, zh, ja, ko
VRAM: 0.5GB (GPU optional)
Voice cloning: No
Cost per 1K characters: Free

Features: CPU-optimized, Multilingual, Multiple accents, Production-ready, Low latency

Best for: Production applications that need fast, multilingual TTS
                                
                            
                        
                    
                    
                    
                        
                            
Bark (Standard)
Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. It can also produce nonverbal communication such as laughing, sighing, and crying. Bark supports more than 100 speaker presets across 13 languages.

Developer: Suno
License: MIT
Speed: Slow
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 5GB
Voice cloning: No
Cost per 1K characters: 2x

Features: Sound effects, Laughter/gasps, Music generation, 100+ speakers, Multilingual

Best for: Creative audio content, emotive audiobooks, sound effects
                                
                            
                        
                    
                    
                    
                        
                            
Bark Small (Standard)
Bark Small is a distilled version of the Bark model that trades some audio quality for much faster inference and lower memory requirements. It retains Bark's ability to generate speech with emotion, laughter, and multiple languages.

Developer: Suno
License: MIT
Speed: Medium
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 2GB
Voice cloning: No
Cost per 1K characters: 2x

Features: Lightweight, Faster than full Bark, Emotional expression, Multilingual

Best for: Quick creative audio when full Bark is too slow
                                
                            
                        
                    
                    
                    
                        
                            
                                CosyVoice 2
                                Standard
                            
                            
                                CosyVoice 2, from Alibaba's Tongyi Lab, achieves human-like speech quality with very low latency, making it well suited to real-time applications. It uses finite scalar quantization (FSQ) for its speech tokens, supports streaming synthesis, and offers zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. In human evaluations it outperforms many commercial TTS systems.
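Finite scalar quantization replaces a learned codebook with a fixed grid: each dimension of an encoder output is squashed into a bounded range and rounded to one of a few integer levels, and the per-dimension levels together form one discrete token. The pure-Python sketch below shows that generic idea at inference time only. It is not CosyVoice 2's implementation; the level counts are arbitrary, odd level counts are assumed for simplicity, and the straight-through gradient trick used during training is omitted.

```python
import math


def fsq_quantize(z: list[float], levels: list[int]) -> list[int]:
    """Round each dimension of z to an integer in [-(L // 2), L // 2]."""
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, num_levels in zip(z, levels):
        if num_levels < 3 or num_levels % 2 == 0:
            raise ValueError("this sketch only supports odd level counts >= 3")
        half = num_levels // 2
        # tanh bounds the value to (-1, 1); scaling by `half` and rounding
        # snaps it to one of `num_levels` evenly spaced integers.
        codes.append(round(half * math.tanh(value)))
    return codes


def fsq_token_id(codes: list[int], levels: list[int]) -> int:
    """Combine per-dimension codes into one token id (a mixed-radix number)."""
    token, base = 0, 1
    for code, num_levels in zip(codes, levels):
        digit = code + num_levels // 2  # shift [-half, half] to [0, L - 1]
        token += digit * base
        base *= num_levels
    return token


levels = [3, 5, 3]  # implicit codebook size: 3 * 5 * 3 = 45 tokens
codes = fsq_quantize([0.0, 10.0, -10.0], levels)
print(codes, fsq_token_id(codes, levels))  # [0, 2, -1] 13
```

No codebook is learned or searched here: the token id is simply the pattern of levels read as a mixed-radix number.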

                                
                                    
                                        Developer: Alibaba (Tongyi Lab)
                                        License: Apache 2.0
                                        Speed: Medium
                                        Quality:
                                        Languages: en, zh, ja, ko, fr, de, it, es
                                        VRAM: 4GB
                                        Voice cloning: Yes
                                        Cost per 1K characters: 2x

                                        Streaming
                                        Zero-shot cloning
                                        Cross-lingual
                                        Emotion control
                                        Human parity

                                Best for:
                                Real-time applications, streaming TTS, voice assistants

                                    Try CosyVoice 2
                                
                            
                        
                    
                    
                    
                        
                            
                                Dia TTS
                                Standard
                            
                            
                                Dia, from Nari Labs, is a 1.6B-parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression, which makes it well suited to podcast-style content, audiobook dialogue, and interactive conversational AI.
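Dia scripts mark each turn with a speaker tag, [S1] or [S2], as the page's Dia dialogue format describes. A minimal Python sketch that builds such a script from a list of turns follows; the helper name `to_dia_script` and the `(speaker, text)` tuple shape are illustrative assumptions, not part of Dia's API.

```python
def to_dia_script(turns: list[tuple[int, str]]) -> str:
    """Join (speaker, text) turns into one string tagged with [S1]/[S2]."""
    parts = []
    for speaker, text in turns:
        if speaker not in (1, 2):
            raise ValueError(f"Dia dialogue uses speakers 1 and 2, got {speaker}")
        text = text.strip()
        if not text:
            raise ValueError("this sketch rejects empty turns")
        parts.append(f"[S{speaker}] {text}")
    return " ".join(parts)


script = to_dia_script([(1, "Hello!"), (2, "Hi, how are you?")])
print(script)  # [S1] Hello! [S2] Hi, how are you?
```

Validating speakers up front keeps a typo such as speaker 3 from silently producing a tag the model was not trained on.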

                                
                                    
                                        Developer: Nari Labs
                                        License: Apache 2.0
                                        Speed: Medium
                                        Quality:
                                        Languages: en
                                        VRAM: 4GB
                                        Voice cloning: No
                                        Cost per 1K characters: 2x

                                        Multi-speaker
                                        Dialogue generation
                                        Natural turn-taking
                                        Emotional expression
                                        1.6B parameters

                                Best for:
                                Podcasts, audiobook dialogue, conversational content

                                    Try Dia TTS
                                
                            
                        
                    
                    
                    
                        
                            
                                Parler TTS
                                Standard
                            
                            
                                Parler TTS is a text-to-speech model that uses natural-language voice descriptions to control the generated speech. Instead of choosing from preset voices, you describe the voice you want (e.g., "a cool female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech that matches the description. This makes it especially flexible for creative applications.

                                
                                    
                                        Developer: Hugging Face
                                        License: Apache 2.0
                                        Speed: Medium
                                        Quality:
                                        Languages: en
                                        VRAM: 4GB
                                        Voice cloning: No
                                        Cost per 1K characters: 2x

                                        Voice descriptions
                                        Natural-language control
                                        Customizable voice generation
                                        No preset voice required

                                Best for:
                                Creative applications that need custom voice characteristics

                                    Try Parler TTS
                                
                            
                        
                    
                    
                    
                        
                            
                                GLM-TTS
                                Standard
                            
                            
                                GLM-TTS, from Zhipu AI, is a text-to-speech system built on a Llama-style architecture combined with flow matching. It reports the lowest character error rate (CER) among open-source TTS models, meaning its output most faithfully reproduces the input text, i.e. the most accurate pronunciation. GLM-TTS supports English and Chinese, with voice cloning from 3-10 second audio samples.
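Character error rate is the character-level edit distance between a reference and a hypothesis (substitutions + deletions + insertions) divided by the number of reference characters. For TTS, the hypothesis is usually an ASR transcript of the generated audio. The sketch below computes CER with a standard Levenshtein dynamic program; it assumes both strings are already normalized (case, punctuation, whitespace), a step real evaluations usually handle explicitly.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character edits turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from "" to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to ""
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


print(cer("kitten", "sitting"))  # 0.5
```

Because insertions count against the reference length, CER can exceed 1.0 when the hypothesis is much longer than the reference.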

                                
                                    
                                        Developer: Zhipu AI
                                        License: GLM-4 License
                                        Speed: Medium
                                        Quality:
                                        Languages: en, zh
                                        VRAM: 4GB
                                        Voice cloning: Yes
                                        Cost per 1K characters: 2x

                                        Lowest error rate
                                        Voice cloning
                                        Flow matching
                                        Natural prosody

                                Best for:
                                Applications that demand the highest pronunciation accuracy

                                    Try GLM-TTS
                                
                            
                        
                    
                    
                    
                        
                            
                                IndexTTS-2
                                Standard
                            
                            
                                IndexTTS-2 is an advanced text-to-speech system that excels at zero-shot voice synthesis with fine-grained emotion control. It can generate speech with specific emotional tones, such as happiness, sadness, or fear, without requiring emotion-specific training data. The model uses emotion vectors to directly control the emotional expression of the generated speech.
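One way to picture an emotion vector is a fixed-length list of intensities with one slot per emotion. The Python sketch below builds such a vector from named intensities and linearly blends two of them. The eight emotion names, their order, and the 0-1 range are assumptions made for illustration; check the model's own documentation for the dimensions it actually expects.

```python
# Assumed emotion dimensions and order (hypothetical; verify against the model docs).
EMOTIONS = ("happy", "angry", "sad", "afraid",
            "disgusted", "melancholic", "surprised", "calm")


def emotion_vector(**intensities: float) -> list[float]:
    """Build a vector with one intensity in [0, 1] per emotion (default 0)."""
    unknown = set(intensities) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    for name, value in intensities.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} intensity must be in [0, 1], got {value}")
    return [float(intensities.get(name, 0.0)) for name in EMOTIONS]


def blend(a: list[float], b: list[float], t: float) -> list[float]:
    """Linear interpolation between two vectors: t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


happy = emotion_vector(happy=1.0)
sad = emotion_vector(sad=1.0)
print(blend(happy, sad, 0.5))  # [0.5, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Interpolating vectors like this is one simple way to move gradually between two moods; whether a given model responds smoothly to blended inputs is something to test rather than assume.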

                                
                                    
                                        Developer: Index Team
                                        License: Bilibili Model License
                                        Speed: Medium
                                        Quality:
                                        Languages: en, zh
                                        VRAM: 4GB
                                        Voice cloning: Yes
                                        Cost per 1K characters: 2x

                                        Emotion control
                                        Zero-shot
                                        Emotion vectors
                                        Expressive speech
                                        Fine-grained control

                                Best for:
                                Emotionally expressive content, audiobooks, virtual assistants

                                    Try IndexTTS-2
                                
                            
                        
                    
                    
                    
                        
                            
                                Spark TTS
                                Standard
                            
                            
                                Spark TTS, from SparkAudio, is a text-to-speech model that combines voice cloning with controllable speaking style and emotion. From just 5 seconds of reference audio, it can clone a voice and then generate speech with different emotions, speeds, and styles while preserving the cloned speaker's identity. Spark TTS uses a prompt-based control system.
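Because cloning depends on a reference clip of at least a few seconds, a client might check the clip's length before uploading it. A minimal sketch using Python's standard `wave` module follows. The 5-second minimum comes from the description above; the function names and the choice to reject (rather than pad) short clips are assumptions, and only uncompressed PCM WAV data is handled.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Duration of an uncompressed PCM WAV file held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(data: bytes, min_seconds: float = 5.0) -> float:
    """Return the clip duration, or raise if it is shorter than min_seconds."""
    duration = wav_duration_seconds(data)
    if duration < min_seconds:
        raise ValueError(
            f"reference clip is {duration:.2f}s; need at least {min_seconds}s")
    return duration


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV, used here only for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(check_reference_clip(make_silent_wav(6.0)))  # 6.0
```

Length is only a first filter: a 5-second clip full of silence or background noise passes this check but is still a poor cloning reference.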

                                
                                    
                                        Developer: SparkAudio
                                        License: CC BY-NC-SA 4.0
                                        Speed: Medium
                                        Quality:
                                        Languages: en, zh
                                        VRAM: 4GB
                                        Voice cloning: Yes
                                        Cost per 1K characters: 2x

                                        Voice cloning
                                        Emotion control
                                        Style control
                                        Prompt-based
                                        5-second cloning

                                Best for:
                                Content creation with a cloned voice and emotional control

                                    Try Spark TTS
                                
                            
                        
                    
                    
                    
                        
                            
                                GPT-SoVITS
                                Standard
                            
                            
                                GPT-SoVITS combines GPT-style language modeling with SoVITS (SoftVC VITS, a VITS-based voice conversion model) for powerful few-shot voice cloning. With just 5 seconds of reference audio, it can accurately clone a voice and generate new speech that preserves the speaker's distinctive characteristics. It excels at both speaking and singing voice synthesis.

                                
                                    
                                        Developer: RVC-Boss
                                        License: MIT
                                        Speed: Slow
                                        Quality:
                                        Languages: en, zh, ja, ko
                                        VRAM: 6GB
                                        Voice cloning: Yes
                                        Cost per 1K characters: 2x

                                        5-second cloning
                                        Singing voice
                                        Few-shot learning
                                        High fidelity
                                        Cross-lingual

                                Best for:
                                Voice cloning, song synthesis, replicating content creators' voices

                                    Try GPT-SoVITS
                                
                            
                        
                    
                    
                    
                        
                            
                                Orpheus
                                Standard
                            
                            
                                Orpheus is a large-scale text-to-speech model that achieves human-level emotional expression. Trained on more than 100,000 hours of diverse speech data, it excels at generating speech with natural emotion, emphasis, and speaking style. Orpheus can produce speech that is difficult to distinguish from human recordings.

                                
                                    
                                        Developer: Canopy Labs
                                        License: Llama 3.2 Community
                                        Speed: Medium
                                        Quality:
                                        Languages: en
                                        VRAM: 4GB
                                        Voice cloning: No
                                        Cost per 1K characters: 2x

                                        Human-level emotion
                                        100K hours of training data
                                        Natural emphasis
                                        Expressive speech

                                Best for:
                                High-quality emotional speech, audiobooks, voice acting

                                    Try Orpheus
                                
                            
                        
                    
                    
                    
                        
                            
                                Chatterbox
                                Premium
                            
                            
                                Chatterbox, from Resemble AI, is a state-of-the-art zero-shot voice cloning model. It can recreate a voice from a single audio sample with remarkable accuracy, capturing not only the timbre but also the speaking style and emotional coloring. Chatterbox also offers fine-grained emotion control, letting you adjust the emotional tone of the generated speech independently of the voice identity.

                                
                                    
                                        Developer: Resemble AI
                                        License: MIT
                                        Speed: Medium
                                        Quality:
                                        Languages: en
                                        VRAM: 4GB
                                        Voice cloning: Yes
                                        Cost per 1K characters: 4x

                                        Zero-shot cloning
                                        Emotion control
                                        High fidelity
                                        Style transfer
                                        Single-sample cloning

                                Best for:
                                Professional voice cloning with emotional control, content creation

                                    Try Chatterbox
                                
                            
                        
                    
                    
                    
                        
                            
                                Tortoise TTS
                                Premium
                            
                            
                                Tortoise TTS is a multi-voice autoregressive text-to-speech system that prioritizes audio quality over speed. It uses a DALL-E-inspired architecture to produce highly natural speech with excellent prosody and speaker similarity. Although it is slower than many alternatives, Tortoise produces some of the most realistic synthesized speech available in the open-source ecosystem.

                                
                                    
Developer: James Betker
License: Apache 2.0
Speed: Slow
Quality:
Languages: en
VRAM: 8GB
Voice cloning: Yes
Cost per 1K characters: 4x

Best quality
Multi-voice
DALL-E architecture
Voice cloning
Autoregressive

Best for: Audiobooks, premium content, high-quality applications
                                
                            
                            
                                
Try Tortoise TTS

StyleTTS 2
Premium

StyleTTS 2 achieves human-level TTS synthesis by combining style diffusion with adversarial training using large speech language models. It produces some of the most natural-sounding speech among single-speaker models, rivaling human recordings. StyleTTS 2 uses diffusion-based style modeling to capture the full range of variation in human speech.

                                
                                    
Developer: Columbia University
License: MIT
Speed: Medium
Quality:
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 4x

Human-level
Style transfer
Adversarial training
Natural variation
High fidelity

Best for: Studio-quality single-speaker synthesis, professional narration
                                
                            
                            
                                
Try StyleTTS 2

OpenVoice
Premium

OpenVoice by MyShell.ai enables instant voice cloning with granular control over voice style, including emotion, accent, rhythm, pauses, and intonation. It can clone a voice from a short audio clip and generate speech in multiple languages while preserving the speaker's identity. OpenVoice also works as a voice converter, allowing real-time voice conversion.

                                
                                    
Developer: MyShell.ai / MIT
License: MIT
Speed: Medium
Quality:
Languages: en, zh, ja, ko, fr, de, es, it
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 4x

Instant cloning
Voice conversion
Emotion control
Accent control
Multilingual

Best for: Voice cloning with fine-grained style control, voice conversion
                                
                            
                            
                                
Try OpenVoice

Qwen3 TTS
Standard

Qwen3-TTS is a 1.7-billion-parameter text-to-speech model from Alibaba's Qwen team. It supports three modes: preset voices with emotion control (9 speakers), voice cloning from just 3 seconds of audio, and a distinctive voice-design mode in which you describe the voice you want in natural language. It covers 10 languages with high expressiveness and natural prosody.

                                
                                    
Developer: Alibaba (Qwen)
License: Apache 2.0
Speed: Medium
Quality:
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
VRAM: 7GB
Voice cloning: Yes
Cost per 1K characters: 2x

Voice cloning
9 preset voices
Voice design from text
Emotion control
10 languages

Best for: Multilingual content with voice cloning or custom voice design
                                
                            
                            
                                
Try Qwen3 TTS

Sesame CSM
Premium

Sesame CSM (Conversational Speech Model) is a 1-billion-parameter model designed specifically for generating conversational speech. It models the natural patterns of human conversation, including turn-taking timing, backchannel responses, emotional responses, and conversational flow. CSM generates audio that sounds like natural human conversation rather than synthesized dialogue.

                                
                                    
Developer: Sesame
License: Apache 2.0
Speed: Slow
Quality:
Languages: en
VRAM: 8GB
Voice cloning: No
Cost per 1K characters: 4x

Conversational
Natural timing
Turn-taking
Backchannel
1B parameters

Best for: AI assistants, chatbots, conversational AI applications
                                
                            
                            
                                
Try Sesame CSM

Chatterbox Turbo
Standard

Chatterbox Turbo by Resemble AI is a 350M-parameter upgrade to Chatterbox that runs up to 6x faster than real time with latency under 200 ms. It supports paralinguistic tags such as [laugh], [cough], and [chuckle] directly in the text, and it applies Perth watermarking to all generated audio so its source can be traced.
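Because the tags are written inline, a client can list them before synthesis, for example to flag tags the model may not recognize. Here is a minimal sketch. It assumes only the three tags named above are supported. `find_paralinguistic_tags` and `KNOWN_TAGS` are hypothetical names and are not part of Resemble AI's API.

```python
import re

# Hypothetical supported set: only the three tags named in the description.
KNOWN_TAGS = {"laugh", "cough", "chuckle"}

# Lowercase word in square brackets, e.g. [laugh]; speaker tags like [S1] do not match.
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def find_paralinguistic_tags(text: str) -> tuple[list[str], list[str]]:
    """Return (known, unknown) bracketed tags in the order they appear."""
    known: list[str] = []
    unknown: list[str] = []
    for match in TAG_PATTERN.finditer(text):
        name = match.group(1)
        if name in KNOWN_TAGS:
            known.append(name)
        else:
            unknown.append(name)
    return known, unknown


known, unknown = find_paralinguistic_tags(
    "That was great [laugh] but excuse me [cough]. Really [sigh]?"
)
print(known)    # ['laugh', 'cough']
print(unknown)  # ['sigh']
```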

                                
                                    
Developer: Resemble AI
License: MIT
Speed: Fast
Quality:
Languages: en
VRAM: 2GB
Voice cloning: Yes
Cost per 1K characters: 2x

Latency under 200 ms
Paralinguistic tags
6x real time
Voice cloning
Watermarking

Best for: Real-time voice agents, expressive speech with natural-sounding audio
                                
                            
                            
                                
Try Chatterbox Turbo

Zonos
Standard

Zonos v0.1 by Zyphra is a 1.6B-parameter model with fine-grained emotion control through sliders for happiness, anger, sadness, fear, and surprise. It is offered with both a Transformer and a hybrid SSM (state-space model) architecture. It was trained on more than 200K hours of multilingual speech and supports zero-shot voice cloning from 10-30 seconds of reference audio.
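One way to picture the sliders is as a vector of non-negative weights that is normalized before it conditions the model. The sketch below only illustrates that idea. The emotion order, the 0-1 slider range, and the `emotion_vector` helper are all assumptions. They do not describe Zonos's actual conditioning interface.

```python
# Hypothetical slider order, mirroring the emotions listed in the description.
EMOTIONS = ("happiness", "anger", "sadness", "fear", "surprise")


def emotion_vector(sliders: dict[str, float]) -> list[float]:
    """Clamp each slider to [0, 1], then normalize so the weights sum to 1.

    Missing sliders count as 0; unknown slider names raise KeyError.
    """
    for name in sliders:
        if name not in EMOTIONS:
            raise KeyError(f"unknown emotion slider: {name}")
    raw = [min(max(sliders.get(name, 0.0), 0.0), 1.0) for name in EMOTIONS]
    total = sum(raw)
    if total == 0:
        # Every slider at zero: fall back to an even mix.
        return [1.0 / len(EMOTIONS)] * len(EMOTIONS)
    return [weight / total for weight in raw]


# Mostly happy with a touch of surprise.
print(emotion_vector({"happiness": 0.6, "surprise": 0.2}))
```

Normalizing keeps only the relative slider positions. A real model may instead use the raw values or a different emotion set.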

                                
                                    
Developer: Zyphra
License: Apache 2.0
Speed: Medium
Quality:
Languages: en, ja, zh, fr, de
VRAM: 6GB
Voice cloning: Yes
Cost per 1K characters: 2x

Emotion control
Voice cloning
SSM architecture
Multilingual
Pitch/rate control

Best for: Expressive narration with emotion control, voice design studios
                                
                            
                            
                                
Try Zonos

Dia 2
Standard

Dia2 by Nari Labs is a streaming-first upgrade to Dia, available in 1B and 2B parameter variants. It starts synthesizing audio after only the first few tokens, which makes it well suited to real-time voice agents and speech-to-speech pipelines. It supports multi-speaker dialogue with [S1]/[S2] tags and paralinguistic cues such as (laughs) and (coughs).
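Speaker tags are written inline, so a client can preview how a script divides into turns before synthesis, for example to count lines per speaker. Here is a minimal sketch. `split_turns` is a hypothetical helper and is not part of Nari Labs' code. It assumes only the two tags [S1] and [S2] are used.

```python
import re

# The capturing group makes re.split keep the speaker label in its output.
SPEAKER_TAG = re.compile(r"\[(S[12])\]")


def split_turns(script: str) -> list[tuple[str, str]]:
    """Split a [S1]/[S2]-tagged script into (speaker, utterance) pairs.

    Text before the first tag is ignored and empty utterances are dropped.
    """
    parts = SPEAKER_TAG.split(script)
    # parts looks like: [prefix, speaker, text, speaker, text, ...]
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            turns.append((speaker, text))
    return turns


for speaker, line in split_turns("[S1] Hello! [S2] Hi, how are you? (laughs)"):
    print(speaker, line)
# S1 Hello!
# S2 Hi, how are you? (laughs)
```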

                                
                                    
Developer: Nari Labs
License: Apache 2.0
Speed: Fast
Quality:
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x

Streaming output
Multi-speaker
Low latency
Paralinguistic cues
Up to 2 minutes of output

Best for: Real-time voice agents, dialogue generation, streaming applications
                                
                            
                            
                                
Try Dia 2

VoxCPM
Standard

VoxCPM 1.5 by OpenBMB is a tokenizer-free TTS model that operates in a continuous space rather than on discrete tokens. It produces high-fidelity 44.1 kHz audio, supports zero-shot voice cloning from 3-10 seconds of audio, and stays coherent across paragraphs. Cross-lingual cloning lets you apply an English voice to Chinese speech and vice versa.
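The description gives a 3-10 second window for reference audio, so a client could check clip length before uploading. Below is a minimal sketch that uses only Python's standard `wave` module. It assumes an uncompressed PCM WAV reference, and the helper names are hypothetical rather than part of VoxCPM. The model's own loader may accept other formats and apply its own limits.

```python
import os
import tempfile
import wave

# Window quoted in the description; the model's real limits may differ.
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0


def reference_duration(path: str) -> float:
    """Duration in seconds of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(path: str) -> bool:
    """True if the clip length falls inside the suggested cloning window."""
    return MIN_SECONDS <= reference_duration(path) <= MAX_SECONDS


def write_silence(path: str, seconds: float, rate: int = 44_100) -> None:
    """Write 16-bit mono silence; stands in for a real recording here."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * int(rate * seconds))


with tempfile.TemporaryDirectory() as tmp:
    clip = os.path.join(tmp, "reference.wav")
    write_silence(clip, seconds=5)
    print(reference_duration(clip), check_reference(clip))  # 5.0 True
```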

                                
                                    
Developer: OpenBMB
License: Apache 2.0
Speed: Fast
Quality:
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x

44.1 kHz audio
Tokenizer-free
Cross-lingual cloning
Context-aware
LoRA fine-tuning

Best for: High-fidelity audio, audiobooks, long-form content with a consistent voice
                                
                            
                            
                                
Try VoxCPM

OuteTTS
Free

OuteTTS extends large language models with text-to-speech capabilities while preserving their original architecture. It supports multiple backends, including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, vLLM, and even in-browser inference via Transformers.js. It offers zero-shot voice cloning through speaker profiles saved as JSON.
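Saving a speaker profile as JSON makes it easy to reuse a cloned voice across sessions. The sketch below shows only the persistence step, using the standard `json` and `dataclasses` modules. The `SpeakerProfile` fields are hypothetical and do not describe the actual profile format that OuteTTS writes.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class SpeakerProfile:
    # Hypothetical fields for illustration; not OuteTTS's real profile schema.
    name: str
    language: str
    reference_text: str  # transcript of the reference clip
    sample_rate: int


def save_profile(profile: SpeakerProfile, path: str) -> None:
    """Serialize the profile to UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(profile), f, ensure_ascii=False, indent=2)


def load_profile(path: str) -> SpeakerProfile:
    """Rebuild a profile from JSON; missing or unexpected keys raise TypeError."""
    with open(path, encoding="utf-8") as f:
        return SpeakerProfile(**json.load(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "speaker.json")
    original = SpeakerProfile("Siân", "cy", "Bore da", 24_000)
    save_profile(original, path)
    print(load_profile(path) == original)  # True
```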

                                
                                    
Developer: OuteAI
License: Apache 2.0
Speed: Fast
Quality:
Languages: en
VRAM: 2GB
Voice cloning: Yes
Cost per 1K characters: Free

CPU
In-browser inference
Voice cloning
Multiple backends
Speaker profiles

Best for: Edge deployment, browser-based TTS, low-resource environments
                                
                            
                            
                                
Try OuteTTS

TADA
Standard

TADA (Text-Acoustic Dual Alignment) by Hume AI is a state-of-the-art TTS model that eliminates hallucinations through a new dual-alignment architecture built on Llama 3.2. It is available in 1B (English) and 3B (multilingual) variants and reaches a real-time factor (RTF) of 0.09, which is 5x faster than comparable LLM-based TTS models. It supports up to 700 seconds of audio context and generates emotionally expressive speech, with zero hallucinations on standard benchmarks.
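RTF (real-time factor) is the time spent generating audio divided by the duration of that audio, so values below 1 mean faster than real time. The small sketch below works through the arithmetic behind the figures quoted above. The helper names are illustrative only. Measured RTF varies with hardware, so treat these results as back-of-the-envelope numbers.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return RTF: generation time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Invert the definition: time needed to generate a clip of this length."""
    return rtf * audio_seconds


rtf = 0.09
print(f"{synthesis_time(rtf, 60.0):.1f} s to generate 60 s of audio")  # 5.4 s to generate 60 s of audio
print(f"{1 / rtf:.1f}x faster than real time")                          # 11.1x faster than real time
# Reading "5x faster" as a ratio of RTFs gives the implied comparison RTF.
print(f"implied baseline RTF: {rtf * 5:.2f}")                           # implied baseline RTF: 0.45
```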

                                
                                    
Developer:
Hume AI

License:
MIT

Speed:
Fast

Quality:

Languages:
en

VRAM:
5GB

Voice cloning:
No

Cost per 1K characters:
2x
                                    
                                

                                
                                
                                    
                                        
Zero hallucinations

5x faster than LLM-based TTS

Emotional expression

700 s audio context

Dual alignment
                                        
                                    
                                
                                

                                
Best for:
Hallucination-free high-quality speech, emotional expression, fast inference
                                
                            
                            
                                
Try TADA
                                
                            
                        
                    
                    
                    
                        
                            
                                VibeVoice
                                Standard
                            
                            
VibeVoice by Microsoft comes in two versions: a 1.5B model for long-form content (up to 90 minutes, 4 speakers) and a 0.5B Realtime model for streaming with ~200 ms latency to first audio. The 1.5B version excels at podcasts and audiobooks, keeping each speaker consistent over long passages. Note: Microsoft has removed the TTS code from its repository, and generated audio contains audible AI disclaimers.
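The ~200 ms figure for the Realtime model is a time-to-first-audio latency: how long a caller waits before the first audio chunk arrives, not how long the whole clip takes. The sketch below measures that quantity for any Python iterator of audio chunks. `fake_stream` is a hypothetical stand-in for a streaming engine, not VibeVoice's API, and the 20 ms chunk size at 24 kHz is an assumption for illustration.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds until the first chunk arrived, the first chunk)."""
    start = time.perf_counter()
    first = next(iter(chunks))  # raises StopIteration if the stream is empty
    return time.perf_counter() - start, first


def fake_stream(delay_s: float, n_chunks: int) -> Iterator[bytes]:
    """Hypothetical streaming engine: waits delay_s before each 20 ms chunk of silence."""
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield b"\x00\x00" * 480  # 480 16-bit samples = 20 ms at an assumed 24 kHz


latency, chunk = time_to_first_chunk(fake_stream(0.2, 5))
print(f"first audio after {latency * 1000:.0f} ms ({len(chunk)} bytes)")
```

Measuring a real engine only requires passing its chunk iterator in place of `fake_stream`; total generation time is a separate measurement.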

                                
                                    
Developer:
Microsoft

License:
MIT

Speed:
Fast

Quality:

Languages:
en, zh

VRAM:
4GB

Voice cloning:
No

Cost per 1K characters:
2x
                                    
                                

                                
                                
                                    
                                        
Multi-speaker

Up to 90 minutes

Podcast generation

Speaker consistency

200 ms streaming
                                        
                                    
                                
                                

                                
Best for:
                                Podcasts, audiobooks, long-form multi-speaker content
                                
                            
                            
                                
Try VibeVoice
                                
                            
                        
                    
                    
                    
                        
                            
                                Pocket TTS
                                Free
                            
                            
Pocket TTS by Kyutai (creators of Moshi) is a compact 100M-parameter text-to-speech model that punches well above its weight. It runs efficiently on CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. Its small size makes it ideal for edge deployment and low-resource environments.

                                
                                    
Developer:
Kyutai

License:
MIT

Speed:
Fast

Quality:

Languages:
en, fr

VRAM:
1GB

Voice cloning:
Yes

Cost per 1K characters:
Free
                                    
                                

                                
                                
                                    
                                        
100M parameters

CPU

Voice cloning

Single-sample cloning

Edge-ready
                                        
                                    
                                
                                

                                
Best for:
Lightweight deployment, CPU-only environments, quick voice cloning
                                
                            
                            
                                
Try Pocket TTS
                                
                            
                        
                    
                    
                    
                        
                            
                                Kitten TTS
                                Free
                            
                            
Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality speech synthesis on CPU without requiring a GPU. It has 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units, making it ideal for edge deployment and low-latency applications.
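The built-in text preprocessing for numbers, currencies, and units means rewriting tokens such as `$12.50` or `5kg` into the words a speaker would say before synthesis. The following is a minimal, hypothetical English normalizer that illustrates the idea; it is not KittenML's implementation, and its unit table (plural forms only) and supported range (integers below one million) are assumptions.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Hypothetical unit table (plural forms only); a real normalizer covers far more.
UNITS = {"kg": "kilograms", "km": "kilometers", "MB": "megabytes", "%": "percent"}


def number_to_words(n: int) -> str:
    """Spell out an integer 0 <= n < 1,000,000 in English."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0 <= n < 1,000,000")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")


def _money(match) -> str:
    """Expand a '$D' or '$D.CC' match into dollars and cents."""
    dollars, cents = int(match.group(1)), int(match.group(2) or 0)
    words = number_to_words(dollars) + (" dollar" if dollars == 1 else " dollars")
    if cents:
        words += " and " + number_to_words(cents) + (" cent" if cents == 1 else " cents")
    return words


def normalize(text: str) -> str:
    """Rewrite currencies, then number+unit pairs, then any remaining bare integers."""
    text = re.sub(r"\$(\d+)(?:\.(\d{2}))?", _money, text)
    text = re.sub(r"(\d+)\s?(kg|km|MB|%)",
                  lambda m: number_to_words(int(m.group(1))) + " " + UNITS[m.group(2)],
                  text)
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group(0))), text)


print(normalize("It costs $12.50 and weighs 5kg."))
# It costs twelve dollars and fifty cents and weighs five kilograms.
```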

                                
                                    
Developer:
KittenML

License:
Apache 2.0

Speed:
Fast

Quality:

Languages:
en

VRAM:
0GB

Voice cloning:
No

Cost per 1K characters:
Free
                                    
                                

                                
                                
                                    
                                        
CPU-only

Under 80MB model size

8 built-in voices

Speed control

ONNX-based

24kHz output
                                        
                                    
                                
                                

                                
Best for:
Fast lightweight TTS, edge deployment, low-latency applications
                                
                            
                            
                                
Try Kitten TTS
                                
                            
                        
                    
                    
                    
                        
                            
                                CosyVoice3
                                Standard
                            
                            
CosyVoice3 is the latest evolution from Alibaba's FunAudioLLM team. It features bi-streaming inference (streaming text in and audio out) with ~150 ms latency, instruction-based control of emotion, speed, and volume, and improved speaker similarity for zero-shot cloning. It supports 9 languages plus 18 Chinese dialects, and an RL-tuned variant delivers state-of-the-art prosody.
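Bi-streaming means the engine can accept text incrementally (for example, tokens from an LLM) while also emitting audio incrementally, so playback can begin before the full text exists. The scheduling side of that idea can be sketched as below; `fake_synthesize` is a hypothetical placeholder for the model, and the flush rule (punctuation or 20 buffered characters) is an assumption, not CosyVoice3's actual chunking policy.

```python
from typing import Callable, Iterable, Iterator, List


def bi_stream(text_pieces: Iterable[str],
              synthesize: Callable[[str], List[float]],
              min_chars: int = 20) -> Iterator[List[float]]:
    """Consume text incrementally and yield audio as soon as a phrase is ready.

    A phrase is flushed at punctuation or once min_chars characters are buffered,
    so audio output starts before the full text has arrived."""
    buffer = ""
    for piece in text_pieces:
        buffer += piece
        if buffer.rstrip().endswith((".", "!", "?", ",")) or len(buffer) >= min_chars:
            yield synthesize(buffer.strip())
            buffer = ""
    if buffer.strip():
        yield synthesize(buffer.strip())  # flush whatever is left when the input ends


def fake_synthesize(phrase: str) -> List[float]:
    """Hypothetical model stand-in: 100 silent samples per character."""
    return [0.0] * (100 * len(phrase))


tokens = ["Hello", " there,", " this", " text", " arrives", " word", " by", " word."]
print([len(audio) for audio in bi_stream(tokens, fake_synthesize)])  # [1200, 2200, 800]
```

In a real deployment the input iterator would be fed by an upstream text generator and each yielded chunk sent to the audio device as it arrives.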

                                
                                    
Developer:
Alibaba (FunAudioLLM)

License:
Apache 2.0

Speed:
Fast

Quality:

Languages:
en, zh, ja, ko, de, es, fr, it, ru

VRAM:
4GB

Voice cloning:
Yes

Cost per 1K characters:
2x
                                    
                                

                                
                                
                                    
                                        
                                        Bi-streaming
                                        
                                        Emotion control
                                        
                                        Voice cloning
                                        
                                        Speed/volume control
                                        
                                        Instruction following
                                        
                                    
                                
                                

                                
Best for:
                                Multilingual production TTS, real-time applications, voice cloning
                                
                            
                            
                                
Try CosyVoice3
                                
                            
                        
                    
                    
                    
                        
                            
                                MOSS-TTS
                                Premium
                            
                            
MOSS-TTS from OpenMOSS can generate up to 1 hour of continuous speech across 20 languages. It features token-level duration control, phoneme-level pronunciation control via IPA/Pinyin, and code-switching between languages. The 8B production model delivers state-of-the-art quality with zero-shot voice cloning from reference audio.
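Token-level duration control can be understood as follows: if an audio codec emits tokens at a fixed frame rate, a target length in seconds maps directly to a token count, and fixing the count fixes the duration. The sketch below does that conversion; the 12.5 Hz frame rate is an assumed, illustrative value rather than a documented MOSS-TTS parameter.

```python
ASSUMED_FRAME_RATE_HZ = 12.5  # hypothetical codec frame rate, not a documented MOSS-TTS value


def tokens_for_duration(seconds: float, frame_rate_hz: float = ASSUMED_FRAME_RATE_HZ) -> int:
    """Number of audio tokens (codec frames) that cover a target duration."""
    if seconds < 0 or frame_rate_hz <= 0:
        raise ValueError("need seconds >= 0 and frame_rate_hz > 0")
    return round(seconds * frame_rate_hz)


def duration_for_tokens(n_tokens: int, frame_rate_hz: float = ASSUMED_FRAME_RATE_HZ) -> float:
    """Inverse mapping: how many seconds a given token budget produces."""
    return n_tokens / frame_rate_hz


print(tokens_for_duration(3.2))   # 40 tokens for a 3.2 s phrase
print(tokens_for_duration(3600))  # 45000 tokens for one hour
print(duration_for_tokens(50))    # 4.0
```

Under this assumption, a one-hour generation is a sequence of tens of thousands of tokens, which is why long-context generation is a headline feature.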

                                
                                    
Developer:
OpenMOSS

License:
Apache 2.0

Speed:
Medium

Quality:

Languages:
en, zh, de, es, fr, ja, it, hu, ko, ru, fa, ar, pl, pt, cs, da, sv, el, tr

VRAM:
16GB

Voice cloning:
Yes

Cost per 1K characters:
4x
                                    
                                

                                
                                
                                    
                                        
                                        Ultra-long generation
                                        
                                        20 languages
                                        
                                        Voice cloning
                                        
                                        Duration control
                                        
                                        Pronunciation control
                                        
                                        Code-switching
                                        
                                    
                                
                                

                                
Best for:
                                Audiobooks, long-form content, multilingual production
                                
                            
                            
                                
Try MOSS-TTS
                                
                            
                        
                    
                    
                    
                        
                            
                                MegaTTS3
                                Premium
                            
                            
MegaTTS3 from ByteDance combines a novel sparse alignment mechanism with a latent diffusion transformer. It offers an adjustable trade-off between speech intelligibility and speaker similarity for zero-shot voice cloning.
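One common way to expose an intelligibility versus similarity trade-off in a diffusion model is multi-condition classifier-free guidance, with one weight for the text condition and one for the speaker prompt. We assume, rather than confirm from the description above, that MegaTTS3's control works along these lines; the function below is a generic sketch of the guidance arithmetic on plain Python lists, not ByteDance's code.

```python
from typing import List

Vector = List[float]


def multi_condition_cfg(eps_uncond: Vector, eps_text: Vector, eps_full: Vector,
                        w_text: float, w_speaker: float) -> Vector:
    """Combine three denoiser predictions with two guidance weights.

    eps_uncond: prediction with no conditioning
    eps_text:   prediction conditioned on text only
    eps_full:   prediction conditioned on text plus the speaker prompt
    A larger w_text pushes toward the text (intelligibility); a larger w_speaker
    pushes toward the reference voice (speaker similarity)."""
    if not len(eps_uncond) == len(eps_text) == len(eps_full):
        raise ValueError("predictions must have the same length")
    return [u + w_text * (t - u) + w_speaker * (f - t)
            for u, t, f in zip(eps_uncond, eps_text, eps_full)]


# With both weights at 1.0 the result equals the fully conditioned prediction.
print(multi_condition_cfg([0.0, 0.0], [1.0, 1.0], [2.0, 3.0], 1.0, 1.0))  # [2.0, 3.0]
print(multi_condition_cfg([0.0, 0.0], [1.0, 1.0], [2.0, 3.0], 2.0, 3.0))  # [5.0, 8.0]
```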

                                
                                    
Developer:
ByteDance

License:
Apache 2.0

Speed:
Slow

Quality:

Languages:
en, zh

VRAM:
8GB

Voice cloning:
Yes

Cost per 1K characters:
4x
                                    
                                

                                
                                
                                    
                                        
                                        Voice cloning
                                        
                                        Adjustable similarity
                                        
                                        Cross-lingual
                                        
                                    
                                
                                

                                
Best for:
                                High-fidelity voice cloning
                                
                            
                            
                                
Try MegaTTS3
                                
                            
                        
                    
                    
                
            

            
            
                
                    
                    
                        
                            
                                Kokoro
Free
                            
                            
Kokoro is an 82-million-parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages, including English, Japanese, Chinese, and Korean, with a variety of expressive voices. It is also very fast, generating audio nearly 100x faster than real time on a GPU.
                                
Developer:
Hexgrad
License:
Apache 2.0
Speed:
Fast
Quality:

Languages: en, ja, zh, ko, fr, de, it, pt, es, hi, ru

Best for: High-quality TTS with minimal latency, streaming applications
                            
                            
Try for free
                            
                        
                    
                    
                    
                        
                            
                                Piper
Free
                            
                            
                                Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.
                                
Developer:
Rhasspy
License:
MIT
Speed:
Fast
Quality:

Languages: en, de, fr, es, it, pt, nl, pl, ru, zh, ja, ko, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi

Best for: Quick previews, accessibility, and embedded applications
                            
                            
Try for free
                            
                        
                    
                    
                    
                        
                            
                                VITS
Free
                            
                            
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural-sounding audio than current two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.
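Normalizing flows are built from invertible layers whose Jacobian log-determinant is cheap to compute. A standard building block is the affine coupling layer: half of the input passes through unchanged and parameterizes a scale and shift for the other half. The sketch below is a generic coupling layer with a toy, hypothetical conditioner `_net` in place of a neural network; VITS's own flow layers differ in detail.

```python
import math
from typing import List, Tuple


def _net(x_a: List[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical toy conditioner standing in for a neural network.
    It never needs to be inverted, so any function of x_a works."""
    log_scale = [0.5 * math.tanh(v) for v in x_a]
    shift = [v * v - 1.0 for v in x_a]
    return log_scale, shift


def coupling_forward(x: List[float]) -> Tuple[List[float], float]:
    """y_a = x_a; y_b = x_b * exp(log_s(x_a)) + t(x_a). Returns (y, log|det J|)."""
    if len(x) % 2:
        raise ValueError("this sketch needs an even-length input")
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = _net(x_a)
    y_b = [b * math.exp(s) + sh for b, s, sh in zip(x_b, log_s, t)]
    return x_a + y_b, sum(log_s)


def coupling_inverse(y: List[float]) -> List[float]:
    """Exact inverse: recompute log_s and t from the untouched half, then undo them."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = _net(y_a)
    x_b = [(b - sh) * math.exp(-s) for b, s, sh in zip(y_b, log_s, t)]
    return y_a + x_b


x = [0.3, -1.2, 0.7, 2.0]
y, log_det = coupling_forward(x)
x_back = coupling_inverse(y)
print(all(abs(a - b) < 1e-9 for a, b in zip(x, x_back)))  # True
```

Stacking such layers, and alternating which half is transformed, yields an expressive map that stays exactly invertible; the summed log-determinants enter the likelihood.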
                                
Developer:
Jaehyeon Kim et al.
License:
MIT
Speed:
Fast
Quality:

Languages: en, zh, ja, ko

Best for: General-purpose text-to-speech with natural prosody
                            
                            
Try for free
                            
                        
                    
                    
                    
                        
                            
                                MeloTTS
Free
                            
                            
                                MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.
                                
Developer:
MyShell.ai
License:
MIT
Speed:
Fast
Quality:

Languages: en, es, fr, zh, ja, ko

Best for: Production applications needing fast, multilingual TTS
                            
                            
Try for free
                            
                        
                    
                    
                    
                        
                            
                                OuteTTS
Free
                            
                            
OuteTTS extends large language models with text-to-speech capabilities while preserving the original architecture. It supports multiple backends, including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, vLLM, and even in-browser inference via Transformers.js. It features zero-shot voice cloning through speaker profiles saved as JSON.
                                
Developer:
OuteAI
License:
Apache 2.0
Speed:
Fast
Quality:

Languages: en

Best for: Edge deployment, browser-based TTS, low-resource environments
                            
                            
Try for free
                            
                        
                    
                    
                    
                        
                            
                                Pocket TTS
Free
                            
                            
                                Pocket TTS by Kyutai (creators of Moshi) is a compact 100M parameter text-to-speech model that punches well above its weight. It runs efficiently on CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. The small model size makes it ideal for edge deployment and low-resource environments.
                                
Developer:
Kyutai
License:
MIT
Speed:
Fast
Quality:

Languages: en, fr

Best for: Lightweight deployment, CPU-only environments, quick voice cloning
                            
                            
Try for free
                            
                        
                    
                    
                    
                        
                            
                                Kitten TTS
Free
                            
                            
                                Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.
                                
Developer:
KittenML
License:
Apache 2.0
Speed:
Fast
Quality:

Languages: en

Best for: Fast lightweight TTS, edge deployment, low-latency applications
                            
                            
Try for free
                            
                        
                    
                    
                
            

            
            
                
                    
                    
                        
                            
                                Bark
Standard
                            
                            
                                Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio like music, background noise, and sound effects. It can produce nonverbal communications like laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.
                                
Developer:
Suno
License:
MIT
Speed:
Slow
Quality:

Languages:
en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
Voice cloning:
No

Sound effects
Laughing/sighing
Music generation
100+ speakers
Multilingual
Best for: Creative audio content, audiobooks with emotion, sound effects
                            
                            
                                Ceisio Bark
                            
                        
                    
                    
                    
                        
                            
                                Bark Small
Standard
                            
                            
                                Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference speeds and lower memory requirements. It retains Bark's ability to generate speech with emotions, laughter, and multiple languages.
                                
Developer: Suno
License: MIT
Speed: Medium
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
Voice cloning: No

Lightweight · Faster than full Bark · Emotional speech · Multilingual
Best for: Quick creative audio when full Bark is too slow

Try Bark Small
                            
                        
                    
                    
                    
                        
                            
                                CosyVoice 2
Standard
                            
                            
                                CosyVoice 2 by Alibaba's Tongyi Lab achieves human-comparable speech quality with extremely low latency, making it ideal for real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.
                                
Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, fr, de, it, es
Voice cloning: Yes

Streaming · Zero-shot cloning · Cross-lingual · Emotion control · Human-parity
Best for: Real-time applications, streaming TTS, voice assistants

Try CosyVoice 2
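The CosyVoice 2 card mentions finite scalar quantization (FSQ). The general idea is to bound each latent dimension, round it to one of a small fixed number of levels, and read the rounded vector as a single integer code, so no codebook has to be learned. Below is a minimal NumPy sketch of generic FSQ; the function name `fsq_quantize` and the level counts are illustrative assumptions, and CosyVoice 2's actual tokenizer configuration is not given here.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Generic finite scalar quantization (FSQ) over the last axis of z.

    Dimension i is squashed by tanh into (-h_i, h_i) with h_i = (L_i - 1) / 2,
    then rounded, so it takes one of L_i integer values. The rounded vector is
    read as a mixed-radix number to give a single code in [0, prod(L)).
    This sketch supports odd L_i only (even L_i need an extra offset).
    """
    z = np.asarray(z, dtype=float)
    levels = np.asarray(levels, dtype=int)
    if np.any(levels % 2 == 0):
        raise ValueError("this sketch supports odd level counts only")
    half = (levels - 1) / 2
    quantized = np.round(half * np.tanh(z))    # integers in [-h_i, h_i]
    # During training, a straight-through estimator would pass gradients
    # through np.round; that is omitted because this sketch is inference only.
    digits = (quantized + half).astype(int)    # shift each dim to 0..L_i-1
    basis = np.cumprod(np.concatenate(([1], levels[:-1])))
    codes = (digits * basis).sum(axis=-1)      # mixed-radix code index
    return quantized, codes
```

With `levels = [3, 5, 7]`, the implicit codebook has 3 × 5 × 7 = 105 entries, and an input row of `[10.0, -10.0, 0.0]` quantizes to `[1, -2, 0]`, which maps to code 47.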
                            
                        
                    
                    
                    
                        
                            
                                Dia TTS
Standard
                            
                            
                                Dia by Nari Labs is a 1.6B parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression. Dia is perfect for creating podcast-style content, audiobook dialogues, and interactive conversational AI.
                                
Developer: Nari Labs
License: Apache 2.0
Speed: Medium
Languages: en
Voice cloning: No

Multi-speaker · Dialogue generation · Natural turn-taking · Emotional expression · 1.6B parameters
Best for: Podcasts, audiobook dialogues, conversational content

Try Dia TTS
                            
                        
                    
                    
                    
                        
                            
                                Parler TTS
Standard
                            
                            
                                Parler TTS is a text-to-speech model that uses natural language voice descriptions to control the generated speech. Instead of selecting from preset voices, you describe the voice you want (e.g., "a warm female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech matching that description. This makes it uniquely flexible for creative applications.
                                
Developer: Hugging Face
License: Apache 2.0
Speed: Medium
Languages: en
Voice cloning: No

Voice description · Natural language control · Flexible voice creation · No preset voices needed
Best for: Creative applications that need custom voice characteristics

Try Parler TTS
                            
                        
                    
                    
                    
                        
                            
                                GLM-TTS
Standard
                            
                            
                                GLM-TTS by Zhipu AI is a text-to-speech system built on the Llama architecture with flow matching. It achieves the lowest character error rate among open-source TTS models, meaning it produces the most accurate pronunciation. GLM-TTS supports English and Chinese with voice cloning from 3-10 second audio samples.
                                
Developer: Zhipu AI
License: GLM-4 License
Speed: Medium
Languages: en, zh
Voice cloning: Yes

Lowest error rate · Voice cloning · Flow matching · Natural prosody
Best for: Applications requiring maximum pronunciation accuracy

Try GLM-TTS
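Several cards on this page quote a reference-audio length for voice cloning: 3-10 seconds for GLM-TTS, about 5 seconds for Spark TTS and GPT-SoVITS, and 10-30 seconds for Zonos. A quick pre-upload check using Python's standard `wave` module is sketched below. The function names are hypothetical, and the sketch assumes uncompressed PCM WAV input (the `wave` module does not read MP3 or other compressed formats).

```python
import os
import tempfile
import wave

def clip_duration_seconds(path: str) -> float:
    """Duration of an uncompressed PCM WAV file: frame count / frame rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path: str, min_s: float = 3.0, max_s: float = 10.0) -> float:
    """Return the clip duration, or raise ValueError if outside [min_s, max_s].

    Defaults follow the 3-10 s range on the GLM-TTS card; pass other bounds
    for models that quote different ranges.
    """
    duration = clip_duration_seconds(path)
    if not min_s <= duration <= max_s:
        raise ValueError(
            f"reference clip is {duration:.2f}s; expected {min_s}-{max_s}s")
    return duration


def make_silent_wav(seconds: float, rate: int = 16000) -> str:
    """Write a mono 16-bit silent WAV to a temporary file, for trying the check."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return path
```

For a Zonos-style range, call `check_reference_clip(path, 10.0, 30.0)`. Temporary files written by `make_silent_wav` are not deleted automatically.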
                            
                        
                    
                    
                    
                        
                            
                                IndexTTS-2
Standard
                            
                            
                                IndexTTS-2 is an advanced text-to-speech system that excels at zero-shot voice synthesis with fine-grained emotion control. It can generate speech with specific emotional tones like happy, sad, angry, or fearful without requiring emotion-specific training data. The model uses emotion vectors to precisely control the emotional expression of generated speech.
                                
Developer: Index Team
License: Bilibili Model License
Speed: Medium
Languages: en, zh
Voice cloning: Yes

Emotion control · Zero-shot · Emotion vectors · Expressive speech · Fine-grained control
Best for: Emotionally expressive content, audiobooks, virtual assistants

Try IndexTTS-2
                            
                        
                    
                    
                    
                        
                            
                                Spark TTS
Standard
                            
                            
                                Spark TTS by SparkAudio is a text-to-speech model that combines voice cloning with controllable emotion and speaking style. Using just 5 seconds of reference audio, it can clone a voice and then generate speech with different emotions, speeds, and styles while maintaining the cloned voice identity. Spark TTS uses a prompt-based control system.
                                
Developer: SparkAudio
License: CC BY-NC-SA 4.0
Speed: Medium
Languages: en, zh
Voice cloning: Yes

Voice cloning · Emotion control · Style control · Prompt-based · 5-second cloning
Best for: Content creation with cloned voices and emotional control

Try Spark TTS
                            
                        
                    
                    
                    
                        
                            
                                GPT-SoVITS
Standard

GPT-SoVITS combines GPT-style language modeling with SoVITS (SoftVC VITS) for powerful few-shot voice cloning. With as little as 5 seconds of reference audio, it can accurately clone a voice and generate new speech while preserving the speaker's unique characteristics. It excels at both speaking and singing voice synthesis.
                                
Developer: RVC-Boss
License: MIT
Speed: Slow
Languages: en, zh, ja, ko
Voice cloning: Yes

5-second cloning · Singing voice · Few-shot learning · High fidelity · Cross-lingual
Best for: Voice cloning, singing synthesis, content creator voice replication

Try GPT-SoVITS
                            
                        
                    
                    
                    
                        
                            
                                Orpheus
Standard
                            
                            
                                Orpheus is a large-scale text-to-speech model that achieves human-level emotional expression. Trained on over 100,000 hours of diverse speech data, it excels at generating speech with natural emotions, emphasis, and speaking styles. Orpheus can produce speech that is virtually indistinguishable from human recordings.
                                
Developer: Canopy Labs
License: Llama 3.2 Community
Speed: Medium
Languages: en
Voice cloning: No

Human-level emotion · 100K hours of training data · Natural emphasis · Expressive speech
Best for: High-quality emotional speech, audiobooks, voice acting

Try Orpheus
                            
                        
                    
                    
                    
                        
                            
                                Qwen3 TTS
Standard
                            
                            
                                Qwen3-TTS is a 1.7 billion parameter text-to-speech model from Alibaba's Qwen team. It supports three modes: preset voices with emotion control (9 speakers), voice cloning from just 3 seconds of audio, and a unique voice design mode where you describe the voice you want in natural language. It covers 10 languages with high expressiveness and natural prosody.
                                
Developer: Alibaba (Qwen)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
Voice cloning: Yes

Voice cloning · 9 preset voices · Voice design from text · Emotion control · 10 languages
Best for: Multilingual content with voice cloning or custom voice design

Try Qwen3 TTS
                            
                        
                    
                    
                    
                        
                            
                                Chatterbox Turbo
Standard

Chatterbox Turbo by Resemble AI is a 350M-parameter upgrade to Chatterbox, delivering up to 6x real-time speed with sub-200 ms latency. It supports paralinguistic tags like [laugh], [cough], and [chuckle] directly in text, and it applies Perth watermarking to all generated audio for provenance tracking.
                                
Developer: Resemble AI
License: MIT
Speed: Fast
Languages: en
Voice cloning: Yes

Sub-200 ms latency · Paralinguistic tags · 6x real-time · Voice cloning · Watermarking
Best for: Real-time voice agents, expressive speech with natural sounds

Try Chatterbox Turbo
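Chatterbox Turbo reads paralinguistic tags such as `[laugh]`, `[cough]` and `[chuckle]` inline in the text. A small hypothetical helper, sketched below, can flag tags outside that list before text is submitted, or strip all tags when the same text goes to a model without tag support. `KNOWN_TAGS` holds only the three tags named on the card; the model may accept others.

```python
import re
from typing import List

# Only the tags named on the Chatterbox Turbo card; the model may accept more.
KNOWN_TAGS = {"laugh", "cough", "chuckle"}
TAG = re.compile(r"\[([a-z_]+)\]")  # matches e.g. [laugh]


def unknown_tags(text: str) -> List[str]:
    """Return bracketed tags in text that are not in KNOWN_TAGS, in order."""
    return [tag for tag in TAG.findall(text) if tag not in KNOWN_TAGS]


def strip_tags(text: str) -> str:
    """Remove all bracketed tags and collapse the leftover whitespace."""
    return " ".join(TAG.sub(" ", text).split())
```

For instance, `unknown_tags("Well [chuckle] that was [sneeze] odd [laugh]")` returns `["sneeze"]`, and `strip_tags("Well [chuckle] that was odd.")` returns `"Well that was odd."`.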
                            
                        
                    
                    
                    
                        
                            
                                Zonos
Standard

Zonos v0.1 by Zyphra is a 1.6B-parameter model featuring fine-grained emotion control with sliders for happiness, anger, sadness, fear, and surprise. It offers both a Transformer and a novel SSM (state-space model) variant. It was trained on more than 200K hours of multilingual speech and supports zero-shot voice cloning from 10-30 seconds of reference audio.
                                
Developer: Zyphra
License: Apache 2.0
Speed: Medium
Languages: en, ja, zh, fr, de
Voice cloning: Yes

Emotion control · Voice cloning · SSM architecture · Multilingual · Pitch/rate control
Best for: Expressive speech with emotion control, voice design studio

Try Zonos
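Zonos exposes emotion as sliders for happiness, anger, sadness, fear and surprise. One way a client might turn slider positions into a conditioning vector is to clamp each value to 0..1 and normalize so the weights sum to 1, as sketched below. This normalization, the name `emotion_vector`, and the all-zero fallback are assumptions for illustration, not Zonos's documented input format.

```python
from typing import Dict, List

# The five emotions named on the Zonos card; tuple order defines vector positions.
EMOTIONS = ("happiness", "anger", "sadness", "fear", "surprise")


def emotion_vector(sliders: Dict[str, float]) -> List[float]:
    """Clamp slider values to [0, 1] and normalize them to sum to 1.

    Missing emotions count as 0. All-zero input returns an all-zero vector
    (no emotional emphasis) instead of dividing by zero.
    """
    unknown = set(sliders) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion(s): {sorted(unknown)}")
    raw = [min(max(float(sliders.get(name, 0.0)), 0.0), 1.0) for name in EMOTIONS]
    total = sum(raw)
    if total == 0.0:
        return raw
    return [value / total for value in raw]
```

With `{"happiness": 0.6, "surprise": 0.2}`, the result weights happiness at about 0.75 and surprise at about 0.25, so the relative slider positions are preserved while the overall scale is fixed.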
                            
                        
                    
                    
                    
                        
                            
                                Dia 2
Standard

Dia2 by Nari Labs is a streaming-first upgrade to Dia, available in 1B and 2B parameter variants. It begins synthesizing audio from the first few tokens, making it well suited to real-time voice agents and speech-to-speech pipelines. It supports multi-speaker dialogue with [S1]/[S2] tags and paralinguistic cues such as (laughs) and (coughs).
                                
Developer: Nari Labs
License: Apache 2.0
Speed: Fast
Languages: en
Voice cloning: No

Streaming output · Multi-speaker · Low latency · Paralinguistic cues · Up to 2 min output
Best for: Real-time voice agents, dialogue generation, streaming applications

Try Dia 2
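Dia 2 takes multi-speaker scripts marked with `[S1]`/`[S2]` speaker tags and parenthesized cues such as `(laughs)`. The sketch below splits such a script into turns and lists each turn's cues, which is handy for validating input before synthesis. `parse_script` is a hypothetical helper, not part of Dia's API; the model itself receives the raw tagged text.

```python
import re
from typing import List, Tuple

SPEAKER = re.compile(r"\[(S[12])\]")  # [S1] or [S2]
CUE = re.compile(r"\(([a-z ]+)\)")    # e.g. (laughs), (coughs)


def parse_script(script: str) -> List[Tuple[str, str, List[str]]]:
    """Split a '[S1] ... [S2] ...' script into (speaker, text, cues) turns."""
    # With a capture group, re.split returns [prefix, tag, text, tag, text, ...].
    parts = SPEAKER.split(script)
    if parts[0].strip():
        raise ValueError("script must start with [S1] or [S2]")
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:  # skip empty turns such as '[S1] [S2] ...'
            turns.append((speaker, text, CUE.findall(text)))
    return turns
```

For example, `parse_script("[S1] Hello! [S2] Hello, how are you? (laughs)")` yields one turn for each speaker, with `["laughs"]` recorded as the cue list of the `S2` turn.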
                            
                        
                    
                    
                    
                        
                            
                                VoxCPM
Standard
                            
                            
                                VoxCPM 1.5 by OpenBMB is a novel tokenizer-free TTS model that operates in continuous space rather than discrete tokens. It produces high-fidelity 44.1kHz audio, supports zero-shot voice cloning from 3-10 seconds, and maintains consistency across paragraphs. Cross-language cloning lets you apply an English voice to Chinese speech and vice versa.
                                
Developer: OpenBMB
License: Apache 2.0
Speed: Fast
Languages: en, zh
Voice cloning: Yes

44.1 kHz audio · Tokenizer-free · Cross-lingual cloning · Context-aware · LoRA fine-tuning
Best for: High-fidelity audio, audiobooks, long-form content with voice consistency

Try VoxCPM
                            
                        
                    
                    
                    
                        
                            
                                TADA
Standard

TADA (Text-Acoustic Dual Alignment) by Hume AI is a TTS model that eliminates hallucinations through a novel dual alignment architecture built on Llama 3.2. Available in 1B (English) and 3B (multilingual) variants, TADA achieves a real-time factor (RTF) of 0.09, 5x faster than comparable LLM-based TTS models. It supports up to 700 seconds of audio context and produces emotionally expressive speech with zero hallucinations on standard benchmarks.
                                
Developer: Hume AI
License: MIT
Speed: Fast
Languages: en
Voice cloning: No

Zero hallucinations · 5x faster than LLM TTS · Emotional expression · 700 s audio context · Dual alignment
Best for: High-quality hallucination-free speech, emotional expression, fast inference

Try TADA
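TADA's card quotes a real-time factor (RTF) of 0.09, while Chatterbox Turbo's card says "6x real-time". Both describe the same ratio, synthesis time divided by the duration of the audio produced, so converting between them is simple arithmetic, sketched below. The helper names are hypothetical, and real RTF depends on hardware that the cards do not specify.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced.

    RTF < 1 means faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Expected wall-clock seconds to generate audio_seconds of speech at a given RTF."""
    return audio_seconds * rtf


def speedup(rtf: float) -> float:
    """An 'Nx real-time' figure is the reciprocal of the RTF."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf
```

At RTF 0.09, `synthesis_time(60, 0.09)` is about 5.4 seconds for one minute of speech, and `speedup(0.09)` is about 11.1, roughly 11x real time. Conversely, 6x real time corresponds to an RTF of 1/6, about 0.17.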
                            
                        
                    
                    
                    
                        
                            
                                VibeVoice
Standard

VibeVoice from Microsoft generates long-form speech of up to 90 minutes with up to 4 distinct speakers, making it well suited to podcasts and dialogues. Its Realtime 0.5B variant achieves about 300 ms latency for interactive use. It supports speaker tags for multi-turn dialogue generation.
                                
Developer: Microsoft
License: MIT
Speed: Fast
Languages: en, zh
Voice cloning: No

Multi-speaker · Long-form (90 min) · Podcast generation · Dialogue · Low latency
Best for: Podcasts, dialogues, long-form narration, multi-speaker content

Try VibeVoice
                            
                        
                    
                    
                    
                        
                            
                                CosyVoice3
Standard

CosyVoice3 is the latest evolution from Alibaba's FunAudioLLM team. It features bi-streaming inference with about 150 ms latency, instruction-based control of emotion, speed, and volume, and improved speaker similarity for zero-shot cloning. It supports 9 languages plus 18 Chinese dialects, and an RL-tuned variant delivers state-of-the-art prosody.
                                
Developer: Alibaba (FunAudioLLM)
License: Apache 2.0
Speed: Fast
Languages: en, zh, ja, ko, de, es, fr, it, ru
Voice cloning: Yes

Bi-streaming · Emotion control · Voice cloning · Speed/volume control · Instruction following
Best for: Multilingual production TTS, real-time applications, voice cloning

Try CosyVoice3
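CosyVoice3's bi-streaming means text can keep arriving while audio is already being produced. On the client side, a common pattern is to forward text to the synthesizer one complete sentence at a time as it arrives (for example, token by token from an LLM), as in the hypothetical generator below. It does not call any CosyVoice API, and splitting on `.`, `!` and `?` is a simplifying assumption: abbreviations such as "Dr." will split early.

```python
import re
from typing import Iterable, Iterator

# Whitespace that follows sentence-ending punctuation (fixed-width lookbehind).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(text_chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in an incoming text
    stream, holding back any unfinished tail until more text arrives."""
    buffer = ""
    for chunk in text_chunks:
        buffer += chunk
        parts = SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left at end of stream
```

Each yielded sentence would be handed to the synthesizer immediately, so the first audio can start before the full text exists. Feeding the chunks `"Hello the"`, `"re. How are"`, `" you? I am"`, `" fine"` yields `"Hello there."`, `"How are you?"` and `"I am fine"`.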
                            
                        
                    
                    
                
            

            
            
                
                    
                    
                        
                            
                                Chatterbox
Premium
                            
                            
                                Chatterbox by Resemble AI is a cutting-edge zero-shot voice cloning model. It can replicate any voice from a single audio sample with remarkable accuracy, capturing not just the timbre but also the speaking style and emotional nuances. Chatterbox also features fine-grained emotion control, allowing you to adjust the emotional tone of the generated speech independently from the voice identity.
                                
Developer: Resemble AI
License: MIT
Speed: Medium
Languages: en
Voice cloning: Yes
VRAM: 4 GB
Cost per 1K characters: 4x

Zero-shot cloning · Emotion control · High fidelity · Style transfer · Single-sample cloning
Best for: Professional voice cloning with emotional control, content creation

Try Chatterbox
                            
                        
                    
                    
                    
                        
                            
                                Tortoise TTS
Premium
                            
                            
                                Tortoise TTS is an autoregressive multi-voice text-to-speech system that prioritizes audio quality over speed. It uses DALL-E-inspired architecture to generate highly natural speech with excellent prosody and speaker similarity. While slower than many alternatives, Tortoise produces some of the most realistic synthetic speech available in the open-source ecosystem.
                                
Developer: James Betker
License: Apache 2.0
Speed: Slow
Languages: en
Voice cloning: Yes
VRAM: 8 GB
Cost per 1K characters: 4x

Highest quality · Multi-voice · DALL-E architecture · Voice cloning · Autoregressive
Best for: Audiobooks, premium content, quality-first applications

Try Tortoise TTS
                            
                        
                    
                    
                    
                        
                            
                                StyleTTS 2
Premium

StyleTTS 2 achieves human-level TTS synthesis by combining style diffusion with adversarial training using large speech language models. It generates the most natural-sounding speech among single-speaker models, rivaling human recordings. StyleTTS 2 uses diffusion-based style modeling to capture the full range of human speech variation.
                                
Developer: Columbia University
License: MIT
Speed: Medium
Languages: en
Voice cloning: No
VRAM: 4 GB
Cost per 1K characters: 4x

Human-level · Style diffusion · Adversarial training · Natural variation · High fidelity
Best for: Studio-quality single-speaker synthesis, professional narration

Try StyleTTS 2
                            
                        
                    
                    
                    
                        
                            
                                OpenVoice
Premium
                            
                            
                                OpenVoice by MyShell.ai enables instant voice cloning with granular control over voice style, emotion, accent, rhythm, pauses, and intonation. It can clone a voice from a short audio clip and generate speech in multiple languages while maintaining the speaker identity. OpenVoice also functions as a voice converter, allowing real-time voice transformation.
                                
Developer: MyShell.ai / MIT
License: MIT
Speed: Medium
Languages: en, zh, ja, ko, fr, de, es, it
Voice cloning: Yes
VRAM: 4 GB
Cost per 1K characters: 4x

Instant cloning · Voice conversion · Emotion control · Accent control · Multilingual
Best for: Voice cloning with fine-grained style control, voice conversion

Try OpenVoice
                            
                        
                    
                    
                    
                        
                            
                                Sesame CSM
Premium
                            
                            
                                Sesame CSM (Conversational Speech Model) is a 1 billion parameter model designed specifically for generating conversational speech. It models the natural patterns of human conversation including turn-taking timing, backchannel responses, emotional reactions, and conversational flow. CSM generates audio that sounds like a natural human conversation rather than synthetic speech.
                                
Developer: Sesame
License: Apache 2.0
Speed: Slow
Languages: en
Voice cloning: No
VRAM: 8 GB
Cost per 1K characters: 4x

Conversational · Natural timing · Turn-taking · Backchannel · 1B parameters
Best for: AI assistants, chatbots, conversational AI applications

Try Sesame CSM
                            
                        
                    
                    
                    
                        
                            
MOSS-TTS
Premium

MOSS-TTS from OpenMOSS supports generation of up to 1 hour of continuous speech across 20 languages. It features token-level duration control, phoneme-level pronunciation control via IPA/Pinyin, and code-switching between languages. The 8B production model delivers state-of-the-art quality with zero-shot voice cloning from reference audio.

Developer: OpenMOSS
License: Apache 2.0
Speed: Medium
Languages: en, zh, de, es, fr, ja, it, hu, ko, ru, fa, ar, pl, pt, cs, da, sv, el, tr
Voice cloning: Yes
VRAM: 16GB
Cost per 1K characters: 4x

Ultra-long generation, 20 languages, Voice cloning, Duration control, Pronunciation control, Code-switching
Best for: Audiobooks, long-form content, multilingual production
                            
                        
                    
                    
                    
                        
                            
MegaTTS3
Premium

MegaTTS3 from ByteDance uses a novel sparse alignment mechanism combined with a latent diffusion transformer. It features an adjustable trade-off between speech intelligibility and speaker similarity for zero-shot voice cloning.

Developer: ByteDance
License: Apache 2.0
Speed: Slow
Languages: en, zh
Voice cloning: Yes
VRAM: 8GB
Cost per 1K characters: 4x

Voice cloning, Adjustable similarity, Cross-lingual
Best for: High-fidelity voice cloning
                            
                        
                    
                    
                
            
        

        
        
Model Comparison Table

Model	Developer	Tier	Speed	Languages	VRAM	License	Credits
Kokoro	Hexgrad	Free	Fast	11	1.5GB	Apache 2.0	Free
Piper	Rhasspy	Free	Fast	31	0 (CPU only)	MIT	Free
VITS	Jaehyeon Kim et al.	Free	Fast	4	1GB	MIT	Free
MeloTTS	MyShell.ai	Free	Fast	6	0.5GB (GPU optional)	MIT	Free
Bark	Suno	Standard	Slow	13	5GB	MIT	2
Bark Small	Suno	Standard	Medium	13	2GB	MIT	2
CosyVoice 2	Alibaba (Tongyi Lab)	Standard	Medium	8	4GB	Apache 2.0	2
Dia TTS	Nari Labs	Standard	Medium	1	4GB	Apache 2.0	2
Parler TTS	Hugging Face	Standard	Medium	1	4GB	Apache 2.0	2
GLM-TTS	Zhipu AI	Standard	Medium	2	4GB	GLM-4 License	2
IndexTTS-2	Index Team	Standard	Medium	2	4GB	Bilibili Model License	2
Spark TTS	SparkAudio	Standard	Medium	2	4GB	CC BY-NC-SA 4.0	2
GPT-SoVITS	RVC-Boss	Standard	Slow	4	6GB	MIT	2
Orpheus	Canopy Labs	Standard	Medium	1	4GB	Llama 3.2 Community	2
Chatterbox	Resemble AI	Premium	Medium	1	4GB	MIT	4
Tortoise TTS	James Betker	Premium	Slow	1	8GB	Apache 2.0	4
StyleTTS 2	Columbia University	Premium	Medium	1	4GB	MIT	4
OpenVoice	MyShell.ai / MIT	Premium	Medium	8	4GB	MIT	4
Qwen3 TTS	Alibaba (Qwen)	Standard	Medium	10	7GB	Apache 2.0	2
Sesame CSM	Sesame	Premium	Slow	1	8GB	Apache 2.0	4
Chatterbox Turbo	Resemble AI	Standard	Fast	1	2GB	MIT	2
Zonos	Zyphra	Standard	Medium	5	6GB	Apache 2.0	2
Dia 2	Nari Labs	Standard	Fast	1	4GB	Apache 2.0	2
VoxCPM	OpenBMB	Standard	Fast	2	4GB	Apache 2.0	2
OuteTTS	OuteAI	Free	Fast	1	2GB	Apache 2.0	Free
TADA	Hume AI	Standard	Fast	1	5GB	MIT	2
VibeVoice	Microsoft	Standard	Fast	2	4GB	MIT	2
Pocket TTS	Kyutai	Free	Fast	2	1GB	MIT	Free
Kitten TTS	KittenML	Free	Fast	1	0GB	Apache 2.0	Free
CosyVoice3	Alibaba (FunAudioLLM)	Standard	Fast	9	4GB	Apache 2.0	2
MOSS-TTS	OpenMOSS	Premium	Medium	19	16GB	Apache 2.0	4
MegaTTS3	ByteDance	Premium	Slow	2	8GB	Apache 2.0	4
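The Credits column can be turned into a rough budget estimate. The sketch below is only illustrative: it assumes the column means credits per 1,000 characters and that a partial block of 1,000 characters is billed as a full one. Neither rule is stated on this page, and the helper name `estimate_credits` is hypothetical.

```python
import math

# A sample of credit rates copied from the comparison table ("Free" models cost 0).
# Assumption: each rate is charged per 1,000 characters of input text.
CREDITS_PER_1K = {
    "Kokoro": 0,
    "Bark": 2,
    "CosyVoice 2": 2,
    "Chatterbox": 4,
    "MOSS-TTS": 4,
}


def estimate_credits(model: str, text: str) -> int:
    """Hypothetical estimate: rate multiplied by the number of started 1,000-character blocks.

    Counting a partial block as a full block is an assumption, not a documented rule.
    """
    rate = CREDITS_PER_1K[model]
    blocks = math.ceil(len(text) / 1000)
    return rate * blocks


if __name__ == "__main__":
    sample = "x" * 2500  # 2,500 characters, i.e. 3 started blocks
    print(estimate_credits("Chatterbox", sample))  # 4 * 3 = 12
```

If the service actually bills proportionally, replace `math.ceil(len(text) / 1000)` with `len(text) / 1000`.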
                        
                        
                    
                
            
        
    




    
        
            
                The most capable AI text-to-speech platform

                
                    
                        Why Choose TTS.ai for Text to Speech?
                        TTS.ai brings together open-source speech models from around the world.
                        Most models are open source under MIT, Apache 2.0, or similarly permissive licenses, which generally allow commercial use of the audio you generate in your projects. A few carry more restrictive terms (Spark TTS, for example, is licensed CC BY-NC-SA 4.0, which restricts commercial use), so check the license of the model you choose. Whether you need fast, lightweight synthesis for real-time applications or studio-quality output for audiobooks and podcasts, TTS.ai has the right model for every use case.

                        Free Models, No Account Required
                        Get started immediately with three free TTS models: Piper (very fast, lightweight), VITS (high-quality neural synthesis), and MeloTTS (multilingual support). No sign-up, no credit card, no limits on generations. The free models support English and several other languages, with natural-sounding output suitable for most applications.
                    
                    
                        GPU-Accelerated Processing
                        Every TTS model runs on dedicated NVIDIA GPUs for fast, consistent generation times. Free models typically generate audio in under 2 seconds. Standard models such as Kokoro, CosyVoice 2, and Bark take 3-5 seconds on average. Premium models with the highest quality, such as Tortoise and Chatterbox, take 5-15 seconds depending on text length.

                        30+ Languages Supported
                        Generate speech in more than 30 languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more. Some models support cross-lingual synthesis, meaning you can generate speech in a language the original voice was never trained on. CosyVoice 2 and GPT-SoVITS excel at cross-lingual voice cloning.

                        API for Developers
                        Integrate TTS.ai into your applications with our OpenAI-compatible REST API: a single endpoint for all 20+ models, SDKs for Python, JavaScript, and Go plus cURL examples, streaming support for real-time applications, batch processing for large-scale content creation, and webhooks for async notifications. Available on Pro and Enterprise plans.
                    
                
            
        
    









    



    
        
        
        
    










    
        Frequently Asked Questions
        
            
                
                    
                    
                        
                            
                        
                        
                            
                                Text-to-Speech (TTS) is an AI technology that converts written text into natural-sounding spoken audio. Modern neural TTS models such as Kokoro, Chatterbox, and CosyVoice 2 use deep learning to generate natural-sounding speech with realistic prosody, emotion, and rhythm.
                            
                        
                    
                    
                    
                        
                            
                        
                        
                            
                                It depends on your needs. For quick previews, use Piper or MeloTTS (free, fast). For high quality, try Kokoro or CosyVoice 2 (standard tier). For voice cloning, use Chatterbox or GPT-SoVITS (premium). For dialogue or podcast content, try Dia TTS. Each model has different strengths, so experiment to find the one that suits you best.
                            
                        
                    
                    
                    
                        
                            
                        
                        
                            
                                Yes! TTS.ai offers free text-to-speech with the Kokoro, Piper, VITS, and MeloTTS models. No account is needed for up to 500 characters and 3 generations per hour. Sign up for a free account to get 50 credits and access to every model.
                            
                        
                    
                    
                    
                        
                            
                        
                        
                            
                                Our TTS models support 30+ languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Russian, Hindi, and many more. Language availability varies by model.
                            
                        
                    
                    
                    
                        
                            
                        
                        
                            
                                In most cases, yes: audio generated with TTS.ai can be used commercially. Most of our models use permissive open-source licenses (MIT, Apache 2.0), but not all of them do. Spark TTS is licensed CC BY-NC-SA 4.0, and GLM-TTS, IndexTTS-2, and Orpheus use their own model licenses. Check the license of the specific model you use for your project before relying on it commercially.
                            
                        
                    
                    
                    
                        
                            
                        
                        
                            
                                TTS.ai supports MP3, WAV, OGG, and FLAC output. MP3 is the default format for web playback; WAV is recommended if you plan to process the audio further. You can convert between formats with our audio converter.
                            
                        
                    
                    
                    
                        
                            
                        
                        
                            
                                AI voice cloning recreates a specific voice from a short audio sample (often 5-30 seconds). Upload a clear recording of the target voice, and models such as Chatterbox, GPT-SoVITS, or OpenVoice will generate new speech in that voice. Quality improves with cleaner reference audio.
                            
                        
                    
                    
                    
                        
                            
                        
                        
                            
                                Free users can generate up to 500 characters per request, and registered users up to 5,000 characters per request. For longer texts, the audio is generated in chunks and stitched together automatically. API users can process up to 10,000 characters per request.
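This page does not say how TTS.ai picks chunk boundaries. As a rough illustration of the idea, the sketch below splits text into pieces of at most 500 characters (the free per-request limit), preferring sentence boundaries and hard-splitting any single sentence that is still too long. The function name `chunk_text` and the splitting rule are assumptions, not the service's actual algorithm; each chunk would be synthesized separately and the resulting audio concatenated.

```python
import re


def chunk_text(text: str, limit: int = 500) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence boundaries."""
    # Split after ".", "!" or "?" followed by whitespace; punctuation stays with its sentence.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Append the sentence to the current chunk if it still fits, else start a new chunk.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(chunk_text("Hello world. This is a test. Another sentence here!", limit=30))
    # ['Hello world. This is a test.', 'Another sentence here!']
```

Splitting on sentence punctuation keeps each chunk prosodically self-contained, which tends to hide the joins when the audio pieces are concatenated.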
                            
                        
                    
                    
                    
                        
                            
                        
                        
                            
                                SSML (Speech Synthesis Markup Language) support varies by model. Piper and some other models support basic SSML tags for pauses, emphasis, and pronunciation control. For models without native SSML support, you can use punctuation and line breaks to influence prosody.
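For models that accept SSML, building the markup programmatically avoids escaping mistakes: a bare `&` or `<` in the text would otherwise make the document invalid. The sketch below uses Python's standard `xml.etree.ElementTree` to emit the W3C SSML `<speak>`, `<break>`, and `<emphasis>` elements. Which of these tags a given model (Piper, for instance) actually honors is not stated here, so treat the output as an example of well-formed SSML rather than a guaranteed-supported input; `build_ssml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_ssml(intro: str, emphasized: str, pause_ms: int = 500) -> str:
    """Build a small SSML document: intro text, a pause, then an emphasized phrase."""
    speak = ET.Element("speak")
    speak.text = intro  # ElementTree escapes &, < and > when serializing
    # <break> inserts a pause; SSML time values accept a "ms" suffix.
    ET.SubElement(speak, "break", time=f"{pause_ms}ms")
    emphasis = ET.SubElement(speak, "emphasis", level="strong")
    emphasis.text = emphasized
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Welcome.", "Listen carefully"))
```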
                            
                        
                    
                    
                    
                        
                            
                        
                        
                            
                                Yes, most models support speed adjustment from 0.5x to 2.0x. Some models, such as Bark and Parler, also allow control over pitch and style. You can set the speed in the advanced settings panel or through the API's speed parameter.
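Client code can check the documented 0.5x to 2.0x range before calling the API, which gives a clearer error than a rejected request. The sketch also estimates the effect on clip length, assuming duration scales roughly inversely with the speed factor; real models may not be exactly linear. Both function names are hypothetical.

```python
MIN_SPEED = 0.5  # lower bound of the range stated above
MAX_SPEED = 2.0  # upper bound of the range stated above


def validate_speed(speed: float) -> float:
    """Return `speed` unchanged, or raise ValueError if it is outside 0.5x to 2.0x."""
    if not (MIN_SPEED <= speed <= MAX_SPEED):
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return speed


def scaled_duration(base_seconds: float, speed: float) -> float:
    """Approximate clip length at `speed`, assuming duration is inversely proportional to speed."""
    return base_seconds / validate_speed(speed)


if __name__ == "__main__":
    # A clip lasting 10 s at 1.0x lasts about 5 s at 2.0x under this assumption.
    print(scaled_duration(10.0, 2.0))
```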
                            
                        
                    
                    
                    
                        
                            
                        
                        
                            
                                Yes, batch processing is available through our API. You can submit multiple text segments in a single API call or script, and each one is processed and returned as a separate audio file. This is ideal for audiobook chapters, e-learning modules, or game dialogue scripts.
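When scripting a batch on the client side, it helps to pair every text segment with its own output file up front. This sketch only plans the jobs. The request fields (`model`, `input`, `voice`, `response_format`) follow OpenAI's speech API, on the assumption that TTS.ai's OpenAI-compatible endpoint accepts the same names, and the model and voice identifiers are placeholders. Sending each payload is left to whatever HTTP client you use.

```python
from dataclasses import dataclass


@dataclass
class SpeechJob:
    output_path: str        # where the returned audio for this segment should be saved
    payload: dict[str, str]  # request body, assuming OpenAI-style field names


def plan_batch(segments: list[str], model: str, voice: str,
               response_format: str = "mp3") -> list[SpeechJob]:
    """Create one job per non-blank segment, numbered in order (segment_001, segment_002, ...)."""
    texts = [s.strip() for s in segments if s.strip()]
    jobs = []
    for i, text in enumerate(texts, start=1):
        payload = {
            "model": model,
            "input": text,
            "voice": voice,
            "response_format": response_format,
        }
        jobs.append(SpeechJob(output_path=f"segment_{i:03d}.{response_format}", payload=payload))
    return jobs


if __name__ == "__main__":
    for job in plan_batch(["Chapter one.", "Chapter two."], model="model-id", voice="voice-id"):
        print(job.output_path, job.payload["input"])
```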
                            
                        
                    
                    
                    
                        
                            
                        
                        
                            
                                Create an API key in your account dashboard, then send POST requests to our REST API endpoint with your text, model, and voice parameters. We provide code examples in Python, JavaScript, and cURL. Because the API is OpenAI-compatible, existing integrations work with minimal changes.
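Since the API is described as OpenAI-compatible, a request can be sketched with the shape of OpenAI's `POST /v1/audio/speech` call, using only the standard library. The base URL `https://api.tts.ai/v1`, the exact path, and the model and voice identifiers are assumptions for illustration, not documented TTS.ai values; check the official API reference before use.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with the endpoint shown in your account dashboard.
BASE_URL = "https://api.tts.ai/v1"


def build_speech_request(api_key: str, text: str, model: str, voice: str,
                         response_format: str = "mp3", speed: float = 1.0) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style speech request."""
    body = json.dumps({
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(request: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the returned audio bytes to `out_path`."""
    with urllib.request.urlopen(request, timeout=60) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Usage would look like `synthesize(build_speech_request(key, "Hello", "model-id", "voice-id"), "hello.mp3")`; keeping request construction separate from sending makes the payload easy to inspect and test offline.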
                            
                        
                    
                    
                
            
        
    








    
        
            
                
                
                
                
                
                
                
                
                
                
                
                
                
            
            
                
                
                
                
            
            
                
                
            
        
    







    
        Start Converting Text to Speech Now
        Join thousands of creators using TTS.ai. Get 15,000 free characters with a new account. Free models are available without signing up.
