VITS TTS

The end-to-end TTS architecture that combines a variational autoencoder, normalizing flows, and adversarial training.


Character limit: 500 per generation; sign up for a 5,000-character limit.

SSML mode (Speech Synthesis Markup Language, for fine control)

Wrap text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
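
The wrapping shown above can be produced programmatically. The following is a minimal sketch, assuming the input is plain text that may contain XML-special characters; the helper name `wrap_ssml` and its `rate` parameter are hypothetical and not part of any documented TTS.ai API.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...> tags (hypothetical helper)."""
    # escape() replaces &, < and > so user text cannot break the markup.
    body = escape(text)
    # quoteattr() returns the attribute value already wrapped in quotes.
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"


print(wrap_ssml("Slow speech"))
# <speak><prosody rate="slow">Slow speech</prosody></speak>
print(wrap_ssml("Tom & Jerry"))
# <speak><prosody rate="slow">Tom &amp; Jerry</prosody></speak>
```

Escaping before wrapping keeps a stray `&` or `<` in the input from producing malformed SSML.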

Pronunciation dictionary

Enter custom pronunciations in the form word = pronunciation.
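
To illustrate the `word = pronunciation` format, here is a rough sketch of how such entries could be parsed and applied to text before synthesis. The page does not say how the service applies them, so the one-entry-per-line layout, the whole-word case-insensitive matching, and both function names are assumptions.

```python
import re


def parse_pronunciations(source: str) -> dict[str, str]:
    """Parse lines of the form 'word = pronunciation' into a dict."""
    entries = {}
    for line in source.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        word, _, spoken = line.partition("=")
        word, spoken = word.strip(), spoken.strip()
        if word and spoken:
            entries[word.lower()] = spoken
    return entries


def apply_pronunciations(text: str, entries: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches with their pronunciation."""
    if not entries:
        return text
    # Longest words first so longer entries win over shorter alternatives.
    words = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: entries[m.group(0).lower()], text)


entries = parse_pronunciations("SQL = sequel\nnginx = engine x")
print(apply_pronunciations("Run SQL behind nginx.", entries))
# Run sequel behind engine x.
```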

Pitch: 0 (range -12 to +12)

Dialogue format: Use the [S1] and [S2] tags to mark different speakers. Example:

[S1] Hello! [S2] Hello, how are you?
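
A script in the speaker-tag format can be split into per-speaker turns before synthesis. The sketch below is illustrative only: the page mentions just `[S1]` and `[S2]`, so accepting any `[S<number>]` tag is an assumption, and `split_turns` is a hypothetical helper.

```python
import re

# Matches tags such as [S1] or [S2]; the digits are captured with the "S".
SPEAKER_TAG = re.compile(r"\[(S\d+)\]")


def split_turns(script: str) -> list[tuple[str, str]]:
    """Split '[S1] ... [S2] ...' into (speaker, text) pairs, in order."""
    # re.split with one capture group yields [prefix, tag, text, tag, text, ...].
    parts = SPEAKER_TAG.split(script)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:  # drop tags that have no text after them
            turns.append((speaker, text))
    return turns


print(split_turns("[S1] Hello! [S2] Hello, how are you?"))
# [('S1', 'Hello!'), ('S2', 'Hello, how are you?')]
```

Any text before the first tag is ignored in this sketch, since it has no speaker assigned.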



                

                
                
                    
                    
Settings: AI model, voice, language, output format, and speed (1.0x, adjustable from 0.5x to 2.0x).
Free with Piper, VITS, MeloTTS
About VITS

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) was introduced by Jaehyeon Kim and collaborators in 2021 and became a foundational architecture for modern neural speech synthesis. Instead of the older two-stage pipeline, in which an acoustic model predicts a spectrogram and a separate vocoder turns it into a waveform, VITS synthesizes audio in a single parallel end-to-end pass. It pairs a variational autoencoder with normalizing flows and GAN-style adversarial training to improve naturalness. At about 25M parameters and trained on roughly 585 hours of audio, it produces natural prosody at fast inference speeds and supports multiple speakers. It serves as a solid general-purpose, free baseline and underpins later models such as Piper and MeloTTS.
            
Best for: General-purpose text-to-speech with natural prosody
        
        
            
                
At a glance

Developer: Jaehyeon Kim et al.
License: MIT
Tier: free
Speed: fast
Voice cloning: No
Languages: English, German, Spanish, French, Portuguese, Dutch, Finnish, Hungarian, Bulgarian, Japanese, Polish
Max characters: 2000
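
Text longer than the 2,000-character maximum listed above has to be split before it is submitted. The page does not say whether the service splits input automatically, so the following is only a sketch of one way to do it client-side: `chunk_text` is a hypothetical helper that breaks at sentence ends where possible and cuts hard only when a single sentence exceeds the limit.

```python
import re


def chunk_text(text: str, limit: int = 2000) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after ., ! or ? followed by whitespace; punctuation stays with its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", limit=10))
# ['One. Two.', 'Three.']
```

Each chunk can then be synthesized separately and the resulting audio concatenated in order.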
                    
                
            
        
    

    
    
VITS voices

CSS10 (Dutch): Dutch, Free, Neutral
CSS10 (Finnish): Finnish, Free, Neutral
CSS10 (French): French, Free, Neutral
CSS10 (German): German, Free, Neutral
CSS10 (Hungarian): Hungarian, Free, Neutral
CSS10 (Spanish): Spanish, Free, Neutral
Common Voice (Bulgarian): Bulgarian, Free, Neutral
Common Voice (Portuguese): Portuguese, Free, Neutral
Default: English, Free, Neutral
MAI (Polish): Polish, Free, Female
MAI (Ukrainian): Ukrainian, Free, Neutral
        
        
    
    

    
    
VITS TTS FAQ

What does VITS stand for and how does it work?
VITS stands for Variational Inference with adversarial learning for end-to-end Text-to-Speech. It generates audio in a single parallel pass using a variational autoencoder, normalizing flows, and adversarial (GAN) training, rather than a two-stage pipeline.

Is VITS free for commercial use?
Yes. VITS is MIT-licensed and in the free tier, so it can be used commercially.

What languages does VITS support?
On TTS.ai, VITS covers 11 languages: English, German, Spanish, French, Portuguese, Dutch, Finnish, Hungarian, Bulgarian, Japanese, and Polish, with multi-speaker support. It does not do voice cloning.