
VITS TTS

The end-to-end TTS architecture that combines a variational autoencoder, normalizing flows, and adversarial training.


SSML mode (Speech Synthesis Markup Language, for precise control)

Wrap your text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
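As a small sketch of how such markup could be produced programmatically, the snippet below builds the same `<speak>`/`<prosody>` wrapper with Python's standard library, so that characters such as `&` in the input text are escaped. The helper name `wrap_ssml` is hypothetical, and only the `rate` attribute from the example is assumed to be accepted by the service.

```python
import xml.etree.ElementTree as ET


def wrap_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...>, escaping special characters.

    Hypothetical helper for illustration; not part of any TTS.ai client.
    """
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    prosody.text = text  # ElementTree escapes &, < and > on serialization
    return ET.tostring(speak, encoding="unicode")


print(wrap_ssml("Slow speech"))
```

Building the element tree rather than concatenating strings keeps the markup well-formed even when the text itself contains angle brackets or ampersands.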




                

                
                
                    
                    
Free tier: personal use. Commercial license from $5/month.
                        
                        
                    
                
            
        

        
        
            
                






    
    
        
About VITS
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) was introduced by Jaehyeon Kim and collaborators in 2021 and became a foundational architecture for modern neural speech synthesis. Instead of the older two-stage pipeline, it synthesizes audio in a single parallel, end-to-end pass, pairing a variational autoencoder with normalizing flows and GAN-style adversarial training to improve naturalness. With about 25M parameters and ~585 hours of training data, it produces natural prosody at fast inference speeds and supports multiple speakers. It is a solid, free, general-purpose baseline and underpins later models such as Piper and MeloTTS.
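To make the "normalizing flows" ingredient more concrete, here is a toy affine coupling layer in plain Python. It is a generic illustration of one common flow building block, not the VITS code: the conditioner functions `toy_shift` and `toy_log_scale` are made-up stand-ins for the neural networks a real model would learn, and the input is assumed to have even length.

```python
import math


def coupling_forward(x, shift_fn, log_scale_fn):
    """Affine coupling: pass the first half through, transform the second half.

    Returns the transformed vector and the log-determinant of the Jacobian.
    Assumes len(x) is even so both halves have the same size.
    """
    half = len(x) // 2
    a, b = x[:half], x[half:]
    s, t = log_scale_fn(a), shift_fn(a)
    y_b = [bi * math.exp(si) + ti for bi, si, ti in zip(b, s, t)]
    return a + y_b, sum(s)


def coupling_inverse(y, shift_fn, log_scale_fn):
    """Exact inverse: the untouched half lets us recompute the same scale and shift."""
    half = len(y) // 2
    a, y_b = y[:half], y[half:]
    s, t = log_scale_fn(a), shift_fn(a)
    b = [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y_b, s, t)]
    return a + b


# Toy conditioners (assumptions for illustration only).
def toy_shift(a):
    return [0.5 * v for v in a]


def toy_log_scale(a):
    return [math.tanh(v) for v in a]


x = [0.3, -1.2, 2.0, 0.7]
y, log_det = coupling_forward(x, toy_shift, toy_log_scale)
x_rec = coupling_inverse(y, toy_shift, toy_log_scale)
print(y, log_det, x_rec)
```

The point of the construction is that the transform is invertible by design and its log-determinant is cheap to compute, which is what lets a flow-based model evaluate exact likelihoods during training.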
            
Best for: General-purpose text-to-speech with natural prosody
            
        
        
            
                
At a glance

Developer: Jaehyeon Kim et al.
License: MIT
Tier: free
Speed: fast
Voice cloning: No
Languages: English, German, Spanish, French, Portuguese, Dutch, Finnish, Hungarian, Bulgarian, Japanese, Polish
Max characters: 2000
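Because of the 2000-character limit listed above, longer scripts have to be submitted in pieces. The following is a minimal sketch of one way to do that, assuming the limit applies per request and that splitting at sentence boundaries is acceptable; the function name `split_text` and the sentence-splitting rule are illustrative choices, not part of the service.

```python
import re

MAX_CHARS = 2000  # per-request limit listed for VITS on this page


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


for i, chunk in enumerate(split_text("First sentence. " * 300)):
    print(i, len(chunk))
```

Splitting on sentence boundaries rather than at a fixed offset avoids cutting a word or phrase in half, which would otherwise produce an audible break when the generated clips are joined.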
                    
                
            
        
    

    
    
VITS voices

CSS10 (Dutch) · Dutch · Free · Neutral
CSS10 (Finnish) · Finnish · Free · Neutral
CSS10 (French) · French · Free · Neutral
CSS10 (German) · German · Free · Neutral
CSS10 (Hungarian) · Hungarian · Free · Neutral
CSS10 (Spanish) · Spanish · Free · Neutral
Common Voice (Bulgarian) · Bulgarian · Free · Neutral
Common Voice (Portuguese) · Portuguese · Free · Neutral
Default · English · Free · Neutral
MAI (Polish) · Polish · Free · Female
MAI (Ukrainian) · Ukrainian · Free · Neutral
                    
                    
                    
                    
                
            
        
        
    
    

    
    
VITS FAQ

What does VITS stand for and how does it work?
VITS stands for Variational Inference with adversarial learning for end-to-end Text-to-Speech. It generates audio in a single parallel pass using a variational autoencoder, normalizing flows, and adversarial (GAN) training, rather than a two-stage pipeline.

Is VITS free for commercial use?
The VITS model itself is MIT-licensed, which permits commercial use, and it is available in the free tier. TTS.ai lists its free tier as for personal use, with a commercial license from $5/month.

What languages does VITS support?
On TTS.ai, VITS covers 11 languages: English, German, Spanish, French, Portuguese, Dutch, Finnish, Hungarian, Bulgarian, Japanese, and Polish, with multi-speaker support. It does not support voice cloning.