VITS TTS

The end-to-end TTS architecture that combines a variational autoencoder, normalizing flows, and adversarial training.

About VITS

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) was introduced by Jaehyeon Kim and collaborators in 2021 and became a foundational architecture for modern neural speech synthesis. Older systems use a two-stage pipeline, in which one model predicts an intermediate acoustic representation and a separate vocoder turns it into a waveform. VITS instead synthesizes audio in a single parallel, end-to-end pass, pairing a variational autoencoder with normalizing flows and GAN-style adversarial training to improve naturalness. With about 25M parameters and roughly 585 hours of training data, it produces natural prosody at fast inference speeds and supports multiple speakers. It serves as a solid, free, general-purpose baseline and underpins many later models, such as Piper and MeloTTS.
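
To make the normalizing-flow idea more concrete, here is a minimal NumPy sketch of a generic affine coupling layer, a common invertible building block for flows. It is an illustration under simplifying assumptions, not the exact layer VITS uses: the class name `AffineCoupling` is hypothetical, and a fixed random linear map stands in for the neural network a real model would train. What it shows is that the transform can be inverted exactly and that its log-determinant is cheap to compute.

```python
import numpy as np

rng = np.random.default_rng(0)


class AffineCoupling:
    """Toy affine coupling layer (hypothetical, for illustration only).

    Splits the input in half along the last axis, leaves the first half
    unchanged, and scales/shifts the second half using values computed
    from the first half.
    """

    def __init__(self, dim: int, rng: np.random.Generator):
        assert dim % 2 == 0, "dim must be even so it splits into two halves"
        half = dim // 2
        # Assumption: fixed random linear maps stand in for the trained
        # network that would predict the log-scale and shift in a real model.
        self.w_scale = rng.normal(scale=0.1, size=(half, half))
        self.w_shift = rng.normal(scale=0.1, size=(half, half))

    def _params(self, x_a):
        log_s = np.tanh(x_a @ self.w_scale)  # bounded log-scale for stability
        t = x_a @ self.w_shift
        return log_s, t

    def forward(self, x):
        x_a, x_b = np.split(x, 2, axis=-1)
        log_s, t = self._params(x_a)
        y_b = x_b * np.exp(log_s) + t
        # log|det J| is just the sum of log-scales: needed for exact likelihoods.
        log_det = log_s.sum(axis=-1)
        return np.concatenate([x_a, y_b], axis=-1), log_det

    def inverse(self, y):
        y_a, y_b = np.split(y, 2, axis=-1)
        # y_a equals the original x_a, so the same scale and shift are recovered.
        log_s, t = self._params(y_a)
        x_b = (y_b - t) * np.exp(-log_s)
        return np.concatenate([y_a, x_b], axis=-1)


layer = AffineCoupling(dim=8, rng=rng)
x = rng.normal(size=(4, 8))
y, log_det = layer.forward(x)
x_rec = layer.inverse(y)
print(np.allclose(x, x_rec))  # True
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift, which is what makes exact inversion possible without solving any equations.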

Best for: General-purpose text-to-speech with natural prosody

At a glance

Developer: Jaehyeon Kim et al.
License: MIT
Tier: Free
Speed: Fast
Voice cloning: No
Languages: English, German, Spanish, French, Portuguese, Dutch, Finnish, Hungarian, Bulgarian, Japanese, Polish
Max characters: 2000
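
Texts longer than the character limit have to be split before synthesis. The following is a minimal sketch of one way to do that, assuming the 2000-character limit applies per request. The function `chunk_text` and the constant `MAX_CHARS` are hypothetical helpers, not part of any TTS.ai API, and the sentence splitter is a simple punctuation heuristic.

```python
import re

# Assumption: the 2000-character limit applies to each request.
MAX_CHARS = 2000


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Hypothetical helper: split text into chunks of at most max_chars,
    preferring to break between sentences."""
    # Heuristic: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


for part in chunk_text("Hello world. " * 500):
    print(len(part))
```

Breaking between sentences keeps each request prosodically self-contained; the hard split is only a fallback for a single sentence that exceeds the limit.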

VITS voices

CSS10 (Dutch) · Dutch · Free · Neutral
CSS10 (Finnish) · Finnish · Free · Neutral
CSS10 (French) · French · Free · Neutral
CSS10 (German) · German · Free · Neutral
CSS10 (Hungarian) · Hungarian · Free · Neutral
CSS10 (Spanish) · Spanish · Free · Neutral
Common Voice (Bulgarian) · Bulgarian · Free · Neutral
Common Voice (Portuguese) · Portuguese · Free · Neutral
Default · English · Free · Neutral
MAI (Polish) · Polish · Free · Female
MAI (Ukrainian) · Ukrainian · Free · Neutral

VITS TTS FAQ

What does VITS stand for and how does it work?

VITS stands for Variational Inference with adversarial learning for end-to-end Text-to-Speech. Rather than using a two-stage pipeline, it generates audio in a single parallel pass using a variational autoencoder, normalizing flows, and adversarial (GAN) training.
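
The adversarial part of training can be illustrated with least-squares GAN losses, which is the form of adversarial objective the VITS paper reports using. The sketch below is a simplified NumPy illustration, not the VITS training code: `discriminator_loss` and `generator_adv_loss` are hypothetical names, and the score arrays are made-up stand-ins for real discriminator outputs.

```python
import numpy as np


def discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """Least-squares discriminator loss: push scores on real audio toward 1
    and scores on generated audio toward 0."""
    return float(np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))


def generator_adv_loss(d_fake: np.ndarray) -> float:
    """Least-squares adversarial loss for the generator: push the
    discriminator's scores on generated audio toward 1."""
    return float(np.mean((d_fake - 1.0) ** 2))


# Made-up discriminator scores, for illustration only.
d_real = np.array([0.9, 1.1, 1.0])
d_fake = np.array([0.2, 0.0, 0.1])
print(discriminator_loss(d_real, d_fake))
print(generator_adv_loss(d_fake))
```

Here the discriminator already separates real from generated audio well, so its loss is small while the generator's loss is large; training alternates between minimizing the two, which pushes generated audio toward being indistinguishable from real speech.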

Is VITS free for commercial use?

Yes. VITS is released under the MIT license, which permits commercial use, and it is available in the free tier.

What languages does VITS support?

On TTS.ai, VITS covers 11 languages: English, German, Spanish, French, Portuguese, Dutch, Finnish, Hungarian, Bulgarian, Japanese, and Polish, with multi-speaker support. It does not support voice cloning.