
VITS TTS

The end-to-end TTS architecture that combines a variational autoencoder, normalizing flows, and adversarial training.


SSML mode (Speech Synthesis Markup Language, for fine control)

Wrap your text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
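If the text comes from user input, characters such as `&` or `<` have to be escaped or the markup will not parse. Here is a minimal Python sketch that uses only the standard library. `wrap_ssml` is a hypothetical helper and is not part of TTS.ai. This page also does not say which `prosody` attributes VITS actually honors.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def wrap_ssml(text: str, rate: str = "medium") -> str:
    """Hypothetical helper: wrap plain text in <speak><prosody> tags."""
    # escape() turns &, < and > into entities so user text cannot break the markup.
    # quoteattr() returns the attribute value already surrounded by quotes.
    ssml = f"<speak><prosody rate={quoteattr(rate)}>{escape(text)}</prosody></speak>"
    ET.fromstring(ssml)  # raises ET.ParseError if the result is not well-formed XML
    return ssml


print(wrap_ssml("Slow speech", rate="slow"))
print(wrap_ssml("Tom & Jerry <3", rate="fast"))
```

The first call reproduces the example above. The second call shows `&` and `<` being escaped to `&amp;` and `&lt;`.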


Pronunciation dictionary

Define custom pronunciations in the form word = pronunciation.
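This page does not say how the site applies these entries. One plausible approach is to parse the `word = pronunciation` format and perform whole-word text substitution before synthesis. The sketch below shows that approach. `parse_lexicon` and `apply_lexicon` are hypothetical names. Case-insensitive matching is also an assumption, and the `\b` word boundaries only behave well for entries that start and end with letters or digits.

```python
import re


def parse_lexicon(text: str) -> dict[str, str]:
    """Hypothetical parser: 'word = pronunciation' lines -> dict keyed by lower-case word."""
    lexicon = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        word, pron = (part.strip() for part in line.split("=", 1))
        if word and pron:
            lexicon[word.lower()] = pron
    return lexicon


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches with their pronunciations."""
    if not lexicon:
        return text
    # Try longer entries first so a multi-word entry wins over its parts.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b", re.IGNORECASE
    )
    # Fall back to the matched text if case-folding ever misses a key.
    return pattern.sub(lambda m: lexicon.get(m.group(0).lower(), m.group(0)), text)


lex = parse_lexicon("VITS = vits\nTTS.ai = tee tee ess dot ay eye")
print(apply_lexicon("Try VITS on TTS.ai today", lex))
```

The example prints `Try vits on tee tee ess dot ay eye today`. A word that merely contains an entry, such as `VITSX`, is left alone because of the word boundaries.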



About VITS

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) was introduced by Jaehyeon Kim and collaborators in 2021 and became a foundational architecture for modern neural speech synthesis. Conventional systems use two stages: an acoustic model predicts a spectrogram, and a separate vocoder then converts it to audio. VITS instead generates the waveform directly from text in a single parallel pass. It pairs a variational autoencoder with normalizing flows and adversarial (GAN) training to improve naturalness. With about 25M parameters and roughly 585 hours of training data, it produces natural prosody at fast inference speeds and supports multiple speakers. It is a solid, free, general-purpose baseline and underpins many later models, such as Piper and MeloTTS.
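Normalizing flows are the least intuitive part of this design. The VITS paper describes its flow as volume-preserving, with a Jacobian determinant of one. The flow is built from coupling layers that shift half of the latent channels by an amount predicted from the other half. The toy, pure-Python sketch below illustrates only that invertibility property on a plain list of floats. `shift_net` is a hypothetical stand-in for the WaveNet-style network the real model uses, and nothing here reproduces the actual VITS implementation.

```python
import math
import random


def shift_net(xa: list[float]) -> list[float]:
    """Hypothetical stand-in for the network that predicts the shift.

    It can be any function of xa; the inverse never needs to invert it.
    """
    return [math.tanh(v) + 0.1 * i for i, v in enumerate(xa)]


def coupling_forward(x: list[float]) -> list[float]:
    assert len(x) % 2 == 0, "toy layer expects an even number of channels"
    # The first half passes through unchanged and conditions a shift
    # that is added to the second half.
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    return xa + [b + s for b, s in zip(xb, shift_net(xa))]


def coupling_inverse(y: list[float]) -> list[float]:
    assert len(y) % 2 == 0, "toy layer expects an even number of channels"
    # The first half is unchanged, so the same shift can be recomputed and subtracted.
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    return ya + [b - s for b, s in zip(yb, shift_net(ya))]


x = [random.gauss(0.0, 1.0) for _ in range(8)]
x_rec = coupling_inverse(coupling_forward(x))
print(max(abs(a - b) for a, b in zip(x, x_rec)))  # tiny: floating-point round-off only
```

Because the layer only adds a shift, its log-determinant is zero, which is what "volume-preserving" means here. Stacked flows alternate which half gets shifted from one layer to the next, so that every channel is eventually transformed.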

Best for: general-purpose text-to-speech with natural prosody


At a glance

Developer: Jaehyeon Kim et al.
License: MIT
Tier: Free
Speed: Fast
Voice cloning: No
Languages: English, German, Spanish, French, Portuguese, Dutch, Finnish, Hungarian, Bulgarian, Japanese, Polish
Max characters: 2000
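Any text longer than the 2000-character maximum has to be split before synthesis. The sketch below packs sentences greedily into chunks under the limit. It assumes the limit applies per request and counts characters the way Python's `len()` does. `chunk_text` is a hypothetical helper, and its sentence splitter is deliberately naive.

```python
import re


def chunk_text(text: str, limit: int = 2000) -> list[str]:
    """Hypothetical helper: greedily pack sentences into chunks of at most `limit` characters."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", limit=9))
```

With a limit of 9, the example yields `['One. Two.', 'Three.']`. Splitting at sentence ends keeps each synthesized piece prosodically complete, which usually sounds better than cutting mid-sentence.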

VITS voices

CSS10 (Dutch) · Dutch · Free · Neutral
CSS10 (Finnish) · Finnish · Free · Neutral
CSS10 (French) · French · Free · Neutral
CSS10 (German) · German · Free · Neutral
CSS10 (Hungarian) · Hungarian · Free · Neutral
CSS10 (Spanish) · Spanish · Free · Neutral
Common Voice (Bulgarian) · Bulgarian · Free · Neutral
Common Voice (Portuguese) · Portuguese · Free · Neutral
Default · English · Free · Neutral
MAI (Polish) · Polish · Free · Female
MAI (Ukrainian) · Ukrainian · Free · Neutral

VITS TTS FAQ

What does VITS stand for and how does it work?

VITS stands for Variational Inference with adversarial learning for end-to-end Text-to-Speech. It generates audio in a single parallel pass using a variational autoencoder, normalizing flows, and adversarial (GAN) training, rather than a two-stage pipeline.

Is VITS free for commercial use?

Yes. VITS is released under the MIT license, which permits commercial use, and it is available in the TTS.ai free tier.

What languages does VITS support?

On TTS.ai, VITS covers 11 languages: English, German, Spanish, French, Portuguese, Dutch, Finnish, Hungarian, Bulgarian, Japanese, and Polish. It supports multiple speakers but does not support voice cloning.
