GPT-SoVITS TTS

A few-shot voice cloning model that replicates a voice — and can even sing — from as little as five seconds of audio.

GPT-SoVITS, created by the developer known as RVC-Boss, combines GPT-style language modeling with SoVITS, a VITS-based architecture that comes out of singing voice conversion work, to deliver some of the most accessible voice cloning in open source. With as little as five seconds of reference audio it captures a speaker's timbre and style, and unlike most TTS models it handles singing as well as speech. It supports English, Chinese, Japanese, and Korean, including cross-lingual generation, so a cloned voice can speak a language the reference clip never used. Content creators use it widely for voice replication, dubbing, and song covers, and it reaches high fidelity for a model of its size.

At a glance

Developer: RVC-Boss
License: MIT
Tier: standard
Speed: slow
Voice cloning: Yes
Languages: English, Chinese, Japanese, Korean
Max characters: 500
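
Because of the 500-character maximum listed above, longer scripts need to be split before synthesis. The following is a minimal sketch of one way to do that, assuming the limit counts Unicode characters per request (the page does not say how characters are counted). The helper name `split_for_tts` is hypothetical and not part of GPT-SoVITS; it only prepares text and makes no call to the model.

```python
import re

MAX_CHARS = 500  # the "Max characters" value listed above


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks break at sentence boundaries where possible. This is an
    illustrative helper, not part of the GPT-SoVITS codebase.
    """
    # A piece ends after Latin punctuation followed by whitespace,
    # after CJK sentence punctuation, or at the end of the text.
    pieces = re.findall(r".+?(?:[.!?]\s+|[。！？]|\Z)", text, flags=re.S)

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and len((current + piece).rstrip()) > limit:
            chunks.append(current.strip())
            current = ""
        current += piece
        # A single sentence longer than the limit is hard-wrapped.
        while len(current.rstrip()) > limit:
            chunks.append(current[:limit].strip())
            current = current[limit:]
    if current.strip():
        chunks.append(current.strip())
    # Drop any chunk that ended up empty after stripping whitespace.
    return [chunk for chunk in chunks if chunk]
```

Each returned chunk can then be synthesized as a separate request and the resulting audio concatenated. Splitting at sentence boundaries keeps the joins at natural pauses.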

GPT-SoVITS AI Voices

Default
Chinese
Default Neutral

English Default
English
Default Neutral

Japanese Default
Japanese
Default Neutral

Korean Default
Korean
Default Neutral

Best for

Voice cloning, singing synthesis, content creator voice replication

GPT-SoVITS TTS — FAQ

How much audio does GPT-SoVITS need to clone a voice?
As little as five seconds. It uses few-shot learning, so a short reference clip is enough to capture a speaker, though a cleaner and slightly longer sample improves similarity.
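
As a quick check before uploading a reference clip, its length can be compared against the five-second figure above. This is a minimal sketch using only Python's standard `wave` module, assuming the clip is an uncompressed WAV file. The function names and the `MIN_REFERENCE_SECONDS` constant are illustrative and not part of GPT-SoVITS.

```python
import wave

MIN_REFERENCE_SECONDS = 5.0  # "as little as five seconds", per the answer above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Check whether a reference clip meets the minimum length."""
    return wav_duration_seconds(path) >= minimum
```

Calling `is_long_enough` with the path to a reference clip returns `True` when the clip is at least five seconds long. Duration says nothing about noise or clarity, which also affect similarity.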

Can GPT-SoVITS sing?
Yes. Its SoVITS lineage comes from singing voice conversion, so unlike most TTS models it can generate singing as well as speech, which is why it is popular for song covers.

Which languages does GPT-SoVITS support?
English, Chinese, Japanese, and Korean. It also supports cross-lingual synthesis: a voice cloned from a clip in one language can be made to speak the others.