CosyVoice3 TTS

Alibaba FunAudioLLM's latest multilingual TTS model, with bi-streaming at roughly 150 ms latency, instruction control, and zero-shot voice cloning.

CosyVoice3 is the newest model generation from Alibaba's FunAudioLLM team and the successor to CosyVoice 2. It adds bi-streaming inference (text streamed in, audio streamed out) with roughly 150 ms latency, plus instruction-based control that lets you steer emotion, speed, and volume through prompts. Speaker similarity for zero-shot voice cloning is improved over CosyVoice 2, and coverage spans 9 languages plus 18 Chinese dialects. A variant tuned with reinforcement learning (RL) targets state-of-the-art prosody. With a 5,000-character limit per request, fast generation, and strong cloning, it is geared toward multilingual production TTS and real-time applications.

At a glance

Developer: Alibaba (FunAudioLLM)
License: Apache 2.0
Tier: standard
Speed: fast
Voice cloning: Yes
Languages: English, Chinese, Japanese, Korean, German, Spanish, French, Italian, Russian
Max characters: 5,000
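
Text longer than the 5,000-character maximum has to be split before it is sent for synthesis. The following is a minimal sketch of one way to do that, preferring sentence boundaries; the function name `split_for_tts` and the boundary rules are illustrative assumptions, not part of CosyVoice3 or any official client.

```python
import re

MAX_CHARS = 5000  # per-request limit listed on this page

# Split points (assumption): after Latin sentence punctuation that is followed
# by whitespace, or directly after CJK sentence punctuation.
_BOUNDARY = re.compile(r"(?<=[.!?])(?=\s)|(?<=[。！？])")


def split_for_tts(text, limit=MAX_CHARS):
    """Return chunks of at most `limit` characters, cut at sentence ends where possible."""
    chunks = []
    current = ""

    def flush(buffer):
        # Emit the buffer without surrounding whitespace; skip empty buffers.
        cleaned = buffer.strip()
        if cleaned:
            chunks.append(cleaned)

    for piece in _BOUNDARY.split(text):
        # A single sentence longer than the limit is hard-wrapped.
        while len(piece) > limit:
            flush(current)
            current = ""
            flush(piece[:limit])
            piece = piece[limit:]
        if len(current) + len(piece) > limit:
            flush(current)
            current = piece
        else:
            current += piece

    flush(current)
    return chunks
```

Each returned chunk can then be synthesized as its own request and the audio concatenated in order. Cutting at sentence ends keeps prosody more natural than cutting at a fixed character offset.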

CosyVoice3 AI Voices

Chinese Female

Chinese
Standard Female

Chinese Male

Chinese
Standard Male

English Female

English
Standard Female

English Male

English
Standard Male

French Female

French
Standard Female

German Female

German
Standard Female

Italian Female

Italian
Standard Female

Japanese Female

Japanese
Standard Female

Korean Female

Korean
Standard Female

Russian Female

Russian
Standard Female

Spanish Female

Spanish
Standard Female

Best for

Multilingual production TTS, real-time applications, voice cloning

CosyVoice3 TTS — FAQ

Compared with CosyVoice 2, CosyVoice3 adds bi-streaming inference at around 150 ms latency, instruction-based control over emotion, speed, and volume, improved speaker similarity for cloning, and coverage of 9 languages plus 18 Chinese dialects. An RL-tuned variant targets state-of-the-art prosody.

CosyVoice3 supports zero-shot voice cloning from a reference clip (around 3 seconds minimum), with better speaker similarity than the previous generation.
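
Because cloning needs a reference clip of around 3 seconds or more, it can help to check a clip's length before submitting it. Below is a minimal sketch using only Python's standard `wave` module. It assumes the clip is an uncompressed WAV file, and the helper names and the exact 3.0-second threshold are illustrative assumptions rather than documented CosyVoice3 requirements.

```python
import wave

MIN_REFERENCE_SECONDS = 3.0  # approximate minimum stated on this page


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path, minimum=MIN_REFERENCE_SECONDS):
    """Return True if the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum
```

A clip that fails this check can be rejected early with a clear message instead of producing a poor clone. Other formats such as MP3 would need a different decoder.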

CosyVoice3 is licensed under Apache 2.0, which permits commercial use.