CosyVoice 2

CosyVoice 2 TTS

Alibaba Tongyi Lab's streaming TTS model, offering human-comparable naturalness, very low latency, and zero-shot voice cloning.

CosyVoice 2, from Alibaba's Tongyi Lab, is designed to make high-quality speech synthesis viable in real time. It combines finite scalar quantization with flow matching to support streaming synthesis at very low latency, while reaching human-comparable naturalness and outperforming many commercial systems in subjective tests. Beyond quality, it offers zero-shot voice cloning from about 3 seconds of reference audio, cross-lingual synthesis, and fine-grained emotion control. It supports 8 languages and accepts up to 1,000 characters of input, and it is a strong fit for voice assistants, streaming TTS, and other real-time applications.
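To give a feel for what finite scalar quantization means, here is a minimal generic sketch in Python. It is not CosyVoice 2's tokenizer: the function names, the tanh bounding, and the level counts are illustrative assumptions based on the general idea of FSQ, where each dimension of a small vector is squashed into a fixed range, rounded to an integer level, and the levels are combined into one token index.

```python
import math

def fsq_quantize(z, levels):
    """Round each dimension of z to one of levels[i] integer values.

    Hypothetical helper: handles odd level counts only, so the levels
    are symmetric around zero (for 5 levels: -2, -1, 0, 1, 2).
    """
    codes = []
    for value, n_levels in zip(z, levels):
        if n_levels % 2 == 0:
            raise ValueError("this sketch handles odd level counts only")
        half = (n_levels - 1) // 2
        bounded = math.tanh(value) * half  # squash into (-half, half)
        codes.append(round(bounded))       # snap to the nearest integer level
    return codes

def fsq_index(codes, levels):
    """Combine per-dimension codes into a single index (mixed radix,
    first dimension least significant)."""
    index, base = 0, 1
    for code, n_levels in zip(codes, levels):
        half = (n_levels - 1) // 2
        index += (code + half) * base  # shift code into 0..n_levels-1
        base *= n_levels
    return index

codes = fsq_quantize([0.0, 10.0, -10.0], [5, 5, 5])
token = fsq_index(codes, [5, 5, 5])
```

With three dimensions of 5 levels each, the implicit codebook has 125 entries, and no codebook has to be learned or stored.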

At a glance

Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Tier: standard
Speed: medium
Voice cloning: Yes
Languages: English, Chinese, Japanese, Korean, French, German, Italian, Spanish
Max characters: 1,000
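Because input is capped at 1,000 characters, longer texts need to be split before synthesis. The following is a minimal sketch of one way to do that in Python. The function name `split_for_tts` is hypothetical, it assumes the cap is counted the way Python's `len()` counts characters, and it only detects sentence ends that are followed by whitespace.

```python
import re

MAX_CHARS = 1000  # cap listed on this page

def split_for_tts(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters,
    preferring sentence boundaries (hypothetical helper)."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Fallback: hard-split a single sentence that exceeds the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate      # sentence still fits in this chunk
        else:
            chunks.append(current)   # close the chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as a separate synthesis request and the resulting audio concatenated in order.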

CosyVoice 2 AI Voices

Chinese Female: Chinese, Standard, Female
Chinese Male: Chinese, Standard, Male
English Female: English, Standard, Female
English Male: English, Standard, Male
French Female: French, Standard, Female
German Female: German, Standard, Female
Italian Female: Italian, Standard, Female
Japanese Female: Japanese, Standard, Female
Korean Female: Korean, Standard, Female
Spanish Female: Spanish, Standard, Female

Best for

Real-time applications, streaming TTS, voice assistants

CosyVoice 2 TTS — FAQ

Yes. CosyVoice 2 uses finite scalar quantization for streaming synthesis at very low latency, which is what makes it suitable for voice assistants and real-time applications.

Yes. It offers zero-shot voice cloning from roughly 3 seconds of reference audio, plus cross-lingual synthesis and emotion control.

Yes. CosyVoice 2 is Apache 2.0 licensed. It supports 8 languages: English, Chinese, Japanese, Korean, French, German, Italian, and Spanish.