CosyVoice 2

Alibaba Tongyi Lab's streaming TTS model, with human-parity naturalness, very low latency, and zero-shot voice cloning.

CosyVoice 2, from Alibaba's Tongyi Lab, is designed to make high-quality speech synthesis viable in real time. It combines finite scalar quantization with flow matching to support streaming synthesis at very low latency, while reaching human-comparable naturalness that outperforms many commercial systems in subjective tests. Beyond quality, it offers zero-shot voice cloning from about 3 seconds of reference audio, cross-lingual synthesis, and fine-grained emotion control. It covers 8 languages, with a 1,000-character cap on input text, and is a strong fit for voice assistants, streaming TTS, and other real-time applications.
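To make the finite scalar quantization idea mentioned above more concrete, here is a minimal, generic sketch of the technique. It is not CosyVoice 2's actual tokenizer: the function names `fsq_quantize` and `fsq_index`, the level counts, and the input vector are all hypothetical. Each coordinate of a small latent vector is bounded, rounded to a fixed number of integer levels, and the resulting codes are combined into a single token id.

```python
import math

def fsq_quantize(z, levels):
    """Quantize each coordinate of z to one of levels[i] integer values.

    Assumption: every entry of `levels` is odd, so the grid is symmetric
    around zero (e.g. 5 levels -> {-2, -1, 0, 1, 2}).
    """
    codes = []
    for value, n_levels in zip(z, levels):
        half = (n_levels - 1) / 2
        bounded = half * math.tanh(value)  # squash into (-half, half)
        codes.append(round(bounded))       # snap to the nearest integer level
    return codes

def fsq_index(codes, levels):
    """Combine per-dimension codes into one token id (mixed-radix encoding)."""
    index = 0
    for code, n_levels in zip(codes, levels):
        half = (n_levels - 1) // 2
        index = index * n_levels + (code + half)  # shift code to 0..n_levels-1
    return index

# Hypothetical example: a 3-dimensional latent with 3, 3, and 5 levels,
# giving an implicit codebook of 3 * 3 * 5 = 45 tokens.
levels = [3, 3, 5]
codes = fsq_quantize([0.1, -2.0, 0.9], levels)
token = fsq_index(codes, levels)
print(codes, token)
```

Because the codebook is just the fixed grid of level combinations, there is no learned codebook to look up, which is the main appeal of this kind of quantizer.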

At a glance

Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Tier: standard
Speed: medium
Voice cloning: Yes (zero-shot)
Languages: English, Chinese, Japanese, Korean, French, German, Italian, Spanish
Max characters: 1000
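Given the 1,000-character maximum listed above, longer texts need to be split before synthesis. The sketch below shows one way to do that, preferring sentence boundaries. It assumes the limit counts Unicode characters (as Python's `len` does) and applies per request; neither is confirmed by this page. `split_for_tts` is a hypothetical helper, not part of any CosyVoice API.

```python
import re

MAX_CHARS = 1000  # limit listed on this page; assumed to count characters

# A "sentence" ends at Latin punctuation followed by whitespace or end of text,
# at CJK punctuation, or at the end of the text.
_SENTENCE = re.compile(r".+?(?:[.!?]+(?:\s+|$)|[。！？]+\s*|$)", re.S)

def split_for_tts(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    chunks, current = [], ""
    for sentence in _SENTENCE.findall(text.strip()):
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current.strip())
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Start a new chunk when adding this sentence would exceed the limit.
        if len(current) + len(sentence) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks

# Small limit used here only to make the behaviour visible.
print(split_for_tts("One. Two. Three.", max_chars=10))
```

Each returned chunk can then be sent as a separate synthesis request and the resulting audio concatenated in order.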

CosyVoice 2 voices

Chinese Female: Chinese, Standard Female
Chinese Male: Chinese, Standard Male
English Female: English, Standard Female
English Male: English, Standard Male
French Female: French, Standard Female
German Female: German, Standard Female
Italian Female: Italian, Standard Female
Japanese Female: Japanese, Standard Female
Korean Female: Korean, Standard Female
Spanish Female: Spanish, Standard Female

Best for

Real-time applications, streaming TTS, voice assistants

CosyVoice 2 TTS FAQ

Does CosyVoice 2 support real-time streaming?
Yes. CosyVoice 2 combines finite scalar quantization with flow matching to support streaming synthesis at very low latency, which makes it suitable for voice assistants and other real-time applications.

Does CosyVoice 2 support voice cloning?
Yes. It offers zero-shot voice cloning from roughly 3 seconds of reference audio, plus cross-lingual synthesis and emotion control.

Is CosyVoice 2 open source, and which languages does it support?
Yes. CosyVoice 2 is released under the Apache 2.0 license. It supports 8 languages: English, Chinese, Japanese, Korean, French, German, Italian, and Spanish.