CosyVoice 2 TTS
Streaming TTS from Alibaba's Tongyi Lab, with human-comparable naturalness, low-latency synthesis, and zero-shot voice cloning.
CosyVoice 2, from Alibaba's Tongyi Lab, is designed to make high-quality speech synthesis viable in real time. It combines finite scalar quantization with flow matching to support low-latency streaming synthesis, and its naturalness is rated comparable to human speech, ahead of many commercial systems in subjective tests. It also offers zero-shot voice cloning from about 3 seconds of reference audio, cross-lingual synthesis, and fine-grained emotion control. With 8 supported languages and a 1,000-character input cap, it suits voice assistants, streaming TTS, and other real-time applications.
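To give a feel for what finite scalar quantization means, here is a minimal sketch of the general idea: each component of a small latent vector is squashed into a bounded range and rounded to one of a fixed number of levels, and the per-component codes are combined into a single token index. This is a simplified illustration and not CosyVoice 2's actual tokenizer; the function names `fsq_quantize` and `fsq_index`, the use of `tanh` for bounding, and the level counts are all assumptions made for the example.

```python
import math

def fsq_quantize(z, levels):
    """Round each component of z to one of levels[i] integer codes.

    Hypothetical helper: bounds each value with tanh, then rounds.
    """
    codes = []
    for value, n_levels in zip(z, levels):
        half = (n_levels - 1) / 2
        bounded = math.tanh(value) * half      # lies in (-half, half)
        codes.append(round(bounded + half))    # integer in [0, n_levels - 1]
    return codes

def fsq_index(codes, levels):
    """Combine per-component codes into one index (mixed-radix encoding)."""
    index = 0
    for code, n_levels in zip(codes, levels):
        index = index * n_levels + code
    return index

levels = [3, 3]                      # assumed: 2 dimensions, 3 levels each
codes = fsq_quantize([0.0, 0.0], levels)
token = fsq_index(codes, levels)     # one of 3 * 3 = 9 possible tokens
```

Because the codebook is implied by the level counts rather than learned, every index in the range is reachable, which is the property usually cited as the motivation for this family of quantizers.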
At a glance
- Developer: Alibaba (Tongyi Lab)
- License: Apache 2.0
- Tier: standard
- Speed: medium
- Voice cloning: Yes
- Languages: English, Chinese, Japanese, Korean, French, German, Italian, Spanish
- Max characters: 1,000
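Text longer than the 1,000-character cap has to be split before synthesis. The sketch below shows one way to do that, preferring sentence boundaries and falling back to a hard cut for a single overlong sentence. The helper name `split_for_tts` is hypothetical, and the sketch assumes the cap is counted in Unicode characters as measured by Python's `len()`; how the cap is actually counted is not stated here.

```python
import re

MAX_CHARS = 1000  # cap from the spec list; counting method is an assumption

def split_for_tts(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, on sentence ends if possible."""
    # A piece ends at Latin punctuation followed by whitespace, at CJK
    # sentence punctuation, or at the end of the text.
    pieces = re.findall(r".+?(?:[.!?]\s+|[。！？]|$)", text.strip(), flags=re.S)
    chunks, current = [], ""
    for piece in pieces:
        if len((current + piece).rstrip()) <= max_chars:
            current += piece            # still fits in the current chunk
            continue
        if current.strip():
            chunks.append(current.strip())
        # A single sentence longer than the cap is cut at the limit.
        while len(piece.rstrip()) > max_chars:
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        current = piece
    if current.strip():
        chunks.append(current.strip())
    return chunks

chunks = split_for_tts("A" * 600 + ". " + "B" * 600 + ".")
```

Each returned chunk can then be sent as its own synthesis request and the resulting audio concatenated in order.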
CosyVoice 2 voices
Best for
Real-time applications, streaming TTS, voice assistants