CosyVoice3 TTS
Alibaba FunAudioLLM's latest multilingual model, with ~150ms bi-streaming latency, instruction control, and zero-shot cloning.
CosyVoice3 is the newest generation from Alibaba's FunAudioLLM team and a clear step up from CosyVoice 2. It introduces bi-streaming inference (streaming text in and audio out) at roughly 150ms latency, plus instruction-based control that lets you steer emotion, speed, and volume through prompts. Zero-shot voice cloning achieves better speaker similarity, and coverage spans 9 languages plus 18 Chinese dialects. An RL-tuned variant further improves prosody, which the developers describe as state of the art. With a 5,000-character input limit, fast generation, and strong cloning, it is aimed at multilingual production TTS and real-time applications.
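Because of the 5,000-character limit, longer scripts have to be split before synthesis. The Python below is a minimal sketch, assuming the limit applies per request and is counted as string length; `chunk_text` and `MAX_CHARS` are hypothetical names, not part of any CosyVoice API. It packs whole sentences (ending in Latin or CJK terminal punctuation) into each chunk and hard-splits only a single sentence that is itself over the limit.

```python
import re

# Assumption: the listed 5,000-character maximum is a per-request input limit.
MAX_CHARS = 5000

# A "sentence" is any text up to Latin or CJK terminal punctuation (plus trailing
# whitespace), or the leftover text at the very end of the input.
_SENTENCE = re.compile(r".*?(?:[.!?。！？]+\s*|\Z)", re.S)


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Hypothetical helper: split text into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""

    def flush() -> None:
        # Emit the pending chunk (trimmed) if it contains any non-whitespace text.
        nonlocal current
        if current.strip():
            chunks.append(current.strip())
        current = ""

    for sentence in _SENTENCE.findall(text):
        # A sentence longer than the limit is hard-split into fixed-size slices.
        while len(sentence) > max_chars:
            flush()
            current = sentence[:max_chars]
            flush()
            sentence = sentence[max_chars:]
        # Start a new chunk if adding this sentence would exceed the limit.
        if len(current) + len(sentence) > max_chars:
            flush()
        current += sentence
    flush()
    return chunks


if __name__ == "__main__":
    sample = "Hello world. This is a test! Another sentence? 你好。再见！"
    for chunk in chunk_text(sample, max_chars=20):
        print(repr(chunk))
```

Packing whole sentences keeps prosody natural at chunk boundaries; the hard split is only a fallback so that no single request exceeds the limit.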
At a glance
- Developer: Alibaba (FunAudioLLM)
- License: Apache 2.0
- Tier: Standard
- Speed: Fast
- Voice cloning: Yes
- Languages: English, Chinese, Japanese, Korean, German, Spanish, French, Italian, Russian
- Max characters: 5000
Best for
Multilingual production TTS, real-time applications, voice cloning