CosyVoice3 TTS
Alibaba FunAudioLLM's latest multilingual TTS model, with ~150ms bi-streaming latency, instruction control, and zero-shot voice cloning.
CosyVoice3 is the newest generation from Alibaba's FunAudioLLM team and a clear step up from CosyVoice 2. It introduces bi-streaming inference (text streams in while audio streams out) with latency of roughly 150ms, plus instruction-based control that lets you steer emotion, speed, and volume through prompts. Zero-shot voice cloning reaches higher speaker similarity, and language coverage spans 9 languages plus 18 Chinese dialects. An RL-tuned variant further improves prosody to what the developers describe as a state-of-the-art level. With a 5,000-character input limit, fast generation, and strong cloning, it is geared toward multilingual production TTS and real-time applications.
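To make the bi-streaming idea concrete, here is a minimal conceptual sketch: text arrives in pieces (for example, from an LLM still generating its reply), and audio is produced for each completed sentence instead of waiting for the full text. `fake_synthesize` and `bi_stream` are hypothetical names, the synthesizer is a stub that returns placeholder bytes, and the `min_chars` threshold is an assumption. This sentence-level buffering only approximates the behavior; CosyVoice3's own streaming happens inside the model and is not shown here.

```python
from typing import Iterable, Iterator


def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a TTS call; returns 2 placeholder bytes per character."""
    return b"\x00\x01" * len(text)


def bi_stream(text_stream: Iterable[str], min_chars: int = 20) -> Iterator[bytes]:
    """Consume text incrementally and yield one audio chunk per completed segment.

    A segment is flushed once the buffer holds at least `min_chars` characters and
    ends with sentence punctuation; any remainder is flushed when the input ends.
    """
    buffer = ""
    for piece in text_stream:
        buffer += piece
        if len(buffer) >= min_chars and buffer.rstrip().endswith((".", "!", "?")):
            # Audio for this segment is emitted before later text has even arrived.
            yield fake_synthesize(buffer)
            buffer = ""
    if buffer:
        yield fake_synthesize(buffer)


if __name__ == "__main__":
    incoming = ["Hello there, ", "this is a test. ", "Second part", " arrives later."]
    for i, chunk in enumerate(bi_stream(incoming)):
        print(f"chunk {i}: {len(chunk)} bytes")
```

Lowering `min_chars` gets the first audio out sooner at the cost of shorter segments with less context, which is the latency-versus-context trade-off any streaming TTS pipeline has to balance.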
At a glance
- Developer: Alibaba (FunAudioLLM)
- License: Apache 2.0
- Tier: Standard
- Speed: Fast
- Voice cloning: Yes
- Languages: English, Chinese, Japanese, Korean, German, Spanish, French, Italian, Russian
- Max characters: 5,000
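Longer scripts have to be split before synthesis to stay under the 5,000-character limit. The card does not say whether the limit applies per request or how characters are counted, so the sketch below assumes a per-request limit measured in Python string length (Unicode code points). `split_for_tts` is a hypothetical helper, not part of any CosyVoice API; it breaks after ASCII or full-width CJK sentence punctuation so that chunks end on natural pauses.

```python
import re
from typing import List

MAX_CHARS = 5000  # ceiling from the "At a glance" list; assumed to apply per request


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, breaking after sentence punctuation.

    Concatenating the returned chunks reproduces the input exactly.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Zero-width split after ASCII or full-width (CJK) sentence-ending punctuation,
    # so surrounding whitespace stays attached to the following sentence.
    pieces = [p for p in re.split(r"(?<=[.!?。！？])", text) if p]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        # A single sentence longer than the limit is cut into fixed-size slices.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        if len(current) + len(piece) <= max_chars:
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Keeping the whitespace attached to each sentence (rather than re-joining with spaces) avoids inserting spurious spaces into Chinese, Japanese, or Korean text.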
Best for
Multilingual production TTS, real-time applications, voice cloning