Darwin TTS

A Qwen3-TTS variant whose talker FFN weights are blended from the Qwen3 language model for sharper cross-lingual cloning.

Darwin-TTS-1.7B-Cross by FINAL-Bench is a research variant of Qwen3-TTS-1.7B with an unusual construction: 84 of its talker-FFN tensors (about 8.6% of them) are blended at a 3% ratio with the matching tensors from Qwen3-1.7B-Base, with no retraining. The result is a model that produces noticeably crisper cross-lingual voice cloning across its four core languages: Korean, English, Japanese, and Chinese. It operates in zero-shot voice-clone mode and needs only about three seconds of reference audio to capture a speaker. Darwin is best suited to carrying a single reference voice across those four languages, for example in dubbing or multilingual narration where speaker identity must stay consistent.
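The blend described above can be pictured as a per-tensor linear interpolation. The sketch below is an assumption about what "blended at a 3% ratio" means, not the FINAL-Bench implementation: it assumes each selected tensor becomes `(1 - ratio) * tts + ratio * lm`. It uses plain Python lists in place of real checkpoint tensors, and the tensor names and the `.mlp.` filter are hypothetical.

```python
# Illustrative sketch only: assumes a "3% blend" is a linear interpolation.
# Tensor names and the ".mlp." filter are hypothetical, not taken from the release.

BLEND_RATIO = 0.03  # 3% from the language model, 97% from the TTS talker


def blend_tensor(tts_values, lm_values, ratio=BLEND_RATIO):
    """Return (1 - ratio) * tts + ratio * lm, element by element."""
    if len(tts_values) != len(lm_values):
        raise ValueError("tensors must have the same shape to be blended")
    return [(1.0 - ratio) * t + ratio * l for t, l in zip(tts_values, lm_values)]


def blend_ffn_weights(tts_state, lm_state, ratio=BLEND_RATIO, marker=".mlp."):
    """Blend only tensors whose name contains `marker`; copy the rest unchanged."""
    blended = {}
    for name, values in tts_state.items():
        if marker in name and name in lm_state:
            blended[name] = blend_tensor(values, lm_state[name], ratio)
        else:
            blended[name] = list(values)
    return blended


# Toy stand-ins for two checkpoints; for simplicity both use the same names.
tts_state = {
    "talker.layers.0.mlp.up_proj": [1.0, 2.0],
    "talker.layers.0.attn.q_proj": [5.0, 6.0],
}
lm_state = {
    "talker.layers.0.mlp.up_proj": [3.0, 0.0],
    "talker.layers.0.attn.q_proj": [9.0, 9.0],
}
merged = blend_ffn_weights(tts_state, lm_state)
```

In this toy run only the FFN-like tensor moves slightly toward the language-model values, while the attention tensor is copied unchanged. That mirrors the idea of a small, training-free adjustment confined to the FFN weights.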

At a glance

Developer: FINAL-Bench
License: Apache 2.0
Tier: standard
Speed: medium
Voice cloning: Yes
Languages: English, Korean, Japanese, Chinese
Max characters: 2000
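
Longer scripts need to be split before synthesis to stay within the 2000-character maximum listed above. The following is a minimal sketch, assuming the limit applies per request and counts Unicode characters. `split_for_tts` is a hypothetical helper, not part of any Darwin or Qwen3-TTS API. It prefers to break at sentence-ending punctuation (Latin or CJK) and hard-wraps only when a single sentence is itself too long.

```python
import re

MAX_CHARS = 2000  # "Max characters" value from the spec list above

# A sentence ends at . ! ? followed by whitespace, at a CJK full stop,
# or at the end of the text. Trailing whitespace stays with the sentence.
SENTENCE = re.compile(r".+?(?:[.!?](?=\s)|[。！？]|\Z)\s*", re.S)


def split_for_tts(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    pieces = SENTENCE.findall(text.strip())
    chunks, current = [], ""
    for piece in pieces:
        # Flush the running chunk if adding this sentence would overflow it.
        if current and len((current + piece).rstrip()) > max_chars:
            chunks.append(current.rstrip())
            current = ""
        # Hard-wrap a single sentence that is longer than the limit by itself.
        while len(piece.rstrip()) > max_chars:
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        current += piece
    if current.strip():
        chunks.append(current.rstrip())
    return chunks
```

Each returned chunk can then be sent as its own synthesis request with the same reference voice. Whether speaker identity stays consistent across chunk boundaries would need to be checked against the deployed model.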

Darwin TTS AI Voices

Default

English
Standard, Neutral

Default (Chinese)

Chinese
Standard, Neutral

Default (Japanese)

Japanese
Standard, Neutral

Default (Korean)

Korean
Standard, Neutral

Best for

Cross-lingual voice cloning between English / Korean / Japanese / Chinese with a single reference voice

Darwin TTS FAQ

Darwin starts from Qwen3-TTS-1.7B but blends a small fraction of its talker-FFN weights with the matching weights from the Qwen3-1.7B base language model. This training-free blend sharpens cross-lingual voice cloning rather than changing the base voices.

Darwin supports English, Korean, Japanese, and Chinese. The FINAL-Bench release specifically markets its cross-lingual blend for those four languages, and the deployed model ships voices for each of them.

Voice cloning needs about three seconds of reference audio. Darwin works in zero-shot mode, so no fine-tuning or training is required: you provide a short reference clip and it generates new speech in that voice.
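
Before a clip is submitted as a reference, its length can be checked locally. This is a minimal sketch using Python's standard `wave` module. It assumes the reference is an uncompressed PCM WAV file and treats 3.0 seconds as an illustrative threshold for "about three seconds". The helper names are hypothetical, and the accepted audio formats and any hard minimum are not stated in the source.

```python
import wave

# Illustrative threshold for "about three seconds"; not a documented hard limit.
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def is_long_enough(path, minimum=MIN_REFERENCE_SECONDS):
    """True if the clip at `path` lasts at least `minimum` seconds."""
    return wav_duration_seconds(path) >= minimum
```

A clip that fails the check can be re-recorded or replaced with a longer excerpt before it is used for cloning.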