Darwin TTS


A Qwen3-TTS variant whose talker FFN weights are blended from the Qwen3 language model for sharper cross-lingual cloning.

Darwin-TTS-1.7B-Cross by FINAL-Bench is a research variant of Qwen3-TTS-1.7B with an unusual construction: 84 of its talker-FFN tensors (about 8.6% of them) are blended at a 3% ratio with the matching tensors from Qwen3-1.7B-Base, without any retraining. According to FINAL-Bench, the result is noticeably crisper cross-lingual voice cloning across Korean, English, Japanese, and Chinese, its four core languages. It operates in zero-shot voice-clone mode and needs only about three seconds of reference audio to capture a speaker. Darwin is best suited to carrying a single reference voice across those four languages, for example in dubbing or multilingual narration where speaker identity must stay consistent.
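The blend itself is simple per-element arithmetic. The description gives only the ratio and the tensor count, so the sketch below assumes that "blended at a 3% ratio" means a linear interpolation, w = 0.97 · w_tts + 0.03 · w_base, applied to the selected tensors while every other tensor is copied unchanged. Plain Python lists stand in for real weight tensors, and `blend_tensors` and the tensor names are hypothetical, not part of any FINAL-Bench or Qwen release.

```python
from typing import Dict, Iterable, List

Tensor = List[float]  # hypothetical flat stand-in for a real weight tensor


def blend_tensors(
    tts_weights: Dict[str, Tensor],
    base_weights: Dict[str, Tensor],
    names: Iterable[str],
    alpha: float = 0.03,
) -> Dict[str, Tensor]:
    """Blend the named tensors toward base_weights; copy every other tensor unchanged.

    Assumes "3% ratio" means w = (1 - alpha) * w_tts + alpha * w_base.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Start from a copy so the original TTS weights are never mutated.
    blended = {name: list(values) for name, values in tts_weights.items()}
    for name in names:
        tts, base = tts_weights[name], base_weights[name]
        if len(tts) != len(base):
            raise ValueError(f"shape mismatch for {name!r}")
        blended[name] = [(1.0 - alpha) * t + alpha * b for t, b in zip(tts, base)]
    return blended


# Usage with made-up tensor names: only the FFN tensor is blended.
tts = {"talker.ffn.0": [1.0, 2.0], "talker.attn.0": [5.0]}
base = {"talker.ffn.0": [0.0, 1.0]}
out = blend_tensors(tts, base, ["talker.ffn.0"])
# out["talker.ffn.0"] is approximately [0.97, 1.97]; out["talker.attn.0"] stays [5.0]
```

In the actual model, `names` would hold the 84 selected talker-FFN tensors; which ones were chosen is not stated here.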

At a glance

Developer: FINAL-Bench
License: Apache 2.0
Tier: standard
Speed: medium
Voice cloning: Yes
Languages: English, Korean, Japanese, Chinese
Max characters: 2000
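Text longer than the 2000-character limit has to be split before synthesis. This is a minimal sketch, assuming the limit counts characters per request and that breaking at sentence-ending punctuation (including the CJK full stops 。！？) is acceptable; `split_for_tts` is a hypothetical helper, not part of any Darwin or Qwen3-TTS API.

```python
import re
from typing import List

MAX_CHARS = 2000  # per-request character limit listed for Darwin TTS


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Zero-width split right after sentence-ending punctuation, so
    # concatenating the pieces reproduces the original text exactly.
    pieces = [p for p in re.split(r"(?<=[.!?。！？])", text) if p]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        # Hard-wrap a single sentence that alone exceeds the limit.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        if len(current) + len(piece) <= max_chars:
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    # Trim boundary whitespace and drop whitespace-only chunks.
    return [c.strip() for c in chunks if c.strip()]
```

Each returned chunk is at most `max_chars` long and can be sent as a separate request with the same reference voice; the sentence-first strategy avoids cutting words mid-utterance except when a single sentence is itself over the limit.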

Darwin TTS AI Voices

Default

English
Standard · Neutral

Default (Chinese)

Chinese
Standard · Neutral

Default (Japanese)

Japanese
Standard · Neutral

Default (Korean)

Korean
Standard · Neutral

Best for

Cross-lingual voice cloning between English / Korean / Japanese / Chinese with a single reference voice

Darwin TTS FAQ

How is Darwin different from Qwen3-TTS?

Darwin starts from Qwen3-TTS-1.7B but blends a small fraction of its talker-FFN weights with the matching weights from the Qwen3-1.7B base language model. This training-free blend is meant to sharpen cross-lingual voice cloning rather than change the base voices.

Which languages does Darwin support?

English, Korean, Japanese, and Chinese. The FINAL-Bench release specifically markets its cross-lingual blend for those four, and the deployed model ships a default voice for each.

How much reference audio does voice cloning need?

About three seconds. The model works in zero-shot mode, so no fine-tuning or training is required: you provide a short reference clip and it generates new speech in that voice.
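Since cloning relies on a short reference clip, it can help to check the clip's length before using it. The sketch below uses Python's standard `wave` module and assumes the reference is an uncompressed PCM WAV file; the 3-second threshold mirrors the figure above, and `is_usable_reference` is a hypothetical helper, not a documented model requirement.

```python
import wave
from typing import BinaryIO, Union

# Assumed threshold, taken from the "about three seconds" guidance.
MIN_REFERENCE_SECONDS = 3.0


def reference_duration_seconds(source: Union[str, BinaryIO]) -> float:
    """Return the duration in seconds of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(
    source: Union[str, BinaryIO], min_seconds: float = MIN_REFERENCE_SECONDS
) -> bool:
    """Rough heuristic: True if the clip is at least min_seconds long."""
    return reference_duration_seconds(source) >= min_seconds
```

A clip that passes this check is only long enough; recording quality and background noise still matter for how well the voice carries over.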