Spark TTS


Voice cloning from five seconds of audio combined with prompt-based control over emotion, speed, and speaking style.

Spark TTS by SparkAudio merges voice cloning with controllable delivery in a single prompt-driven system. It clones a voice from just five seconds of reference audio, then lets you steer emotion, speed, and speaking style while keeping the cloned identity intact. Under the hood it pairs a BiCodec audio tokenizer with an LLM (Qwen2.5) that predicts the speech tokens directly, without a separate flow-matching stage, and it supports English and Chinese. It is aimed at content creation where a single cloned voice needs to express a range of moods and pacing. Note the licensing split: the project's code is Apache 2.0, but the model weights are released under CC BY-NC-SA 4.0, which prohibits commercial use.

At a glance

Developer: SparkAudio
License: CC BY-NC-SA 4.0
Tier: standard
Speed: medium
Voice cloning: Yes
Languages: English, Chinese
Max characters: 1000
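
The 1000-character limit means longer scripts have to be split before synthesis. The following is a minimal sketch of one way to do that, assuming the limit applies per request and is counted in characters. The function name `split_for_tts` and the constant `MAX_CHARS` are hypothetical and are not part of Spark TTS.

```python
import re

# "Max characters" from the table above; assumed here to apply per request.
MAX_CHARS = 1000

# One piece = a run of text plus its closing English or Chinese
# sentence punctuation and any whitespace that follows it.
_PIECE = re.compile(r"[^.!?。！？]+[.!?。！？]*\s*|[.!?。！？]+\s*")


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    chunks: list[str] = []
    current = ""
    for match in _PIECE.finditer(text):
        piece = match.group()
        # Start a new chunk when this sentence would overflow the current one.
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        # A single sentence longer than the limit is cut at the limit.
        while len(piece) > max_chars:
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        current += piece
    if current:
        chunks.append(current)
    # Drop surrounding whitespace and any chunk that was whitespace only.
    return [chunk.strip() for chunk in chunks if chunk.strip()]
```

Each returned chunk can then be synthesized as its own request and the resulting audio joined in order. Splitting at sentence ends rather than at a fixed offset keeps pauses and intonation from being cut mid-sentence.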

Spark TTS AI Voices

Chinese Default (Chinese, Chimiro Neutral)
Default (English, Chimiro Neutral)

Best for

Content creation with cloned voices and emotional control

Spark TTS FAQ

Spark TTS uses a prompt-based control system layered on top of voice cloning, so you can adjust emotion, speed, and speaking style while preserving the identity of the cloned voice.

About five seconds of reference audio is enough to clone a voice in English or Chinese.
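
Before submitting a reference clip, it can help to confirm that it is long enough. The sketch below uses Python's standard `wave` module and treats five seconds as a minimum, which is an assumption based on the figure above rather than a documented hard limit. The names `MIN_REFERENCE_SECONDS`, `wav_duration_seconds`, and `is_long_enough` are hypothetical.

```python
import wave

# Assumed floor, taken from the "about five seconds" figure above.
MIN_REFERENCE_SECONDS = 5.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Check whether a reference clip meets the assumed minimum length."""
    return wav_duration_seconds(path) >= minimum
```

The `wave` module reads only uncompressed PCM WAV files, so clips in other formats such as MP3 would need to be converted first.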

Spark TTS model weights are licensed under CC BY-NC-SA 4.0, which prohibits commercial use, even though the project code is Apache 2.0. For commercial work, choose a permissively licensed model instead.