Spark TTS
Voice cloning from five seconds of audio combined with prompt-based control over emotion, speed, and speaking style.
Spark TTS by SparkAudio merges voice cloning with controllable delivery in a single prompt-driven system. It clones a voice from as little as five seconds of reference audio, then lets you steer emotion, speed, and speaking style while keeping the cloned identity intact. Under the hood it pairs a BiCodec audio tokenizer with an LLM that predicts the audio codes directly; according to SparkAudio, this design avoids the separate flow-matching stage that many comparable systems use. It supports English and Chinese. It is aimed at content creation where a single cloned voice needs to express a range of moods and pacing. Note the licensing split: the Spark TTS code is Apache 2.0, but the model weights are released under CC BY-NC-SA 4.0, which does not permit commercial use.
At a glance
- Developer: SparkAudio
- License: CC BY-NC-SA 4.0
- Tier: standard
- Speed: medium
- Voice cloning: Yes
- Languages: English, Chinese
- Max characters: 1000
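The 1000-character maximum listed above means longer scripts need to be split before synthesis. The following is a minimal sketch of one way to do that, assuming the limit counts Unicode characters per request (the listing does not say how it is measured). The names `split_for_tts` and `MAX_CHARS` are hypothetical helpers for illustration and are not part of any Spark TTS API.

```python
import re

# Assumption: the "Max characters" limit is counted in Unicode characters per request.
MAX_CHARS = 1000


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    preferring sentence boundaries (English and Chinese punctuation)."""
    # English sentences end with punctuation followed by whitespace;
    # Chinese sentences end with full-width punctuation and no whitespace.
    sentences = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text.strip())

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Sentences are rejoined with a single space, which also adds a
        # space between Chinese sentences that had none.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own synthesis request with the same reference audio, so the cloned voice stays consistent across the full script.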
Best for
Content creation with cloned voices and emotional control