VoxCPM TTS
A tokenizer-free TTS model that works in continuous space, outputs 44.1kHz audio, and stays consistent across paragraphs.
VoxCPM 1.5 by OpenBMB takes an unusual approach: instead of converting speech into discrete tokens, it models speech directly in continuous space, which helps it preserve fine acoustic detail. It produces high-fidelity 44.1kHz audio, supports zero-shot voice cloning from three to ten seconds of reference audio, and keeps the same voice across long passages, where other models often drift on multi-paragraph text. Cross-language cloning lets an English reference voice speak Chinese and vice versa. With Apache 2.0 licensing and LoRA fine-tuning support, it is well suited to audiobooks and other long-form content where voice consistency over many paragraphs is essential.
At a glance
- Developer: OpenBMB
- License: Apache 2.0
- Tier: standard
- Speed: fast
- Voice cloning: Yes
- Languages: English, Chinese
- Max characters: 2000
VoxCPM AI Voices
Best for
High-fidelity audio, audiobooks, long-form content with voice consistency
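For long-form text such as audiobook chapters, the 2000-character maximum listed above means the input usually has to be split before synthesis. The sketch below shows one way to do that, assuming the limit applies per request and that cutting at sentence boundaries is acceptable. `split_for_tts` is a hypothetical helper written for illustration, not part of VoxCPM, and no synthesis call is shown.

```python
import re

MAX_CHARS = 2000  # "Max characters" value from the spec list above

# A sentence is any run of text ending in English or Chinese terminal
# punctuation (plus trailing whitespace), or the leftover text at the end.
_SENTENCE = re.compile(r".+?(?:[.!?。！？]+\s*|$)", re.DOTALL)


def split_for_tts(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars characters.

    Chunks end on sentence boundaries where possible. A single sentence
    longer than max_chars is cut at the limit as a last resort.
    Hypothetical helper; not part of any VoxCPM API.
    """
    pieces = _SENTENCE.findall(text.strip())
    chunks = []
    current = ""
    for piece in pieces:
        # Hard-split a sentence that cannot fit in one chunk by itself.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        # Start a new chunk when adding this sentence would exceed the limit.
        if len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as a separate request with the same reference voice. Because the pieces are concatenated without modification, joining the chunks reproduces the stripped input text exactly.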