VoxCPM TTS
A tokenizer-free TTS model that works in continuous space, outputs 44.1kHz audio, and stays consistent across paragraphs.
VoxCPM 1.5 by OpenBMB takes an unusual approach: instead of converting speech into discrete tokens, it operates directly in continuous space, which helps it preserve fine acoustic detail. It produces high-fidelity 44.1kHz audio and supports zero-shot voice cloning from three to ten seconds of reference audio. It also keeps the voice consistent across long passages, a common failure point for other models on multi-paragraph text. Cross-language cloning lets an English reference voice speak Chinese and vice versa. With Apache 2.0 licensing and LoRA fine-tuning support, it is well suited to audiobooks and other long-form content where voice consistency over many paragraphs is essential.
At a glance
- Developer: OpenBMB
- License: Apache 2.0
- Tier: standard
- Speed: fast
- Voice cloning: Yes
- Languages: English, Chinese
- Max characters: 2000
VoxCPM voices
Best for
High-fidelity audio, audiobooks, long-form content with voice consistency
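Long-form text such as an audiobook chapter will usually exceed the 2000-character maximum listed above, so it has to be split before synthesis. The following is a minimal sketch of one way to do that, assuming the limit applies per request and that cutting at sentence boundaries is acceptable. `chunk_text` and `MAX_CHARS` are hypothetical names chosen for illustration; they are not part of VoxCPM, and the sketch does not call any VoxCPM API.

```python
import re

# Assumption: the "Max characters" value above is a per-request limit.
MAX_CHARS = 2000

# One sentence per match: text up to Latin punctuation followed by whitespace,
# CJK punctuation (no whitespace needed), or the end of the input.
_SENTENCE = re.compile(r".+?(?:[.!?]+\s+|[。！？]+|$)", re.S)


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    chunks: list[str] = []
    current = ""
    for piece in _SENTENCE.findall(text):
        # Last resort: a single sentence longer than the limit is cut at the limit.
        while len(piece.rstrip()) > max_chars:
            if current.strip():
                chunks.append(current.strip())
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        if len((current + piece).rstrip()) <= max_chars:
            # The sentence still fits, so keep growing the current chunk.
            current += piece
        else:
            # The sentence would overflow: close the chunk and start a new one.
            chunks.append(current.strip())
            current = piece
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Each returned chunk can then be sent as a separate synthesis request using the same reference voice, and the resulting audio segments concatenated in order. Splitting at sentence boundaries rather than at a fixed offset avoids cutting a word or phrase in half at the seam between segments.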