VoxCPM TTS

A tokenizer-free TTS model that works in continuous space, outputs 44.1kHz audio, and stays consistent across paragraphs.

SSML Mode (Speech Synthesis Markup Language for fine control)

Wrap your text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
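The same markup can be built programmatically. The following is a minimal sketch in Python, not an official TTS.ai or VoxCPM API: the helper name `to_ssml` and its `rate` parameter are hypothetical, and only the `<speak>` and `<prosody rate="...">` elements shown above are assumed to be supported.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...> markup (hypothetical helper)."""
    # escape() protects &, < and > in the spoken text;
    # quoteattr() escapes and quotes the attribute value.
    return f"<speak><prosody rate={quoteattr(rate)}>{escape(text)}</prosody></speak>"


print(to_ssml("Slow speech"))
```

Escaping matters because a stray `&` or `<` in the input would otherwise produce malformed markup.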

Emotion / Style tags

Tags the selected model understands. Select one to insert it into your text:

Dictionaries

Define custom pronunciations (word = pronunciation):
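As a rough illustration of the `word = pronunciation` format, the sketch below parses such entries and substitutes them into the text before synthesis. It assumes plain text substitution on whole words, which may not be how the service applies its dictionary; the function names `parse_lexicon` and `apply_lexicon` and the sample entries are hypothetical.

```python
import re


def parse_lexicon(raw: str) -> dict[str, str]:
    """Parse lines of the form 'word = pronunciation'; skip anything else."""
    entries = {}
    for line in raw.splitlines():
        word, sep, pron = line.partition("=")
        word, pron = word.strip(), pron.strip()
        if sep and word and pron:
            entries[word.lower()] = pron  # keys are case-insensitive
    return entries


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Replace whole-word matches with their custom pronunciation."""
    if not lexicon:
        return text
    # Longest entries first, so a longer word wins over a shorter prefix.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lexicon[m.group(0).lower()], text)


lexicon = parse_lexicon("VoxCPM = vox C P M\nOpenBMB = open B M B")
print(apply_lexicon("VoxCPM by OpenBMB", lexicon))
```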

Pitch: 0 (range -12 to +12)

Speed: 1.0x (range 0.5x to 2.0x)

Free with Piper, VITS, MeloTTS

About VoxCPM

VoxCPM 1.5 by OpenBMB takes an unusual approach: instead of converting speech into discrete tokens, it operates directly in continuous space, which helps it preserve fine acoustic detail. It produces high-fidelity 44.1kHz audio, supports zero-shot voice cloning from three to ten seconds of reference audio, and maintains a consistent voice across long passages, a common failure point for other models on multi-paragraph text. Its cross-language cloning lets an English reference voice speak Chinese and vice versa. With Apache 2.0 licensing and LoRA fine-tuning support, it is well suited to audiobooks and long-form content where voice consistency over many paragraphs is essential.
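Since cloning is described as working from three to ten seconds of reference audio, it can help to check a clip's length before uploading it. The sketch below does this for uncompressed WAV files using only the Python standard library. The function names and the decision to treat the range as a hard limit are assumptions for illustration, not documented behaviour of VoxCPM.

```python
import wave

# Range quoted in the description above; treating it as a hard limit is an assumption.
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0


def clip_seconds(path: str) -> float:
    """Duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """True if the clip length falls inside the three-to-ten-second range."""
    return MIN_SECONDS <= clip_seconds(path) <= MAX_SECONDS
```

Calling `is_usable_reference("reference.wav")` (a hypothetical file name) returns `False` for clips that are too short or too long, so they can be trimmed or re-recorded first.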

Best for: High-fidelity audio, audiobooks, long-form content with voice consistency

At a glance

Developer: OpenBMB
License: Apache 2.0
Tier: standard
Speed: fast
Voice cloning: Yes
Languages: English, Chinese
Voeg- agteraan- by Taal: 2000

VoxCPM voices

Default: English, Neutral

Default Chinese: Chinese, Neutral

VoxCPM TTS FAQ

What does "tokenizer-free" mean for VoxCPM?

Rather than discretizing speech into tokens, VoxCPM models audio in continuous space using flow matching. This helps it retain subtle acoustic detail and produce clean 44.1kHz output.
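To see why discretizing can lose detail, consider a toy example. The sketch below applies uniform scalar quantization to a short sine wave and reports the worst-case error for a few codebook sizes. This is only an analogy: token-based speech models typically use learned codebooks rather than evenly spaced levels, and nothing here reflects VoxCPM's actual architecture.

```python
import math


def quantize(x: float, levels: int) -> float:
    """Snap x in [-1, 1] to the nearest of `levels` evenly spaced values."""
    step = 2.0 / (levels - 1)
    return round((x + 1.0) / step) * step - 1.0


# A short "continuous" signal: one cycle of a sine wave, 16 samples.
signal = [math.sin(2 * math.pi * n / 16) for n in range(16)]

for levels in (4, 16, 256):
    worst = max(abs(s - quantize(s, levels)) for s in signal)
    print(f"{levels:>3} levels: worst-case error {worst:.4f}")
```

The error shrinks as the number of levels grows but never reaches zero, whereas a continuous representation has no such rounding step.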

Is VoxCPM good for long-form content?

Yes. It is specifically designed to keep the voice consistent across paragraphs, which makes it well suited to audiobooks and other long passages where other models tend to drift.

Can VoxCPM clone voices across languages?

Yes. It supports cross-lingual cloning between English and Chinese (for example, applying an English reference voice to Chinese speech) from three to ten seconds of reference audio.
