VoxCPM TTS

A tokenizer-free TTS model that works in continuous space, outputs 44.1kHz audio, and stays consistent across paragraphs.


SSML Mode (Speech Synthesis Markup Language, for fine-grained control)

Wrap your text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
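Markup like the example above can also be produced programmatically. The minimal Python sketch below builds it with `xml.etree.ElementTree` instead of string concatenation, so characters such as `&` and `<` in the input text are escaped and the result stays well-formed. The helper name `wrap_ssml` is hypothetical. This page does not document which SSML elements or `rate` values VoxCPM or this site actually honors, and `prosody rate` is used here only because it is the example the page shows.

```python
import xml.etree.ElementTree as ET


def wrap_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate="..."> tags (hypothetical helper).

    ElementTree escapes &, < and > in the text when serializing,
    so arbitrary user input still yields well-formed markup.
    """
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", rate=rate)
    prosody.text = text
    # encoding="unicode" returns a str with no XML declaration
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(wrap_ssml("Slow speech"))
    print(wrap_ssml("Fish & chips", rate="fast"))
```

Calling `wrap_ssml("Slow speech")` reproduces the example markup shown above.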

Emotion / Style Tags

Tags that the selected model understands (click a tag to use it).

Pronunciation Dictionary

Define custom pronunciations (word = pronunciation):
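One way such a dictionary could be applied is as a text substitution before synthesis. The sketch below assumes one `word = pronunciation` entry per line and case-sensitive, whole-word replacement. The function names `parse_dictionary` and `apply_dictionary` are hypothetical, and the site's actual matching rules are not stated on this page.

```python
import re


def parse_dictionary(text: str) -> dict[str, str]:
    """Parse 'word = pronunciation' lines; blank or malformed lines are skipped."""
    entries = {}
    for line in text.splitlines():
        word, sep, pronunciation = line.partition("=")
        if sep and word.strip() and pronunciation.strip():
            entries[word.strip()] = pronunciation.strip()
    return entries


def apply_dictionary(text: str, entries: dict[str, str]) -> str:
    """Replace whole-word, case-sensitive matches with their pronunciations."""
    if not entries:
        return text
    # Longer entries first, so "New York" is preferred over "New" when both are defined.
    words = sorted(entries, key=len, reverse=True)
    # Lookarounds instead of \b so entries that start or end with symbols (e.g. "C++") still match.
    pattern = re.compile(r"(?<!\w)(" + "|".join(map(re.escape, words)) + r")(?!\w)")
    return pattern.sub(lambda m: entries[m.group(1)], text)


if __name__ == "__main__":
    table = parse_dictionary("SQL = sequel\nNew York = noo york")
    print(apply_dictionary("New York runs on SQL.", table))
```

Because matching is whole-word, an entry for `SQL` leaves `SQLite` untouched.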

Pitch: 0 (range -12 to +12)
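The page does not state the unit of the pitch control. If the -12 to +12 range is in semitones (one octave down to one octave up), a setting of n corresponds to a frequency ratio of 2^(n/12) under equal temperament. The Python sketch below shows that conversion. The semitone unit is an assumption, and the function name is hypothetical.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift, ASSUMING the control is in semitones."""
    if not -12 <= semitones <= 12:
        raise ValueError("pitch must be between -12 and +12")
    # Equal temperament: 12 semitones per doubling of frequency
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    # A 220 Hz tone shifted up one octave
    print(220.0 * semitones_to_ratio(12))
```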


Speed: 1.0x (range 0.5x to 2.0x)

Free with Piper, VITS, and MeloTTS.


About VoxCPM

VoxCPM 1.5 by OpenBMB takes an unusual approach: instead of converting speech into discrete tokens, it operates directly in continuous space, which helps it preserve fine acoustic detail. It produces high-fidelity 44.1kHz audio, supports zero-shot voice cloning from a three- to ten-second reference clip, and keeps the voice consistent across long passages, a common failure point for other models on multi-paragraph text. Its cross-language cloning lets an English reference voice speak Chinese, and vice versa. With Apache 2.0 licensing and LoRA fine-tuning support, it is well suited to audiobooks and other long-form content where voice consistency over many paragraphs is essential.
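VoxCPM is described as generating audio in continuous space with flow matching instead of predicting discrete tokens. The sketch below illustrates only the generic idea and is not VoxCPM's architecture or code. With a straight-line path from Gaussian noise `x0` to a data frame `x1`, a network is trained to predict the velocity `x1 - x0` at the point `(1 - t) * x0 + t * x1`. Sampling then integrates that velocity from noise using Euler steps. Here the trained network is replaced by the exact velocity field for a single hypothetical feature frame, `FRAME`, so sampling should land on that frame. All names are illustrative.

```python
import random


def interpolate(x0, x1, t):
    """Point at time t on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for a linear path: the constant velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def training_example(x1, rng):
    """Draw noise and a time; return (network input, t, target velocity)."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()  # uniform in [0, 1)
    return interpolate(x0, x1, t), t, target_velocity(x0, x1)


def sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        t = k * dt  # always < 1, so the oracle below never divides by zero
        x = [xi + dt * vi for xi, vi in zip(x, velocity_fn(x, t))]
    return x


# Hypothetical continuous feature frame standing in for "data".
FRAME = [0.3, -1.2, 0.8]


def oracle_velocity(x, t):
    """Exact velocity field when the data distribution is the single point FRAME.
    A real system would call a trained network here instead."""
    return [(f - xi) / (1.0 - t) for xi, f in zip(x, FRAME)]


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in FRAME]
    print(sample(oracle_velocity, noise))  # approximately FRAME
```

Every value stays a real number throughout, and nothing is snapped to a codebook entry. The page credits this continuous representation with preserving fine acoustic detail.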

Best for: High-fidelity audio, audiobooks, long-form content with voice consistency


At a Glance

Developer: OpenBMB
License: Apache 2.0
Tier: Standard
Speed: Fast
Voice cloning: Yes
Languages: English, Chinese
Maximum characters: 2000
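For text longer than the 2,000-character maximum listed above, one option is to split it into several requests at sentence boundaries. The minimal sketch below assumes it is acceptable to split after `.`, `!` and `?` or their Chinese counterparts `。！？`, and it hard-splits any single sentence that exceeds the limit. The function name is hypothetical. This page does not say whether voice consistency carries over between separately generated chunks.

```python
import re

# Shortest run of text ending in terminal punctuation (Latin or Chinese)
# plus any trailing whitespace, or running to the end of the text.
_SENTENCE = re.compile(r".+?(?:[.!?。！？]+\s*|\Z)", re.S)


def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for piece in _SENTENCE.findall(text):
        # Hard-split any single sentence longer than the limit.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        if len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    # Trim edge whitespace and drop whitespace-only chunks.
    return [c.strip() for c in chunks if c.strip()]


if __name__ == "__main__":
    for chunk in chunk_text("One. Two. Three.", max_chars=10):
        print(repr(chunk))
```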

VoxCPM Voices

Default: English, Standard, Neutral
Default Chinese: Chinese, Standard, Neutral

VoxCPM TTS FAQ

What does "tokenizer-free" mean for VoxCPM?

Rather than discretizing speech into tokens, VoxCPM models audio in continuous space using flow matching. This helps it retain subtle acoustic detail and produce clean 44.1kHz output.

Is VoxCPM good for long-form content?

Yes. It is specifically designed to keep the voice consistent across paragraphs, which makes it well suited to audiobooks and other long passages where other models tend to drift.

Can VoxCPM clone voices across languages?

Yes. It supports cross-lingual cloning between English and Chinese, for example applying an English reference voice to Chinese speech, using three to ten seconds of reference audio.