VieNeu-TTS-v2

A Vietnamese-first model that runs on CPU without a GPU, with en-vi code-switching, 7 regional preset voices, and zero-shot voice cloning.

VieNeu-TTS-v2 is a 300M-parameter Vietnamese-first model built on a Qwen3 backbone and trained on more than 10,000 hours of bilingual data. It handles seamless English-Vietnamese code-switching, ships 7 preset voices spanning Northern and Southern accents, and clones a voice from just 3-5 seconds of reference audio without any fine-tuning. It runs entirely on CPU, using GGUF Q4 inference plus an ONNX audio decoder, so no GPU is required, and it finishes a generation in about 7 seconds. It is purpose-built for Vietnamese content and bilingual en-vi narration, an underserved niche in open TTS.

At a glance

Developer: Phạm Nguyễn Ngọc Bảo
License: Apache 2.0
Tier: standard
Speed: fast
Voice cloning: Yes
Languages: Vietnamese, English
Max characters: 1000
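Text longer than the 1000-character maximum has to be split before synthesis. The following is a minimal sketch of one way to do that, not part of VieNeu-TTS-v2 itself: the helper name `split_for_tts` is hypothetical, and it assumes the limit applies per request and counts Unicode characters. It prefers to break at sentence ends and only hard-wraps a sentence that is itself over the limit.

```python
import re

# "Max characters" from the listing; assumed here to apply to each request.
MAX_CHARS = 1000


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters (hypothetical helper)."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-wrapped,
        # which may cut in the middle of a word.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized separately and the resulting audio concatenated. Breaking at sentence ends keeps each chunk's prosody self-contained, which usually matters more than filling every chunk to the limit.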

VieNeu-TTS-v2 AI Voices

Bích Ngọc (North, Female): Vietnamese, preset voice
Phạm Tuyên (North, Male): Vietnamese, preset voice
Thanh Bình (North, Male): Vietnamese, preset voice
Thái Sơn (South, Male): Vietnamese, preset voice
Thục Đoan (South, Female): Vietnamese, preset voice
Trúc Ly (North, Female): Vietnamese, preset voice
Xuân Vĩnh (South, Male): Vietnamese, preset voice

Best for

Vietnamese content and bilingual en-vi narration

VieNeu-TTS-v2 TTS — FAQ

VieNeu-TTS-v2 does not need a GPU. It runs entirely on CPU via GGUF Q4 inference and an ONNX audio decoder, and completes a generation in around 7 seconds.

It is Vietnamese-first with English support and seamless en-vi code-switching. It ships 7 preset voices spanning Northern and Southern Vietnamese accents.

VieNeu-TTS-v2 supports instant zero-shot voice cloning from just 3-5 seconds of reference audio. It is released under the Apache 2.0 license, which permits commercial use.
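Before submitting a reference clip for cloning, it can help to check its length locally. This is a minimal sketch using only the Python standard library, assuming the clip is an uncompressed WAV file. The function names are hypothetical, and treating 3 and 5 seconds as hard bounds is an assumption drawn from the "3-5 seconds" figure above, not a documented requirement.

```python
import wave

# Assumed bounds, taken from the "3-5 seconds of reference audio" figure.
MIN_SECONDS = 3.0
MAX_SECONDS = 5.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_suitable_reference(path: str,
                          lo: float = MIN_SECONDS,
                          hi: float = MAX_SECONDS) -> bool:
    """Check whether a clip's duration falls inside the assumed range."""
    return lo <= wav_duration_seconds(path) <= hi
```

A clip outside the range is not necessarily unusable; the check only flags recordings that differ from the length the listing describes.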