Voice Library
Browse, preview, and compare 100+ AI voices across 20+ models. Find the perfect voice for your project.
Voices by AI Model
Each TTS model has its own set of voices with unique characteristics. Some models support voice cloning, allowing you to use any voice as a reference.
Transformer-based text-to-audio model that generates realistic speech, music, and sound effects.
Bark Small (15 voices, Standard tier)
Lighter version of Bark with faster inference and lower memory usage.
Chatterbox (1 voice, Premium tier)
State-of-the-art zero-shot voice cloning with emotion control from Resemble AI.
Voice: Default, English
Chatterbox Turbo (1 voice, Standard tier)
Faster Chatterbox with sub-200 ms latency and paralinguistic tags for laughs, coughs, and more.
Voice: Default, English
CosyVoice 2 (10 voices, Standard tier)
Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.
CosyVoice3 (11 voices, Standard tier)
Next-generation multilingual TTS with bi-streaming, emotion control, and zero-shot voice cloning.
Darwin TTS (4 voices, Standard tier)
Cross-modal Qwen3-TTS variant with FFN weights blended from the Qwen3-1.7B language model for sharper multilingual cloning.
Multi-speaker dialog generation model that creates natural conversations between speakers.
GPT-SoVITS (4 voices, Standard tier)
Few-shot voice cloning TTS that replicates any voice from just 5 seconds of audio.
IndexTTS-2 (2 voices, Standard tier)
Zero-shot TTS with fine-grained emotion control and high expressiveness.
Kani TTS 2 (1 voice, Standard tier)
Ultra-lightweight 400M-parameter English TTS model that runs in just 3 GB of VRAM.
Voice: Default, English
Kitten TTS (8 voices, Free tier)
Ultra-lightweight TTS model under 80 MB that runs on CPU, with no GPU required.
Lightweight 82M-parameter model delivering studio-quality speech with blazing-fast inference.
High-quality multilingual text-to-speech that runs on CPU with minimal latency.
Ming-Omni TTS (2 voices, Free tier)
Compact 0.5B omni-modal speech model from inclusionAI with high-fidelity 44.1 kHz output and zero-shot voice cloning.
MOSS-TTS Nano (11 voices, Standard tier)
Tiny 100M-parameter MOSS-TTS variant with the same architecture, 80x smaller, and free-tier latency.
Multi-speaker dialogue continuation model that generates podcast-style conversations with up to 5 speakers and 60 minutes of coherent audio.
Instant voice cloning with granular control over style, emotion, and accent.
Human-level emotional TTS model trained on 100K hours of speech data.
LLM-based TTS that runs on CPU, GPU, or browser via llama.cpp and Transformers.js.
Voice: Female 1 (Neutral), English
Parler TTS (1 voice, Standard tier)
Describe the voice you want in natural language, and Parler generates matching speech.
Voice: Default, English
A fast, local neural text-to-speech system optimized for Raspberry Pi and embedded devices.
Pocket TTS (8 voices, Free tier)
Lightweight 100M-parameter model by Kyutai with voice cloning from a single sample.
Alibaba's multilingual TTS with preset voices and voice design from text.
Sesame CSM (2 voices, Premium tier)
Conversational speech model generating natural dialogue with appropriate timing and emotion.
Voice cloning TTS with controllable emotion and speaking style via prompts.
Voice: Default, English
StyleTTS 2 (1 voice, Premium tier)
Human-level text-to-speech through style diffusion and adversarial training.
Voice: Default, English
Tortoise TTS (1 voice, Premium tier)
Multi-voice, quality-focused text-to-speech with an autoregressive architecture.
Voice: Random, English
Microsoft's multi-speaker long-form TTS, generating up to 90 minutes of audio with 4 distinct speakers.
Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech.
Voice: Default, English
Understanding AI Voices
Voice Quality Tiers
TTS.ai offers voices across three quality tiers. Free-tier voices from Piper, VITS, and MeloTTS deliver fast, good-quality synthesis at no cost. Standard-tier voices from models like Kokoro and CosyVoice 2 offer more natural prosody and emotion. Premium-tier voices from OpenVoice, Chatterbox, and StyleTTS 2 provide the most realistic, human-like speech available in open-source TTS.
Multilingual Voices
Many voices support multiple languages. Some models like CosyVoice 2 and GPT-SoVITS support cross-lingual synthesis, where a voice trained in one language can speak naturally in another. The language filter above lets you find voices that natively support your target language, ensuring the best pronunciation and intonation.
Voice Cloning
Some models support voice cloning: you provide a recording of any voice as a reference, and the model generates speech that sounds like that person. Upload a short audio sample (typically 10-30 seconds; GPT-SoVITS can work from as little as 5 seconds), and the model adapts to match the voice characteristics. Models that support cloning include GPT-SoVITS, CosyVoice 2, and Chatterbox.
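This page does not say how TTS.ai validates uploaded reference samples. As a hypothetical illustration only, the sketch below uses Python's standard `wave` module to check whether a reference clip falls in the 10-30 second range before you upload it. The function names `clip_duration_seconds` and `check_reference_clip` are invented for this example, and it assumes uncompressed WAV input; MP3 or other compressed formats would need a different decoder.

```python
import wave

# Recommended reference-clip range from this page, in seconds.
# Assumption for illustration: individual models may accept shorter clips
# (GPT-SoVITS, for example, is described as working from about 5 seconds).
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Frames per channel divided by frames per second gives seconds.
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(
    path: str,
    min_s: float = MIN_SECONDS,
    max_s: float = MAX_SECONDS,
) -> tuple[bool, float]:
    """Hypothetical pre-upload check: is the clip length within [min_s, max_s]?

    Returns (within_range, duration_in_seconds).
    """
    duration = clip_duration_seconds(path)
    return min_s <= duration <= max_s, duration
```

For example, `check_reference_clip("my_voice.wav")` returns a pair such as `(True, 14.2)` for a 14.2-second clip; pass `min_s=5.0` when targeting a model that accepts shorter samples. Checking length locally only catches the simplest problem; background noise and recording quality also affect how well a clone matches.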
Choosing the Right Voice
The best voice depends on your use case. For audiobooks and podcasts, use premium voices with natural prosody. For game characters, explore diverse voices across models. For accessibility and screen readers, choose clear, well-paced voices. For quick prototyping, free-tier voices offer instant results with no character cost. Preview each voice with the play button before making your choice.