Free AI Text to Speech
32+ open-source models, 266+ voices, 33+ languages. No account required.
Everything You Need for Voice AI
30+ tools powered by open-source AI models
32+ AI Voice Models
The most comprehensive collection of open-source TTS models in one platform
Kokoro Free
Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast — generating audio nearly 100x faster than real-time on a GPU.
Best for: High-quality TTS with minimal latency, streaming applications
Try Free
Piper Free
Piper is a lightweight text-to-speech engine from the Rhasspy project, built on the VITS architecture (the project was previously named Larynx). It runs entirely on CPU, making it ideal for edge devices, home automation, and offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.
Best for: Quick previews, accessibility, and embedded applications
Try Free
VITS Free
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural-sounding audio than the two-stage models of its time. It combines variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.
Best for: General-purpose text-to-speech with natural prosody
Try Free
MeloTTS Free
MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.
Best for: Production applications needing fast, multilingual TTS
Try Free
Kani TTS 2 Free
Kani-TTS-2 by NineNineSix is an ultra-lightweight 400M parameter model built on a Liquid AI LFM2 backbone with NVIDIA NanoCodec. It runs in just 3GB VRAM and produces ~10 seconds of speech in ~2 seconds on an A100 (RTF 0.2). The current public release ships an English-only `kani-tts-2-en` checkpoint and does not expose the speaker-embedding hook needed for voice cloning — use Chatterbox / IndexTTS2 / F5-TTS for cloning, or Kokoro / MeloTTS for non-English.
Best for: Fast English generation on low-VRAM hardware, quick previews
Try Free
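The real-time factor (RTF) quoted for Kani-TTS-2 is synthesis time divided by the duration of the audio produced, so values below 1 mean faster than real time. A minimal sketch of that arithmetic, using the approximate figures from the description above (the function name is illustrative, not part of any library):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Approximate figures from the model description: ~10 s of speech in ~2 s.
rtf = real_time_factor(synthesis_seconds=2.0, audio_seconds=10.0)
speedup = 1.0 / rtf  # how many seconds of audio are produced per second of compute
print(f"RTF {rtf:.1f}, about {speedup:.0f}x faster than real time")
```

By the same definition, a model described as "100x faster than real time" would have an RTF of roughly 0.01.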
OuteTTS Free
OuteTTS extends large language models with text-to-speech capabilities while preserving the original architecture. It supports multiple backends including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, vLLM, and even browser inference via Transformers.js. It also offers zero-shot voice cloning through speaker profiles saved as JSON.
Best for: Edge deployment, browser-based TTS, low-resource environments
Try Free
Pocket TTS Free
Pocket TTS by Kyutai (creators of Moshi) is a compact 100M parameter text-to-speech model that punches well above its weight. It runs efficiently on CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. The small model size makes it ideal for edge deployment and low-resource environments.
Best for: Lightweight deployment, CPU-only environments, quick voice cloning
Try Free
Kitten TTS Free
Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.
Best for: Fast lightweight TTS, edge deployment, low-latency applications
Try Free
Ming-Omni TTS Free
Ming-omni-tts-0.5B by inclusionAI is a compact omni-modal speech model built on the BailingMM dense backbone with a Patch-by-Patch flow-matching audio decoder. Delivers 44.1kHz output (near CD quality), supports zero-shot voice cloning from a 3+ second reference, and includes built-in emotion / dialect / BGM control via JSON instructions. Excellent stability — 0.83% WER on Chinese benchmarks.
Best for: High-fidelity bilingual narration, emotion-controlled voice acting, Chinese audiobook content
Try Free
MOSS-TTS Nano Free
MOSS-TTS-Nano-100M is OpenMOSS's compact 100M-parameter variant of the MOSS-TTS family, sharing the delay-transformer architecture. Trades the 8B model's peak quality for ~80x smaller weights and dramatically lower per-request VRAM, making it suitable for free-tier and high-throughput deployments. Same 20-language reach.
Best for: Free-tier TTS, high-volume production, low-latency interactive use
Try Free
Bark Standard
Transformer-based text-to-audio model that generates realistic speech, music, and sound effects.
Developer: Suno · License: MIT
Try it
Bark Small Standard
Lighter version of Bark with faster inference and lower memory usage.
Developer: Suno · License: MIT
Try it
CosyVoice 2 Standard
Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.
Developer: Alibaba (Tongyi Lab) · License: Apache 2.0
Try it
Dia TTS Standard
Multi-speaker dialog generation model that creates natural conversations between speakers.
Developer: Nari Labs · License: Apache 2.0
Try it
Parler TTS Standard
Describe the voice you want in natural language and Parler generates matching speech.
Developer: Hugging Face · License: Apache 2.0
Try it
IndexTTS-2 Standard
Zero-shot TTS with fine-grained emotion control and high expressiveness.
Developer: Index Team · License: Bilibili Model License
Try it
Spark TTS Standard
Voice cloning TTS with controllable emotion and speaking style via prompts.
Developer: SparkAudio · License: CC BY-NC-SA 4.0
Try it
GPT-SoVITS Standard
Few-shot voice cloning TTS that replicates any voice from just 5 seconds of audio.
Developer: RVC-Boss · License: MIT
Try it
Orpheus Standard
Human-level emotional TTS model trained on 100K hours of speech data.
Developer: Canopy Labs · License: Llama 3.2 Community
Try it
Qwen3 TTS Standard
Alibaba's multilingual TTS with voice cloning, preset voices, and voice design from text.
Developer: Alibaba (Qwen) · License: Apache 2.0
Try it
Chatterbox Turbo Standard
Faster Chatterbox with sub-200ms latency and paralinguistic tags for laughs, coughs, and more.
Developer: Resemble AI · License: MIT
Try it
VoxCPM Standard
Tokenizer-free TTS producing 44.1kHz audio with context-aware paragraph consistency.
Developer: OpenBMB · License: Apache 2.0
Try it
VibeVoice Standard
Microsoft's multi-speaker long-form TTS generating up to 90 minutes with 4 distinct speakers.
Developer: Microsoft · License: MIT
Try it
CosyVoice3 Standard
Next-generation multilingual TTS with bi-streaming, emotion control, and zero-shot voice cloning.
Developer: Alibaba (FunAudioLLM) · License: Apache 2.0
Try it
NAMAA Saudi TTS Standard
First open Saudi-Arabic TTS. Native Saudi dialect with Chatterbox-quality voice cloning.
Developer: NAMAA Space · License: MIT
Try it
Darwin TTS Standard
Cross-modal Qwen3-TTS variant with FFN weights blended from the Qwen3-1.7B language model for sharper multilingual cloning.
Developer: FINAL-Bench · License: Apache 2.0
Try it
MOSS-TTSD Standard
Multi-speaker dialogue continuation model — generate podcast-style conversations with up to 5 speakers and 60 minutes of coherent audio.
Developer: OpenMOSS · License: Apache 2.0
Try it
CosyVoice 2
Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.
Languages: en, zh, ja, ko, fr, de, it, es
Clone Voice
IndexTTS-2
Zero-shot TTS with fine-grained emotion control and high expressiveness.
Languages: en, zh
Clone Voice
Spark TTS
Voice cloning TTS with controllable emotion and speaking style via prompts.
Languages: en, zh
Clone Voice
GPT-SoVITS
Few-shot voice cloning TTS that replicates any voice from just 5 seconds of audio.
Languages: en, zh, ja, ko
Clone Voice
Chatterbox
State-of-the-art zero-shot voice cloning with emotion control from Resemble AI.
Languages: en
Clone Voice
Tortoise TTS
Multi-voice text-to-speech focused on quality with autoregressive architecture.
Languages: en
Clone Voice
OpenVoice
Instant voice cloning with granular control over style, emotion, and accent.
Languages: en, zh, ja, ko, fr, es
Clone Voice
Qwen3 TTS
Alibaba's multilingual TTS with voice cloning, preset voices, and voice design from text.
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
Clone Voice
Chatterbox Turbo
Faster Chatterbox with sub-200ms latency and paralinguistic tags for laughs, coughs, and more.
Languages: en
Clone Voice
VoxCPM
Tokenizer-free TTS producing 44.1kHz audio with context-aware paragraph consistency.
Languages: en, zh
Clone Voice
OuteTTS
LLM-based TTS that runs on CPU, GPU, or browser via llama.cpp and Transformers.js.
Languages: en
Clone Voice
Pocket TTS
Lightweight 100M parameter model by Kyutai with voice cloning from a single sample.
Languages: en, fr
Clone Voice
CosyVoice3
Next-generation multilingual TTS with bi-streaming, emotion control, and zero-shot voice cloning.
Languages: en, zh, ja, ko, de, es, fr, it, ru
Clone Voice
NAMAA Saudi TTS
First open Saudi-Arabic TTS. Native Saudi dialect with Chatterbox-quality voice cloning.
Languages: ar
Clone Voice
Darwin TTS
Cross-modal Qwen3-TTS variant with FFN weights blended from the Qwen3-1.7B language model for sharper multilingual cloning.
Languages: en, ko, ja, zh
Clone Voice
MOSS-TTSD
Multi-speaker dialogue continuation model — generate podcast-style conversations with up to 5 speakers and 60 minutes of coherent audio.
Languages: en, zh
Clone Voice
Ming-Omni TTS
Compact 0.5B omni-modal speech model from inclusionAI with high-fidelity 44.1kHz output and zero-shot voice cloning.
Languages: en, zh
Clone Voice
MOSS-TTS Nano
Tiny 100M MOSS-TTS variant — same architecture, 80x smaller, free-tier latency.
Languages: en, zh, de, es, fr, ja, it, ko, ru, ar, pt
Clone Voice
Developer-First API
OpenAI-compatible REST API. One endpoint, 20+ models. Streaming support for real-time applications.
- OpenAI-compatible format
- Streaming TTS for real-time apps
- Batch processing for large jobs
- Webhook notifications
pip install ttsai
npm install @ttsainpm/ttsai
from tts_ai import TTSClient

client = TTSClient(api_key="sk-tts-xxx")
audio = client.generate(
    text="Hello from TTS.ai!",
    model="kokoro",
    voice="af_bella",
)
client.save(audio, "output.mp3")
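For readers who prefer plain HTTP over the SDK, "OpenAI-compatible" usually means the request follows OpenAI's speech endpoint shape: a POST to `/audio/speech` with a JSON body containing `model`, `input`, and `voice`. The sketch below only builds such a request without sending it. The base URL is a placeholder and the exact path and accepted fields for this service are assumptions; check the provider's API documentation before relying on them.

```python
import json
import urllib.request

# Placeholder base URL (an assumption, not the service's real endpoint).
BASE_URL = "https://api.example.com/v1"


def build_speech_request(
    api_key: str,
    text: str,
    model: str = "kokoro",
    voice: str = "af_bella",
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": model,
        "input": text,  # OpenAI's format names the text field "input"
        "voice": voice,
        "response_format": "mp3",
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speech_request("sk-tts-xxx", "Hello from TTS.ai!")
print(req.get_method(), req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen(req)` and write the response bytes to a file; the response is expected to be raw audio in the requested format.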
Simple, Transparent Pricing
Start free. Scale as you grow.
Free
15,000 characters + 5,000/day
- 7 free models including Kokoro
- 5,000 chars per generation
- API access included
Starter
500,000 characters/month
- All 20+ models
- 100,000 chars per generation
- Voice Cloning
Pro
2,000,000 characters/month
- Everything in Starter
- API access
- Priority processing
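Because each plan caps the characters per generation (5,000 on Free, 100,000 on Starter), longer scripts need to be split before submission. The helper below is one possible approach, not something the service provides: it packs whole sentences into chunks under a limit and hard-splits only sentences that exceed the limit on their own. It assumes the limit counts every character, including spaces, which the pricing table does not state.

```python
import re


def split_for_generation(text: str, max_chars: int = 5000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""

    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue

        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks


script = "First sentence. Second sentence! Third one?"
for i, chunk in enumerate(split_for_generation(script, max_chars=32), start=1):
    print(i, len(chunk), chunk)
```

Each chunk can then be sent as its own generation request and the resulting audio files concatenated in order.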
Frequently Asked Questions
Start Using AI Voice Today
Join creators, developers, and businesses using TTS.ai