Free AI Text to Speech
22+ open-source models, 100+ voices, 32+ languages. No sign-up required.
Every model you need for AI audio
26 tools powered by 24+ open-source AI models
Text-to-speech models
TTS and other tools in one panel
Kokoro Free
Kokoro is an 82-million-parameter text-to-speech model that punches well above its weight class. Despite its small size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages, including English, Japanese, Chinese, and Korean, with a variety of expressive voices. It is also fast, generating audio nearly 100x faster than real time on a GPU.
Best for: High-quality TTS with minimal latency, streaming applications
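To put the speed figure in concrete terms, a speedup factor can be turned into an estimated generation time. This is only a rough sketch: the function name is illustrative, and the 100x value is the upper figure quoted above, not a measured benchmark.

```python
def estimated_generation_seconds(audio_seconds: float, speedup: float) -> float:
    """Estimate synthesis time for a clip, given a speedup over real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# Assumption: ~100x faster than real time, the upper figure quoted for Kokoro on a GPU.
one_minute = estimated_generation_seconds(60.0, 100.0)
print(f"{one_minute:.2f} s to generate 60 s of audio")
```

Actual throughput depends on the GPU, batch size, and text length, so treat the result as an order-of-magnitude estimate.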
Piper Free
Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.
Best for: Quick previews, accessibility, and embedded applications
VITS Free
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.
Best for: General-purpose text-to-speech with natural prosody
MeloTTS Free
MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.
Best for: Applications that need low-latency TTS, and other applications
Bark Standard
Transformer-based text-to-audio model that generates realistic speech, music, and sound effects.
Developer: Suno · License: MIT
Bark Small Standard
Lighter version of Bark with faster inference and lower memory usage.
Developer: Suno · License: MIT
CosyVoice 2 Standard
Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.
Developer: Alibaba (Tongyi Lab) · License: Apache 2.0
Dia TTS Standard
Dialogue TTS model that generates multi-speaker conversational speech.
Developer: Nari Labs · License: Apache 2.0
Parler TTS Standard
Describe the voice you want in natural language and Parler generates matching speech.
Developer: Hugging Face · License: Apache 2.0
IndexTTS-2 Standard
Zero-shot TTS with fine-grained emotion control and high expressiveness.
Developer: Index Team · License: Apache 2.0
Spark TTS Standard
Voice cloning TTS with controllable emotion and speaking style via prompts.
Developer: SparkAudio · License: Apache 2.0
GPT-SoVITS Standard
Few-shot voice cloning TTS that replicates any voice from just 5 seconds of audio.
Developer: RVC-Boss · License: MIT
Orpheus Standard
Human-level emotional TTS model trained on 100K hours of speech data.
Developer: Canopy Labs · License: Llama 3.2 Community
Qwen3 TTS Standard
Alibaba's multilingual TTS with voice cloning, preset voices, and voice design from text.
Developer: Alibaba (Qwen) · License: Apache 2.0
CosyVoice 2
Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.
Languages: en, zh, ja, ko, fr, de, it, es
IndexTTS-2
Zero-shot TTS with fine-grained emotion control and high expressiveness.
Languages: en, zh
Spark TTS
Voice cloning TTS with controllable emotion and speaking style via prompts.
Languages: en, zh
GPT-SoVITS
Few-shot voice cloning TTS that replicates any voice from just 5 seconds of audio.
Languages: en, zh, ja, ko
Chatterbox
Zero-shot voice cloning TTS from Resemble AI.
Languages: en
Tortoise TTS
Text-to-speech model.
Languages: en
OpenVoice
Instant voice cloning with fine-grained control over style, emotion, and accent.
Languages: en, zh, ja, ko, fr, de, es, it
Qwen3 TTS
Alibaba's multilingual TTS with voice cloning, preset voices, and voice design from text.
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
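The language codes in the list above lend themselves to a simple lookup when picking a model for a given language. The sketch below copies those codes into a dictionary; the variable and function names are illustrative and are not part of any TTS.ai API.

```python
# Language codes copied from the model list above (the entries with language lines).
MODEL_LANGUAGES = {
    "CosyVoice 2": {"en", "zh", "ja", "ko", "fr", "de", "it", "es"},
    "IndexTTS-2": {"en", "zh"},
    "Spark TTS": {"en", "zh"},
    "GPT-SoVITS": {"en", "zh", "ja", "ko"},
    "Chatterbox": {"en"},
    "Tortoise TTS": {"en"},
    "OpenVoice": {"en", "zh", "ja", "ko", "fr", "de", "es", "it"},
    "Qwen3 TTS": {"en", "zh", "ja", "ko", "de", "fr", "ru", "pt", "es", "it"},
}


def models_supporting(language_code: str) -> list[str]:
    """Return the listed models that cover the given language code, sorted by name."""
    code = language_code.lower()
    return sorted(name for name, langs in MODEL_LANGUAGES.items() if code in langs)


print(models_supporting("ko"))  # ['CosyVoice 2', 'GPT-SoVITS', 'OpenVoice', 'Qwen3 TTS']
print(models_supporting("ru"))  # ['Qwen3 TTS']
```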
Developer API
OpenAI-compatible REST API. One endpoint, 22+ models. Built for production applications.
- Ìgúnrégé tí a bá fẹ́
- Streaming TTS for real-time applications
- Batch processing for large jobs
- Webhook callbacks
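import requests

# Request speech from the Kokoro model; replace the placeholder key with your own.
response = requests.post(
    "https://api.tts.ai/v1/tts/",
    headers={"Authorization": "Bearer sk-tts-xxx"},
    json={
        "model": "kokoro",
        "text": "Hello from TTS.ai!",
        "voice": "af_bella",
    },
    timeout=60,
)
response.raise_for_status()  # stop on HTTP errors instead of saving an error body

# Save the returned audio bytes to disk.
with open("output.mp3", "wb") as f:
    f.write(response.content)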
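For repeated calls, the request above can be wrapped in a small helper. The following is a minimal sketch, not an official client: the function names, the 60-second timeout, and the empty-text check are assumptions added for illustration, while the endpoint, field names, and default model and voice come from the example above.

```python
API_URL = "https://api.tts.ai/v1/tts/"  # endpoint taken from the example above


def build_tts_payload(text: str, model: str = "kokoro", voice: str = "af_bella") -> dict:
    """Build the JSON body used in the example above; rejects empty text early."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {"model": model, "text": text, "voice": voice}


def synthesize_to_file(text: str, api_key: str, path: str = "output.mp3", **options) -> str:
    """Send the request and write the returned audio bytes to `path`."""
    # Imported here so the payload helper stays usable without the dependency.
    import requests

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_tts_payload(text, **options),
        timeout=60,  # assumption: a generous limit; tune for your workload
    )
    response.raise_for_status()  # surface 4xx/5xx instead of saving an error body
    with open(path, "wb") as f:
        f.write(response.content)
    return path
```

A call such as `synthesize_to_file("Hello from TTS.ai!", api_key="sk-tts-xxx")` would then write `output.mp3`, assuming the key is valid and the API returns raw audio bytes as in the example above.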
Simple, Transparent Pricing
Start free. Upgrade as you need.
Free
50 credits
- Kokoro, Piper, VITS, MeloTTS
- Àwọn àmì-ìwé àwọn àmì-ìwé
- 3 ọjọ/aago (kò ní kọ̀ǹpútà)
Starter
500 credits/month
- All 22+ models
- Àwọn àmì-ìwé àwọn àmì-ìwé
- Àwọn Àmì-ìwé
Pro
2,000 credits/month
- Everything in Starter
- API access
- Àwọn Ìṣàmúlò-ètò
Àwọn Ìṣàfilọ́lẹ̀
10,000 credits/month
- Everything in Pro
- Aṣàfilọ́lẹ̀ API
- Àwọn ìṣàmúlò-ètò
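As a rough way to compare the three tiers priced in credits per month (500, 2,000, and 10,000), the sketch below picks the smallest allowance that covers an estimated monthly need. The page does not state how credits map to characters or audio length, so the input is expressed directly in credits; the names are illustrative.

```python
from typing import Optional

# Monthly allowances of the three tiers priced in credits/month above.
MONTHLY_CREDIT_TIERS = [500, 2_000, 10_000]


def smallest_sufficient_allowance(credits_needed: int) -> Optional[int]:
    """Return the smallest listed allowance covering the need, or None if none does."""
    if credits_needed < 0:
        raise ValueError("credits_needed must be non-negative")
    for allowance in sorted(MONTHLY_CREDIT_TIERS):
        if allowance >= credits_needed:
            return allowance
    return None


print(smallest_sufficient_allowance(1_200))   # 2000
print(smallest_sufficient_allowance(12_000))  # None
```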
Àwọn Àtòjọ-ẹ̀yàn
Ṣàfihàn àwòrán AI
Join the creators, developers, and businesses using TTS.ai