Free AI Text to Speech

22+ open-source models, 100+ voices, 32+ languages. No account required.

No credit card required · 50 free credits · 32+ languages · Commercial use OK

Everything you need for AI voice

26 tools powered by 24+ open-source AI models

22+ AI voice models

The most comprehensive collection of open-source TTS models on one platform

Kokoro (Free)

Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast — generating audio nearly 100x faster than real-time on a GPU.

Best for: High-quality TTS with minimal latency, streaming applications

Piper (Free)

Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.

Best for: Quick previews, accessibility, and embedded applications

VITS (Free)

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.

Best for: General-purpose text-to-speech with natural prosody

MeloTTS (Free)

MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.

Best for: Production applications that need fast, multilingual TTS

Bark (Standard)

Transformer-based text-to-audio model that generates realistic speech, music, and sound effects.

Developer: Suno · License: MIT

Bark Small (Standard)

Lighter version of Bark with faster inference and lower memory usage.

Developer: Suno · License: MIT

CosyVoice 2 (Standard)

Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.

Developer: Alibaba (Tongyi Lab) · License: Apache 2.0

Dia TTS (Standard)

Multi-speaker dialog generation model that creates natural conversations between speakers.

Developer: Nari Labs · License: Apache 2.0

Parler TTS (Standard)

Describe the voice you want in natural language and Parler generates matching speech.

Developer: Hugging Face · License: Apache 2.0

IndexTTS-2 (Standard)

Zero-shot TTS with fine-grained emotion control and high expressiveness.

Developer: Index Team · License: Apache 2.0

Spark TTS (Standard)

Voice cloning TTS with controllable emotion and speaking style via prompts.

Developer: SparkAudio · License: Apache 2.0

GPT-SoVITS (Standard)

Few-shot voice cloning TTS that replicates any voice from just 5 seconds of audio.

Developer: RVC-Boss · License: MIT

Orpheus (Standard)

Human-level emotional TTS model trained on 100K hours of speech data.

Developer: Canopy Labs · License: Llama 3.2 Community

Qwen3 TTS (Standard)

Alibaba's multilingual TTS with voice cloning, preset voices, and voice design from text.

Developer: Alibaba (Qwen) · License: Apache 2.0

Chatterbox (Premium)

State-of-the-art zero-shot voice cloning with emotion control from Resemble AI.

Tortoise TTS (Premium)

Multi-voice text-to-speech focused on quality, with an autoregressive architecture.

StyleTTS 2 (Premium)

Human-level text-to-speech through style diffusion and adversarial training.

OpenVoice (Premium)

Instant voice cloning with granular control over style, emotion, and accent.

CosyVoice 2

Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.

Languages: en, zh, ja, ko, fr, de, it, es

IndexTTS-2

Zero-shot TTS with fine-grained emotion control and high expressiveness.

Languages: en, zh

Spark TTS

Voice cloning TTS with controllable emotion and speaking style via prompts.

Languages: en, zh

GPT-SoVITS

Few-shot voice cloning TTS that replicates any voice from just 5 seconds of audio.

Languages: en, zh, ja, ko

Chatterbox

State-of-the-art zero-shot voice cloning with emotion control from Resemble AI.

Languages: en

Tortoise TTS

Multi-voice text-to-speech focused on quality, with an autoregressive architecture.

Languages: en

OpenVoice

Instant voice cloning with granular control over style, emotion, and accent.

Languages: en, zh, ja, ko, fr, de, es, it

Qwen3 TTS

Alibaba's multilingual TTS with voice cloning, preset voices, and voice design from text.

Languages: en, zh, ja, ko, de, fr, ru, pt, es, it

Developer-first API

OpenAI-compatible REST API. One endpoint, 22+ models. Streaming support for real-time applications.

  • OpenAI-compatible format
  • Streaming TTS for real-time applications
  • Batch processing for large workloads
  • Webhook notifications
See the API documentation
Python
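import requests

# Request speech from the Kokoro model; replace sk-tts-xxx with your own API key.
response = requests.post(
    "https://api.tts.ai/v1/tts/",
    headers={"Authorization": "Bearer sk-tts-xxx"},
    json={
        "model": "kokoro",
        "text": "Hello from TTS.ai!",
        "voice": "af_bella",
    },
    timeout=60,
)
# Stop on an HTTP error instead of writing an error body to disk.
response.raise_for_status()

# Save the raw response bytes as the audio file.
with open("output.mp3", "wb") as f:
    f.write(response.content)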

Simple, transparent pricing

Start for free. Scale as you grow.

Free

$0

50 credits

  • Kokoro, Piper, VITS, MeloTTS
  • 500-character limit
  • 3 generations/hour (no account)
Sign up

Starter

$9/month

500 credits/month

  • All 22+ models
  • 5,000-character limit
  • Voice cloning
Get started
Most popular

Pro

$29/month

2,000 credits/month

  • Everything in Starter
  • API access
  • Priority processing
Get Pro

Enterprise

$99/month

10,000 credits/month

  • Everything in Pro
  • Bulk API
  • Priority queue
Contact sales

View all plans including credit packs →
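
Text longer than a plan's character limit (500 on Free, 5,000 on Starter) presumably has to be split before it is submitted. The following is a minimal sketch of one way to do that on the client, breaking at sentence ends where possible. The function name `split_for_limit` is hypothetical and not part of any TTS.ai SDK, and it assumes the limit is counted the same way Python's `len` counts characters.

```python
import re


def split_for_limit(text: str, limit: int = 500) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Break after ., ! or ? followed by whitespace; punctuation stays with its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-cut into limit-sized pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request and the resulting audio files joined afterwards.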

Frequently asked questions

TTS.ai is the most comprehensive AI voice platform, offering 22+ text-to-speech models, voice cloning, speech-to-text, and audio tools. All models are open source, with no vendor lock-in.

Yes! TTS.ai offers free text-to-speech with the Kokoro, Piper, VITS, and MeloTTS models. No account required. Sign up to get 50 free credits and access to all models. Paid plans start at $9/month.

For speed, use Kokoro or Piper. For quality, use CosyVoice 2 or StyleTTS 2. For voice cloning, use Chatterbox or GPT-SoVITS. For dialogue, use Dia TTS. Try several models on the same text to compare.
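
The recommendations above can be kept as a small lookup table in client code. This is only an illustrative sketch: the use-case keys and the `recommend` helper are hypothetical, and the values are display names from this page, not confirmed API model identifiers.

```python
# Recommendations taken from the answer above; the keys are illustrative labels.
RECOMMENDED_MODELS = {
    "speed": ["Kokoro", "Piper"],
    "quality": ["CosyVoice 2", "StyleTTS 2"],
    "voice_cloning": ["Chatterbox", "GPT-SoVITS"],
    "dialogue": ["Dia TTS"],
}


def recommend(use_case: str) -> list[str]:
    """Return the suggested models for a use case such as 'speed' or 'voice cloning'."""
    key = use_case.strip().lower().replace(" ", "_")
    if key not in RECOMMENDED_MODELS:
        known = ", ".join(sorted(RECOMMENDED_MODELS))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}")
    # Return a copy so callers cannot mutate the table.
    return list(RECOMMENDED_MODELS[key])
```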

Yes. An OpenAI-compatible REST API covers TTS, STT, voice cloning, and audio tools. It is available on the Pro ($29/mo) and Enterprise ($99/mo) plans. See the documentation at tts.ai/api/.

Audio quality varies by model. Top-tier models such as CosyVoice 2, StyleTTS 2, and Chatterbox produce near-human speech with natural intonation and emotion. Free models such as Kokoro offer excellent quality for most use cases.

TTS.ai supports more than 30 languages across its model library. English has the widest model support, but models such as CosyVoice 2 add Chinese, Japanese, and Korean; GPT-SoVITS handles Chinese, Japanese, Korean, and English; and MeloTTS supports English, Spanish, French, Chinese, Japanese, and Korean.

Yes. All processing happens on our GPU servers. We do not store your input text or generated audio after delivery. Audio samples uploaded for cloning are used only for the current session and are not retained. We never share your data with third parties or use it to train models.

Yes. All audio generated on TTS.ai is yours to use commercially, including for YouTube videos, podcasts, audiobooks, apps, advertisements, and products. Our models are open source under permissive licenses (MIT, Apache 2.0). No royalties or attribution required.

TTS.ai generates audio in WAV format by default for maximum quality. You can convert it to MP3, FLAC, OGG, or M4A with our free Audio Converter tool. The API also lets you specify your preferred output format directly in the request.

Upload a short audio sample (as little as 5 seconds) of the voice you want to clone, then type any text to generate speech in that voice. Models like Chatterbox, GPT-SoVITS, and CosyVoice 2 support voice cloning. The cloned voice captures tone, accent, and speaking style.

Free models (Kokoro, Piper, VITS, MeloTTS) require no account and cost zero credits. Standard models (2 credits per 1K characters) include Bark, CosyVoice 2, F5-TTS, and Dia. Premium models (4 credits per 1K characters) include OpenVoice, Chatterbox, StyleTTS 2, and Tortoise. Paid models generally offer higher quality, more voices, and extra features such as voice cloning.
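
The per-tier rates above (0, 2, and 4 credits per 1K characters) make it easy to estimate what a job will cost. The sketch below is a rough estimator, not the platform's billing logic: the function names are hypothetical, and rounding partial credits up to the next whole credit is an assumption.

```python
from typing import Optional

# Credits charged per 1,000 characters, as listed for each tier.
CREDITS_PER_1K_CHARS = {"free": 0, "standard": 2, "premium": 4}


def estimate_credits(num_chars: int, tier: str) -> int:
    """Estimate the credits needed to synthesize `num_chars` characters on a tier."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    rate = CREDITS_PER_1K_CHARS[tier]
    # Assumption: partial credits are rounded up (integer ceiling division).
    return (num_chars * rate + 999) // 1000


def chars_covered(credits: int, tier: str) -> Optional[int]:
    """Return how many characters a credit balance covers, or None if the tier is free."""
    rate = CREDITS_PER_1K_CHARS[tier]
    if rate == 0:
        return None  # free-tier models cost no credits
    return credits * 1000 // rate
```

Under these assumptions, `chars_covered(500, "standard")` gives 250,000 characters for the Starter plan's monthly 500 credits, and `estimate_credits(1200, "premium")` gives 5 credits.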

Yes. The API supports batch processing for converting large volumes of text to speech. Submit multiple requests and retrieve the results asynchronously using job UUIDs. Enterprise plans ($99/mo) include priority queue access for faster batch processing. This is ideal for audiobook production, course content, and large voiceover projects.
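
The submit-then-retrieve pattern described above can be sketched as a simple polling loop. The batch endpoints and response shapes are not given on this page, so the sketch takes two caller-supplied functions instead of calling the API directly: `submit` (returns a job UUID for a text) and `fetch` (returns the audio bytes, or `None` while the job is still running). Both are hypothetical placeholders, as are `run_batch` and its parameters.

```python
import time
from typing import Callable, Dict, List, Optional


def run_batch(
    texts: List[str],
    submit: Callable[[str], str],
    fetch: Callable[[str], Optional[bytes]],
    poll_interval: float = 2.0,
    max_polls: int = 30,
) -> List[bytes]:
    """Submit every text, then poll until all jobs return audio or polls run out."""
    # Submit everything up front so the jobs can be processed in parallel.
    job_ids = [submit(text) for text in texts]
    audio: Dict[str, bytes] = {}
    for _ in range(max_polls):
        for job_id in job_ids:
            if job_id not in audio:
                result = fetch(job_id)
                if result is not None:
                    audio[job_id] = result
        if all(job_id in audio for job_id in job_ids):
            # Return the audio in the same order as the input texts.
            return [audio[job_id] for job_id in job_ids]
        time.sleep(poll_interval)
    pending = sum(1 for job_id in job_ids if job_id not in audio)
    raise TimeoutError(f"{pending} job(s) still pending after {max_polls} polls")
```

If the webhook notifications mentioned earlier are available for batch jobs, they would avoid the polling loop altogether; this sketch only covers the job-UUID route.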

Start using AI voice today

Join creators, developers, and businesses using TTS.ai