Convert text into natural-sounding speech with easy-to-use AI tools. Free to use, no account required.

We don't have TTS voices in your language yet. Help us by adding yours!

<speak><prosody rate="slow">Slow speech</prosody></speak>
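SSML like the line above can also be assembled programmatically. The following is a minimal sketch, assuming the input text may contain characters that need XML escaping. `prosody_ssml` is a hypothetical helper name for illustration and is not part of any TTS.ai API.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text: str, rate: str = "slow") -> str:
    """Wrap text in <speak><prosody rate="..."> and escape XML special characters."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f"<speak><prosody rate={quoteattr(rate)}>{escape(text)}</prosody></speak>"


print(prosody_ssml("Slow speech"))
# <speak><prosody rate="slow">Slow speech</prosody></speak>
```

Escaping matters because a bare `&` or `<` in the input would otherwise produce malformed SSML.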

Add tags to make the delivery more expressive (tag support varies).
Free with Piper, VITS, MeloTTS
Your generated audio will appear here. Select a model, enter text, then click "Generate".
Audio generated successfully
Link expires in 24h
Like TTS.ai? Tell your friends!

Kitten TTS


Free

Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.

Developer: KittenML
License: Apache 2.0
Speed: Fast
Languages: 1 language
VRAM: 0GB
Voice cloning: No
CPU-only inference, Under 80MB model size, 8 built-in voices, Speed control, ONNX-based, 24kHz output
Best for: Fast lightweight TTS, edge deployment, low-latency applications

Tips

  • Use ellipses (...) to add pauses
  • Try Kokoro or CosyVoice 2 for the best results
  • Use Dia for dialogue and multi-speaker podcast content


How AI Text to Speech Works

Create professional-quality voiceovers in three simple steps. No technical knowledge required.

Step 1

Enter the text you want to convert to speech.

Step 2

Choose from 20+ AI models across three tiers. Pick a voice that suits your content, select your target language, adjust the playback speed from 0.5x to 2.0x, and choose your preferred output format (MP3, WAV, OGG, or FLAC).
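The choices in this step can be captured as a small validated settings object. This is only a sketch: the speed range and the format list come from the text above, while the class name, field names, and the example model and voice identifiers are assumptions made for illustration.

```python
from dataclasses import dataclass

# Limits taken from the step above; all names below are hypothetical.
OUTPUT_FORMATS = ("mp3", "wav", "ogg", "flac")
MIN_SPEED, MAX_SPEED = 0.5, 2.0


@dataclass(frozen=True)
class SynthesisSettings:
    model: str
    voice: str
    language: str
    speed: float = 1.0
    output_format: str = "mp3"

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges before any request is made.
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED}x and {MAX_SPEED}x")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")


settings = SynthesisSettings(
    model="kokoro", voice="default", language="en", speed=1.25, output_format="wav"
)
print(settings)
```

Validating up front keeps an out-of-range speed or an unsupported format from reaching whatever service performs the synthesis.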

Step 3

Click Generate and your audio will be ready in seconds. Preview it with the built-in player, download it in your chosen format, or copy a shareable link. Use the API for batch processing and integration into your workflow.
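Batch processing can be sketched without knowing the API's details. The example below assumes the caller supplies a `synthesize` function that turns one text into audio bytes; that function, and the name `batch_synthesize`, are hypothetical and stand in for whatever client the real API provides.

```python
from typing import Callable, Dict, Iterable, Tuple


def batch_synthesize(
    texts: Iterable[str],
    synthesize: Callable[[str], bytes],
) -> Tuple[Dict[int, bytes], Dict[int, str]]:
    """Run every text through `synthesize`, collecting audio and errors by index."""
    results: Dict[int, bytes] = {}
    errors: Dict[int, str] = {}
    for index, text in enumerate(texts):
        try:
            results[index] = synthesize(text)
        except Exception as exc:  # one failed item should not stop the batch
            errors[index] = str(exc)
    return results, errors
```

Keying results and errors by input position makes it straightforward to retry only the items that failed.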

AI-powered text-to-speech is transforming how people create, consume, and interact with audio content across many industries.

Detailed information about every AI model available on TTS.ai. Compare quality, speed, language support, and features to find the right model for your project.
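A comparison like this can also be done in code. The sketch below filters a handful of entries by language and available VRAM. The figures are copied from the model cards on this page, the language sets are abridged, and `ModelInfo` and `find_models` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ModelInfo:
    name: str
    tier: str
    speed: str
    languages: FrozenSet[str]
    vram_gb: float


# A few entries from the catalog on this page; language lists are abridged.
CATALOG = [
    ModelInfo("Kokoro", "Free", "Fast", frozenset({"en", "ja", "zh", "ko"}), 1.5),
    ModelInfo("Piper", "Free", "Fast", frozenset({"en", "de", "ja", "sw"}), 0.0),
    ModelInfo("Bark", "Standard", "Slow", frozenset({"en", "zh", "ja", "tr"}), 5.0),
    ModelInfo("Tortoise TTS", "Premium", "Slow", frozenset({"en"}), 8.0),
]


def find_models(catalog: List[ModelInfo], language: str, max_vram_gb: float) -> List[str]:
    """Return the names of models that support `language` and fit in `max_vram_gb`."""
    return [
        m.name
        for m in catalog
        if language in m.languages and m.vram_gb <= max_vram_gb
    ]


print(find_models(CATALOG, "ja", 2.0))  # ['Kokoro', 'Piper']
print(find_models(CATALOG, "sw", 0.0))  # ['Piper']
```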

Kokoro

Free

Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast, generating audio nearly 100x faster than real-time on a GPU.

Developer: Hexgrad
License: Apache 2.0
Speed: Fast
Languages: en, ja, zh, ko, fr, de, it, pt, es, hi, ru
VRAM: 1.5GB
Voice cloning: No
Best for: High-quality TTS with minimal latency, streaming applications

Piper

Free

Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.

Developer: Rhasspy
License: MIT
Speed: Fast
Languages: en, de, fr, es, it, pt, nl, pl, ru, zh, ja, ko, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi
VRAM: 0 (CPU only)
Voice cloning: No
CPU-friendly
Best for: Quick previews, accessibility, and embedded applications

VITS

Free

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.

Developer: Jaehyeon Kim et al.
License: MIT
Speed: Fast
Languages: en, zh, ja, ko
VRAM: 1GB
Voice cloning: No
Best for: General-purpose text-to-speech with natural prosody

MeloTTS

Free

MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.

Developer: MyShell.ai
License: MIT
Speed: Fast
Languages: en, es, fr, zh, ja, ko
VRAM: 0.5GB (GPU optional)
Voice cloning: No
Best for: Production applications needing fast, multilingual TTS

Bark

Standard

Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio like music, background noise, and sound effects. It can produce nonverbal communications like laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.

Developer: Suno
License: MIT
Speed: Slow
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 5GB
Voice cloning: No
Cost multiplier: 2x
Sound effects, Laughing/sighing, Music generation, 100+ speakers, Multilingual
Best for: Creative audio content, audiobooks with emotion, sound effects

Bark Small

Standard

Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference speeds and lower memory requirements. It retains Bark's ability to generate speech with emotions, laughter, and multiple languages.

Developer: Suno
License: MIT
Speed: Medium
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 2GB
Voice cloning: No
Cost multiplier: 2x
Lightweight, Faster than full Bark, Emotional speech, Multilingual
Best for: Quick creative audio when full Bark is too slow

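CosyVoice 2

Standard

CosyVoice 2 by Alibaba's Tongyi Lab achieves human-comparable speech quality with extremely low latency, making it ideal for real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.

Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, fr, de, it, es
VRAM: 4GB
Voice cloning: Yes
Cost multiplier: 2x
Streaming, Zero-shot cloning, Cross-lingual, Emotion control, Human-parity
Best for: Real-time applications, streaming TTS, voice assistants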

Dia TTS

Standard

Dia by Nari Labs is a 1.6B parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression. Dia is perfect for creating podcast-style content, audiobook dialogues, and interactive conversational AI.

Developer: Nari Labs
License: Apache 2.0
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost multiplier: 2x
Multi-speaker, Dialog generation, Natural turn-taking, Emotional expression, 1.6B parameters
Best for: Podcasts, audiobook dialogues, conversational content

Parler TTS

Standard

Parler TTS is a text-to-speech model that uses natural language voice descriptions to control the generated speech. Instead of selecting from preset voices, you describe the voice you want (e.g., "a warm female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech matching that description. This makes it uniquely flexible for creative applications.

Developer: Hugging Face
License: Apache 2.0
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost multiplier: 2x
Voice description, Natural language control, Flexible voice creation, No preset voices needed
Best for: Creative applications where you need custom voice characteristics

GLM-TTS

Standard

GLM-TTS by Zhipu AI is a text-to-speech system built on the Llama architecture with flow matching. It achieves the lowest character error rate among open-source TTS models, meaning it produces the most accurate pronunciation. GLM-TTS supports English and Chinese with voice cloning from 3-10 second audio samples.

Developer: Zhipu AI
License: GLM-4 License
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost multiplier: 2x
Lowest error rate, Voice cloning, Flow matching, Natural prosody
Best for: Applications requiring maximum pronunciation accuracy

IndexTTS-2

Standard

IndexTTS-2 is an advanced text-to-speech system that excels at zero-shot voice synthesis with fine-grained emotion control. It can generate speech with specific emotional tones like happy, sad, angry, or fearful without requiring emotion-specific training data. The model uses emotion vectors to precisely control the emotional expression of generated speech.

Developer: Index Team
License: Bilibili Model License
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost multiplier: 2x
Emotion control, Zero-shot, Emotion vectors, Expressive speech, Fine-grained control
Best for: Emotionally expressive content, audiobooks, virtual assistants

Spark TTS

Standard

Spark TTS by SparkAudio is a text-to-speech model that combines voice cloning with controllable emotion and speaking style. Using just 5 seconds of reference audio, it can clone a voice and then generate speech with different emotions, speeds, and styles while maintaining the cloned voice identity. Spark TTS uses a prompt-based control system.

Developer: SparkAudio
License: CC BY-NC-SA 4.0
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost multiplier: 2x
Voice cloning, Emotion control, Style control, Prompt-based, 5-second cloning
Best for: Content creation with cloned voices and emotional control

GPT-SoVITS

Standard

GPT-SoVITS combines GPT-style language modeling with SoVITS (named after SoftVC VITS Singing Voice Conversion) for powerful few-shot voice cloning. With as little as 5 seconds of reference audio, it can accurately clone a voice and generate new speech while preserving the speaker's unique characteristics. It excels at both speaking and singing voice synthesis.

Developer: RVC-Boss
License: MIT
Speed: Slow
Languages: en, zh, ja, ko
VRAM: 6GB
Voice cloning: Yes
Cost multiplier: 2x
5-second cloning, Singing voice, Few-shot learning, High fidelity, Cross-lingual
Best for: Voice cloning, singing synthesis, content creator voice replication

Orpheus

Standard

Orpheus is a large-scale text-to-speech model that achieves human-level emotional expression. Trained on over 100,000 hours of diverse speech data, it excels at generating speech with natural emotions, emphasis, and speaking styles. Orpheus can produce speech that is virtually indistinguishable from human recordings.

Developer: Canopy Labs
License: Llama 3.2 Community
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost multiplier: 2x
Human-level emotion, 100K hours training, Natural emphasis, Expressive speech
Best for: High-quality emotional speech, audiobooks, voice acting

Chatterbox

Premium

Chatterbox by Resemble AI is a cutting-edge zero-shot voice cloning model. It can replicate any voice from a single audio sample with remarkable accuracy, capturing not only the timbre but also the speaking style and emotional nuances. Chatterbox also features fine-grained emotion control, allowing you to adjust the emotional tone of the generated speech independently from the voice identity.

Developer: Resemble AI
License: MIT
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: Yes
Cost multiplier: 4x
Zero-shot cloning, Emotion control, High fidelity, Style transfer, Single sample cloning
Best for: Professional voice cloning with emotional control, content creation

Tortoise TTS

Premium

Tortoise TTS is an autoregressive multi-voice text-to-speech system that prioritizes audio quality over speed. It uses a DALL-E-inspired architecture to generate highly natural speech with excellent prosody and speaker similarity. While slower than many alternatives, Tortoise produces some of the most realistic synthetic speech available in the open-source ecosystem.

Developer: James Betker
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 8GB
Voice cloning: Yes
Cost multiplier: 4x
Highest quality, Multi-voice, DALL-E architecture, Voice cloning, Autoregressive
Best for: Audiobooks, premium content, quality-first applications

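StyleTTS 2

Premium

StyleTTS 2 achieves human-level TTS synthesis by combining style diffusion with adversarial training using large speech language models. It generates the most natural sounding speech among single-speaker models, rivaling human recordings. StyleTTS 2 uses diffusion-based style modeling to capture the full range of human speech variation.

Developer: Columbia University
License: MIT
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost multiplier: 4x
Human-level, Style diffusion, Adversarial training, Natural variation, High fidelity
Best for: Studio-quality single-speaker synthesis, professional narration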

OpenVoice

Premium

OpenVoice by MyShell.ai enables instant voice cloning with granular control over voice style, emotion, accent, rhythm, pauses, and intonation. It can clone a voice from a short audio clip and generate speech in multiple languages while maintaining the speaker identity. OpenVoice also functions as a voice converter, allowing real-time voice transformation.

Developer: MyShell.ai / MIT
License: MIT
Speed: Medium
Languages: en, zh, ja, ko, fr, de, es, it
VRAM: 4GB
Voice cloning: Yes
Cost multiplier: 4x
Instant cloning, Voice conversion, Emotion control, Accent control, Multilingual
Best for: Voice cloning with fine-grained style control, voice conversion

Qwen3 TTS

Standard

Qwen3-TTS is a 1.7 billion parameter text-to-speech model from Alibaba's Qwen team. It supports three modes: preset voices with emotion control (9 speakers), voice cloning from just 3 seconds of audio, and a unique voice design mode where you describe the voice you want in natural language. It covers 10 languages with high expressiveness and natural prosody.

Developer: Alibaba (Qwen)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
VRAM: 7GB
Voice cloning: Yes
Cost multiplier: 2x
Voice cloning, 9 preset voices, Voice design from text, Emotion control, 10 languages
Best for: Multilingual content with voice cloning or custom voice design

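Sesame CSM

Premium

Sesame CSM (Conversational Speech Model) is a 1 billion parameter model designed specifically for generating conversational speech. It models the natural patterns of human conversation including turn-taking timing, backchannel responses, emotional reactions, and conversational flow. CSM generates audio that sounds like a natural human conversation rather than synthetic speech.

Developer: Sesame
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 8GB
Voice cloning: No
Cost multiplier: 4x
Best for: AI assistants, chatbots, interactive conversational AI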

Kitten TTS

Free

Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.

Developer: KittenML
License: Apache 2.0
Speed: Fast
Languages: en
VRAM: 0GB
Voice cloning: No
CPU-only inference, Under 80MB model size, 8 built-in voices, Speed control, ONNX-based, 24kHz output
Best for: Fast lightweight TTS, edge deployment, low-latency applications

Kokoro

Free

Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast, generating audio nearly 100x faster than real-time on a GPU.

Developer: Hexgrad
License: Apache 2.0
Speed: Fast
Languages: en, ja, zh, ko, fr, de, it, pt, es, hi, ru
Best for: High-quality TTS with minimal latency, streaming applications

Piper

Free

Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.

Developer: Rhasspy
License: MIT
Speed: Fast
Languages: en, de, fr, es, it, pt, nl, pl, ru, zh, ja, ko, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi
Best for: Quick previews, accessibility, and embedded applications

VITS

Free

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.

Developer: Jaehyeon Kim et al.
License: MIT
Speed: Fast
Languages: en, zh, ja, ko
Best for: General-purpose text-to-speech with natural prosody

MeloTTS

Free

MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.

Developer: MyShell.ai
License: MIT
Speed: Fast
Languages: en, es, fr, zh, ja, ko
Best for: Production applications needing fast, multilingual TTS

Kitten TTS

Free

Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.

Developer: KittenML
License: Apache 2.0
Speed: Fast
Languages: en
Best for: Fast lightweight TTS, edge deployment, low-latency applications

Bark

Standard

Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio like music, background noise, and sound effects. It can produce nonverbal communications like laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.

Developer: Suno
License: MIT
Speed: Slow
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
Voice cloning: No
Sound effects, Laughing/sighing, Music generation, 100+ speakers, Multilingual
Best for: Creative audio content, audiobooks with emotion, sound effects

Bark Small

Standard

Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference speeds and lower memory requirements. It retains Bark's ability to generate speech with emotions, laughter, and multiple languages.

Developer: Suno
License: MIT
Speed: Medium
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
Voice cloning: No
Lightweight, Faster than full Bark, Emotional speech, Multilingual
Best for: Quick creative audio when full Bark is too slow

CosyVoice 2

Standard

CosyVoice 2 by Alibaba's Tongyi Lab achieves human-comparable speech quality with extremely low latency, making it ideal for real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.

Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, fr, de, it, es
Voice cloning: Yes
Streaming, Zero-shot cloning, Cross-lingual, Emotion control, Human-parity
Best for: Real-time applications, streaming TTS, voice assistants

Dia TTS

Standard

Dia by Nari Labs is a 1.6B parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression. Dia is perfect for creating podcast-style content, audiobook dialogues, and interactive conversational AI.

Developer: Nari Labs
License: Apache 2.0
Speed: Medium
Languages: en
Voice cloning: No
Multi-speaker, Dialog generation, Natural turn-taking, Emotional expression, 1.6B parameters
Best for: Podcasts, audiobook dialogues, conversational content

Parler TTS

Standard

Parler TTS is a text-to-speech model that uses natural language voice descriptions to control the generated speech. Instead of selecting from preset voices, you describe the voice you want (e.g., "a warm female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech matching that description. This makes it uniquely flexible for creative applications.

Developer: Hugging Face
License: Apache 2.0
Speed: Medium
Languages: en
Voice cloning: No
Voice description, Natural language control, Flexible voice creation, No preset voices needed
Best for: Creative applications where you need custom voice characteristics

GLM-TTS

Standard

GLM-TTS by Zhipu AI is a text-to-speech system built on the Llama architecture with flow matching. It achieves the lowest character error rate among open-source TTS models, meaning it produces the most accurate pronunciation. GLM-TTS supports English and Chinese with voice cloning from 3-10 second audio samples.

Developer: Zhipu AI
License: GLM-4 License
Speed: Medium
Languages: en, zh
Voice cloning: Yes
Lowest error rate, Voice cloning, Flow matching, Natural prosody
Best for: Applications requiring maximum pronunciation accuracy

IndexTTS-2

Standard

IndexTTS-2 is an advanced text-to-speech system that excels at zero-shot voice synthesis with fine-grained emotion control. It can generate speech with specific emotional tones like happy, sad, angry, or fearful without requiring emotion-specific training data. The model uses emotion vectors to precisely control the emotional expression of generated speech.

Developer: Index Team
License: Bilibili Model License
Speed: Medium
Languages: en, zh
Voice cloning: Yes
Emotion control, Zero-shot, Emotion vectors, Expressive speech, Fine-grained control
Best for: Emotionally expressive content, audiobooks, virtual assistants

Spark TTS

Standard

Spark TTS by SparkAudio is a text-to-speech model that combines voice cloning with controllable emotion and speaking style. Using just 5 seconds of reference audio, it can clone a voice and then generate speech with different emotions, speeds, and styles while maintaining the cloned voice identity. Spark TTS uses a prompt-based control system.

Developer: SparkAudio
License: CC BY-NC-SA 4.0
Speed: Medium
Languages: en, zh
Voice cloning: Yes
Voice cloning, Emotion control, Style control, Prompt-based, 5-second cloning
Best for: Content creation with cloned voices and emotional control

GPT-SoVITS

Standard

GPT-SoVITS combines GPT-style language modeling with SoVITS (named after SoftVC VITS Singing Voice Conversion) for powerful few-shot voice cloning. With as little as 5 seconds of reference audio, it can accurately clone a voice and generate new speech while preserving the speaker's unique characteristics. It excels at both speaking and singing voice synthesis.

Developer: RVC-Boss
License: MIT
Speed: Slow
Languages: en, zh, ja, ko
Voice cloning: Yes
5-second cloning, Singing voice, Few-shot learning, High fidelity, Cross-lingual
Best for: Voice cloning, singing synthesis, content creator voice replication

Orpheus

Standard

Orpheus is a large-scale text-to-speech model that achieves human-level emotional expression. Trained on over 100,000 hours of diverse speech data, it excels at generating speech with natural emotions, emphasis, and speaking styles. Orpheus can produce speech that is virtually indistinguishable from human recordings.

Developer: Canopy Labs
License: Llama 3.2 Community
Speed: Medium
Languages: en
Voice cloning: No
Human-level emotion, 100K hours training, Natural emphasis, Expressive speech
Best for: High-quality emotional speech, audiobooks, voice acting

Qwen3 TTS

Standard

Qwen3-TTS is a 1.7 billion parameter text-to-speech model from Alibaba's Qwen team. It supports three modes: preset voices with emotion control (9 speakers), voice cloning from just 3 seconds of audio, and a unique voice design mode where you describe the voice you want in natural language. It covers 10 languages with high expressiveness and natural prosody.

Developer: Alibaba (Qwen)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
Voice cloning: Yes
Voice cloning, 9 preset voices, Voice design from text, Emotion control, 10 languages
Best for: Multilingual content with voice cloning or custom voice design

Chatterbox

Premium

Chatterbox by Resemble AI is a cutting-edge zero-shot voice cloning model. It can replicate any voice from a single audio sample with remarkable accuracy, capturing not just the timbre but also the speaking style and emotional nuances. Chatterbox also features fine-grained emotion control, allowing you to adjust the emotional tone of the generated speech independently from the voice identity.

Developer: Resemble AI
License: MIT
Speed: Medium
Languages: en
Voice cloning: Yes
VRAM: 4GB
Cost multiplier: 4x
Zero-shot cloning, Emotion control, High fidelity, Style transfer, Single sample cloning
Best for: Professional voice cloning with emotional control, content creation

Tortoise TTS

Premium

Tortoise TTS is an autoregressive multi-voice text-to-speech system that prioritizes audio quality over speed. It uses a DALL-E-inspired architecture to generate highly natural speech with excellent prosody and speaker similarity. While slower than many alternatives, Tortoise produces some of the most realistic synthetic speech available in the open-source ecosystem.

Developer: James Betker
License: Apache 2.0
Speed: Slow
Languages: en
Voice cloning: Yes
VRAM: 8GB
Cost multiplier: 4x
Highest quality, Multi-voice, DALL-E architecture, Voice cloning, Autoregressive
Best for: Audiobooks, premium content, quality-first applications

StyleTTS 2

Premium

StyleTTS 2 achieves human-level TTS synthesis by combining style diffusion with adversarial training using large speech language models. It generates the most natural-sounding speech among single-speaker models, rivaling human recordings. StyleTTS 2 uses diffusion-based style modeling to capture the full range of human speech variation.

Developer: Columbia University
License: MIT
Speed: Medium
Languages: en
VRAM: 4GB
Cost: 4x
Human-level, Style diffusion, Adversarial training, Natural variation, High fidelity
Best for: Studio-quality single-speaker synthesis, professional narration

OpenVoice

Premium

OpenVoice by MyShell.ai enables instant voice cloning with granular control over voice style, emotion, accent, rhythm, pauses, and intonation. It can clone a voice from a short audio clip and generate speech in multiple languages while maintaining the speaker identity. OpenVoice also functions as a voice converter, allowing real-time voice transformation.

Developer: MyShell.ai / MIT
License: MIT
Speed: Medium
Languages: en, zh, ja, ko, fr, de, es, it
VRAM: 4GB
Cost: 4x
Instant cloning, Voice conversion, Emotion control, Accent control, Multilingual
Best for: Voice cloning with fine-grained style control, voice conversion

Sesame CSM

Premium

Sesame CSM (Conversational Speech Model) is a 1 billion parameter model designed specifically for generating conversational speech. It models the natural patterns of human conversation including turn-taking timing, backchannel responses, emotional reactions, and conversational flow. CSM generates audio that sounds like a natural human conversation rather than synthetic speech.

Developer: Sesame
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 8GB
Cost: 4x
Conversational, Natural timing, Turn-taking, Backchannel, 1B parameters
Best for: AI assistants, chatbots, conversational AI applications

| Model | Developer | Tier | Speed | Languages | VRAM | License | Cost |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Kokoro | Hexgrad | Free | Fast | 11 | 1.5GB | Apache 2.0 | Free |
| Piper | Rhasspy | Free | Fast | 31 | 0 (CPU only) | MIT | Free |
| VITS | Jaehyeon Kim et al. | Free | Fast | 4 | 1GB | MIT | Free |
| MeloTTS | MyShell.ai | Free | Fast | 6 | 0.5GB (GPU optional) | MIT | Free |
| Bark | Suno | Standard | Slow | 13 | 5GB | MIT | 2x |
| Bark Small | Suno | Standard | Medium | 13 | 2GB | MIT | 2x |
| CosyVoice 2 | Alibaba (Tongyi Lab) | Standard | Medium | 8 | 4GB | Apache 2.0 | 2x |
| Dia TTS | Nari Labs | Standard | Medium | 1 | 4GB | Apache 2.0 | 2x |
| Parler TTS | Hugging Face | Standard | Medium | 1 | 4GB | Apache 2.0 | 2x |
| GLM-TTS | Zhipu AI | Standard | Medium | 2 | 4GB | GLM-4 License | 2x |
| IndexTTS-2 | Index Team | Standard | Medium | 2 | 4GB | Bilibili Model License | 2x |
| Spark TTS | SparkAudio | Standard | Medium | 2 | 4GB | CC BY-NC-SA 4.0 | 2x |
| GPT-SoVITS | RVC-Boss | Standard | Slow | 4 | 6GB | MIT | 2x |
| Orpheus | Canopy Labs | Standard | Medium | 1 | 4GB | Llama 3.2 Community | 2x |
| Chatterbox | Resemble AI | Premium | Medium | 1 | 4GB | MIT | 4x |
| Tortoise TTS | James Betker | Premium | Slow | 1 | 8GB | Apache 2.0 | 4x |
| StyleTTS 2 | Columbia University | Premium | Medium | 1 | 4GB | MIT | 4x |
| OpenVoice | MyShell.ai / MIT | Premium | Medium | 8 | 4GB | MIT | 4x |
| Qwen3 TTS | Alibaba (Qwen) | Standard | Medium | 10 | 7GB | Apache 2.0 | 2x |
| Sesame CSM | Sesame | Premium | Slow | 1 | 8GB | Apache 2.0 | 4x |
| Kitten TTS | KittenML | Free | Fast | 1 | 0GB | Apache 2.0 | Free |

Advanced AI Text-to-Speech Platform

Why choose TTS.ai for text to speech?

TTS.ai brings the world's best text-to-speech models together in one easy-to-use platform. Unlike proprietary services that lock you into a single voice engine, TTS.ai gives you access to 20+ models from leading research labs including Coqui, MyShell, Amphion, NVIDIA, Suno, Hugging Face, Tsinghua University, and others.

Every model is open source under MIT, Apache 2.0, or a similar license, ensuring you have full commercial rights to use the generated audio in your projects. Whether you need a fast, lightweight model for real-time applications or studio-quality output for audiobooks and podcasts, TTS.ai has the right model for every use case.


GPU-Speed Processing

All TTS models run on dedicated NVIDIA GPUs for fast, consistent response times. Free models generate audio within 2 seconds. Standard models such as Kokoro, CosyVoice 2, and Bark take 3-5 seconds. Premium models with the highest quality, such as Tortoise and Chatterbox, take 5-15 seconds depending on text length.

Languages

Generate speech in more than 30 languages including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more. Some models support cross-lingual voice cloning, which means you can generate speech in a language the original voice was never recorded in. CosyVoice 2 and GPT-SoVITS excel at cross-lingual voice cloning.

Developer-ready API

Integrate TTS.ai into your applications with an OpenAI-compatible REST API. One endpoint for all 20+ models. Python, JavaScript, cURL, and Go SDKs. Streaming support for real-time applications. Batch processing for large-scale content generation. Webhooks for async notifications. Available on Pro and Enterprise plans.

Frequently Asked Questions

Text to Speech (TTS) is an AI technology that converts written text into spoken audio. Modern TTS models such as Kokoro, Chatterbox, and CosyVoice 2 use deep learning to produce human-sounding speech with natural prosody, emotion, and pacing.

It depends on your needs. For quick previews, use Piper or MeloTTS (free tier, fast). For higher quality, try Kokoro or CosyVoice 2 (standard tier). For voice cloning, use Chatterbox or GPT-SoVITS (premium). For dialogue and podcast content, try Dia TTS. Every model has different strengths, so experiment to find the best fit.

Yes! TTS.ai offers free text-to-speech with Kokoro, Piper, VITS, and MeloTTS. No account is required for up to 500 characters and 3 generations per hour. Sign up for a free account to get 15,000 characters and access to all models.

Our TTS models collectively support 30+ languages including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Russian, Hindi, and many more.

Yes, audio generated through TTS.ai can be used commercially. All our models use permissive licenses (MIT, Apache 2.0). Check the license of the specific model you use for the terms that apply to your project.

TTS.ai supports MP3, WAV, OGG, and FLAC output formats. MP3 is the default format for web playback. WAV is recommended for further audio processing. You can convert between formats using our audio converter tool.


Free users can generate up to 500 characters per request. Registered users get up to 5,000 characters per request. For longer text, audio is generated in chunks and joined automatically. API users can process up to 10,000 characters per request.
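
The chunk-and-join behavior for long text can be sketched as follows. This is a minimal illustration, not TTS.ai's actual implementation: the function name `split_into_chunks`, the sentence-boundary rule, and the default 500-character limit (borrowed from the free-tier figure) are all assumptions.

```python
import re


def split_into_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends.

    Hypothetical helper: the real service may split on different boundaries.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the resulting audio segments concatenated in order. Splitting at sentence ends rather than at a fixed offset keeps pauses and intonation from being cut mid-phrase.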


Yes, most models support speed adjustment from 0.5x to 2.0x. Some models, such as Bark and Parler, also support pitch and style control. You can adjust the speed in the settings panel or through the API speed parameter.

Yes, batch processing is available through our API. You can submit multiple texts in a single API call or script, and each one is processed and returned as a separate audio file. This is ideal for audiobook chapters, e-learning modules, or game dialogue scripts.

Create an API key from your account dashboard, then send POST requests to our REST API endpoint with your text, model, and voice parameters. We provide code examples in Python, JavaScript, and cURL. The API is OpenAI-compatible, so existing integrations work with minimal changes.
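
As a rough sketch of such a request, the snippet below builds (but does not send) a POST using only the Python standard library. The base URL, path, model id, and voice name are placeholders, since this page does not state them. The JSON field names (`model`, `input`, `voice`, `speed`, `response_format`) follow OpenAI's speech endpoint, on the assumption that "OpenAI-compatible" means the same fields are accepted. The speed range and the format list are taken from the answers above.

```python
import json
import urllib.request

# Placeholder values (assumptions): take the real base URL and path from
# the provider's API documentation.
BASE_URL = "https://api.example.com/v1"
SPEECH_PATH = "/audio/speech"

# Output formats named on this page.
SUPPORTED_FORMATS = {"mp3", "wav", "ogg", "flac"}


def build_speech_request(api_key, text, model, voice,
                         speed=1.0, response_format="mp3"):
    """Build a POST request for an OpenAI-style speech endpoint."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {response_format}")

    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }
    return urllib.request.Request(
        BASE_URL + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (not executed here): send the request and save the audio bytes.
# req = build_speech_request("YOUR_API_KEY", "Hello", "kokoro", "default")
# with urllib.request.urlopen(req) as resp, open("out.mp3", "wb") as f:
#     f.write(resp.read())
```

Validating the speed and format before sending avoids a round trip for requests the service would presumably reject anyway.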

Join thousands of creators using TTS.ai. Get 15,000 free characters with a new account. Free models are available without signup.