Text to Speech

Convert text to speech with open-source AI models. Free to use, no account required.

TTS voices are not yet available in your language.
Up to 5,000 characters

Use SSML tags to control how your text is spoken:

<speak><prosody rate="slow">Slow speech</prosody></speak>
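The SSML snippet above can also be built programmatically, which keeps the markup well-formed when the text contains characters such as `&` or `<`. This is a minimal sketch using only the Python standard library; the helper name `build_ssml` is hypothetical, and which `rate` values a given model accepts is an assumption that is not stated on this page.

```python
import xml.etree.ElementTree as ET


def build_ssml(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...> and escape XML-special characters."""
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    prosody.text = text  # ElementTree escapes &, < and > when serializing
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Slow speech"))
```

Building the element tree instead of concatenating strings means user text cannot accidentally break the SSML structure.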

Emotion control (availability depends on the model)

Pitch: -12 to +12
Speed: 0.5x to 2.0x
Free with Piper, VITS, MeloTTS
The generated audio will appear here. Choose a model, enter your text, and click Generate.

Model details

Kokoro

Free

Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast, generating audio nearly 100x faster than real-time on a GPU.

Developer: Hexgrad
License: Apache 2.0
Speed: Fast
Languages: 8 languages
VRAM: 1.5GB
Voice cloning: Not supported
Features: 82M parameters, Ultra-fast, Expressive voices, Multilingual, Streaming support
Best for: High-quality TTS with minimal latency, streaming applications

Tips for best results

  • Use correct punctuation so sentences are read with natural phrasing
  • Spell out numbers and abbreviations
  • Use commas to create short pauses between words
  • Use an ellipsis (...) for a longer pause
  • Try Kokoro or CosyVoice 2
  • Use Dia for dialogue and podcast-style content

Credit costs

Tier        Cost per 1K characters
Free        1:1 (free)
Standard    2x credits
Premium     4x credits
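As a rough illustration of how the tier multipliers in the table could translate into a credit estimate, here is a small sketch. It assumes that "2x" and "4x" multiply a base rate of one credit per 1,000 characters, that free-tier models cost nothing, and that partial thousands are billed proportionally. None of these details are stated on the page, and `estimate_credits` is a hypothetical helper, not part of any TTS.ai API.

```python
# Multipliers taken from the table above; the base rate itself is an assumption.
TIER_MULTIPLIER = {"free": 0, "standard": 2, "premium": 4}


def estimate_credits(num_chars: int, tier: str, base_per_1k: float = 1.0) -> float:
    """Estimate credits for a request, assuming proportional per-character billing."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    if tier not in TIER_MULTIPLIER:
        raise ValueError(f"unknown tier: {tier!r}")
    return num_chars / 1000 * base_per_1k * TIER_MULTIPLIER[tier]


if __name__ == "__main__":
    for tier in ("free", "standard", "premium"):
        print(tier, estimate_credits(5000, tier))
```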

How AI Text to Speech Works

Create AI-generated speech in three simple steps.

Step 1

Enter your text

Type, paste, or upload the text you want to convert. Up to 5,000 characters are supported per generation. Use plain text, or add SSML tags for finer control over pronunciation, pauses, and emphasis.
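Text longer than the 5,000-character limit has to be split before it is submitted. The sketch below shows one way to do that at sentence boundaries, so each chunk stays within the limit. The function name `split_for_tts` and the sentence-splitting rule are illustrative assumptions, not something the page prescribes.

```python
import re

MAX_CHARS = 5000  # per-generation limit stated in Step 1


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(split_for_tts("One. Two. Three.", limit=10))
```

Each chunk can then be generated separately and the audio files joined afterwards.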

Step 2

Choose a model and voice

Choose from 20+ AI models across 3 tiers. Pick a voice that suits your content, select the language, adjust the speed from 0.5x to 2.0x, and choose your preferred output format (MP3, WAV, OGG, or FLAC).
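The speed range and output formats named in this step are easy to validate before a request is sent. The following is a minimal sketch; the `SynthesisSettings` class and its field names are hypothetical and only mirror the limits quoted above.

```python
from dataclasses import dataclass

ALLOWED_FORMATS = {"mp3", "wav", "ogg", "flac"}  # formats listed in Step 2


@dataclass(frozen=True)
class SynthesisSettings:
    """Illustrative container for the options chosen in Step 2."""
    speed: float = 1.0
    output_format: str = "mp3"

    def __post_init__(self) -> None:
        # Reject values outside the 0.5x to 2.0x range quoted on the page.
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        if self.output_format.lower() not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {self.output_format!r}")


if __name__ == "__main__":
    print(SynthesisSettings(speed=1.5, output_format="wav"))
```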

Step 3

Generate

Click Generate to synthesize your audio. The result appears in the player on the page.

Text to Speech

AI-powered text-to-speech is changing how people create, consume, and interact with spoken content across industries.

Model library

TTS.ai

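Kokoro

Free

Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast, generating audio nearly 100x faster than real-time on a GPU.

Developer: Hexgrad
License: Apache 2.0
Speed: Fast
Languages: en, ja, zh, fr, it, pt, es, hi
VRAM: 1.5GB
Voice cloning: No
Cost per 1K characters: Free
Features: 82M parameters, Ultra-fast, Expressive voices, Multilingual, Streaming support
Best for: High-quality TTS with minimal latency, streaming applications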

Piper

Free

Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.

Developer: Rhasspy
License: MIT
Speed: Fast
Languages: en, de, fr, es, it, pt, nl, pl, ru, zh, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi, ca, cy, fa, lv, sl, lb, eu, id, ku, ml, sq, te, ur
VRAM: 0 (CPU only)
Voice cloning: No
Cost per 1K characters: Free
Features: CPU-friendly, 35+ languages, SSML support
Best for: Quick previews, accessibility, and embedded applications

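VITS

Free

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.

Developer: Jaehyeon Kim et al.
License: MIT
Speed: Fast
Languages: en, de, es, fr, pt, nl, fi, hu, bg, ja, pl
VRAM: 1GB
Voice cloning: No
Cost per 1K characters: Free
Features: End-to-end, Fast, Multi-speaker
Best for: General-purpose text-to-speech with natural prosody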

MeloTTS

Free

MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.

Developer: MyShell.ai
License: MIT
Speed: Fast
Languages: en, es, fr, zh, ja, ko
VRAM: 0.5GB (GPU optional)
Voice cloning: No
Cost per 1K characters: Free
Features: CPU-optimized, Multilingual, Low latency
Best for: Production applications needing fast, multilingual TTS

Bark

Standard

Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio like music, background noise, and sound effects. It can produce nonverbal communications like laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.

Developer: Suno
License: MIT
Speed: Slow
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 5GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Sound effects, Laughing/sighing, Music generation, 100+ speakers, Multilingual
Best for: Creative audio content, audiobooks with emotion, sound effects

Bark Small

Standard

Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference speeds and lower memory requirements. It retains Bark's ability to generate speech with emotions, laughter, and multiple languages.

Developer: Suno
License: MIT
Speed: Medium
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
VRAM: 2GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Lightweight, Faster than full Bark, Emotional speech, Multilingual
Best for: Quick creative audio when full Bark is too slow

CosyVoice 2

Standard

CosyVoice 2 by Alibaba's Tongyi Lab achieves human-comparable speech quality with extremely low latency, making it ideal for real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.

Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, fr, de, it, es
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Streaming, Zero-shot cloning, Cross-lingual, Emotion control, Human-parity
Best for: Real-time applications, streaming TTS, voice assistants

Dia TTS

Standard

Dia by Nari Labs is a 1.6B parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression. Dia is perfect for creating podcast-style content, audiobook dialogues, and interactive conversational AI.

Developer: Nari Labs
License: Apache 2.0
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Multi-speaker, Dialog generation, Natural turn-taking, Emotional expression, 1.6B parameters
Best for: Podcasts, audiobook dialogues, conversational content

Parler TTS

Standard

Parler TTS is a text-to-speech model that uses natural language voice descriptions to control the generated speech. Instead of selecting from preset voices, you describe the voice you want (e.g., "a warm female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech matching that description. This makes it uniquely flexible for creative applications.

Developer: Hugging Face
License: Apache 2.0
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Voice description, Natural language control, Flexible voice creation, No preset voices needed
Best for: Creative applications where you need custom voice characteristics

Indic Parler TTS

Standard

Indic Parler TTS by AI4Bharat extends the Parler architecture to Indian languages, generating natural speech in Tamil, Bengali, Marathi, Gujarati, Kannada, Punjabi, Odia, Assamese, Hindi, Telugu, Malayalam and English. Like Parler, you describe the voice you want in plain language and the model matches it, with no preset voices required. Trained on AI4Bharat speech corpora for authentic pronunciation and prosody across the Indian subcontinent.

Developer: AI4Bharat
License: Apache 2.0
Speed: Slow
Languages: ta, bn, mr, gu, kn, pa, or, as, hi, te, ml, en
VRAM: 8GB
Voice cloning: No
Cost per 1K characters: 2x
Features: 11 Indian languages, Voice description, Natural language control, Authentic Indic pronunciation
Best for: Indian-language voiceovers, regional content, multilingual Indic applications

KhanomTan TTS

Standard

KhanomTan TTS is an open Thai text-to-speech model built on the YourTTS multilingual architecture. Trained on CC0 and permissively-licensed Thai corpora (TSync) alongside several other languages, it delivers natural Thai speech with multiple speaker voices. A clean, commercially-usable option for Thai, a language that most open TTS models only cover under non-commercial licenses.

Developer: Wannaphong Phatthiyaphaibun
License: Apache 2.0
Speed: Fast
Languages: th
VRAM: 2GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Thai, Multi-speaker, YourTTS architecture, Commercial-friendly license
Best for: Thai voiceovers, Thai-language applications

IndexTTS-2

Standard

IndexTTS-2 is a text-to-speech model from the Index Team.

Developer: Index Team
License: Bilibili Model License
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Emotion control, Zero-shot, Expressive speech, Duration control
Best for: Emotionally expressive content, audiobooks, virtual assistants

Spark TTS

Standard

Spark TTS by SparkAudio is a text-to-speech model that combines voice cloning with control over the emotion and style of the generated speech. It can clone a voice from only 5 seconds of reference audio, and the cloned voice can then be used to produce speech with different emotions, speeds, and characteristics.

Developer: SparkAudio
License: CC BY-NC-SA 4.0
Speed: Medium
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Voice cloning, Emotion control, 5-second cloning
Best for: Content creation with voice cloning and emotion control

GPT-SoVITS

Standard

GPT-SoVITS combines GPT-style language modeling with SoVITS for voice cloning. It can clone a voice from 5 seconds of reference audio, and it can be used for both speech and singing.

Developer: RVC-Boss
License: MIT
Speed: Slow
Languages: en, zh, ja, ko
VRAM: 6GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: 5-second cloning, Singing voice, Cross-lingual
Best for: Voice cloning, singing synthesis, content creators

Orpheus

Standard

Orpheus is a large-scale text-to-speech model that targets human-level speech quality. It was trained on over 100,000 hours of speech data and is built to produce speech with natural intonation, emotion, and rhythm. Its output is described as hard to tell apart from human recordings.

Developer: Canopy Labs
License: Llama 3.2 Community
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Human-level quality, 100K hours of training data, Natural intonation, Emotional speech
Best for: Expressive narration, audiobooks, voiceovers

Chatterbox

Premium

Chatterbox by Resemble AI is a production-grade voice cloning model. It can reproduce a voice with high fidelity from a single audio sample. Beyond matching the voice itself, it captures speaking style and emotional nuance, and it also lets you control the emotional intensity of the output.

Developer: Resemble AI
License: MIT
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 4x
Features: Zero-shot cloning, Emotion exaggeration control, Single-sample cloning
Best for: Professional voice cloning with emotion control

Tortoise TTS

Premium

Tortoise TTS is a multi-voice text-to-speech system.

Developer: James Betker
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 8GB
Voice cloning: Yes
Cost per 1K characters: 4x
Features: High quality, Multi-voice, DALL-E-style architecture
Best for: Audiobooks, quality-first applications

StyleTTS 2

Premium

StyleTTS 2 is a text-to-speech model that targets human-level synthesis quality.

Developer: Columbia University
License: MIT
Speed: Medium
Languages: en
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 4x
Features: Human-level quality

OpenVoice

Premium

OpenVoice by MyShell.ai enables voice cloning with granular control over voice style, including emotion, accent, rhythm, pauses, and intonation. It can clone a voice from a short audio clip and generate speech in multiple languages while keeping the speaker's own voice. OpenVoice can also be used as a voice converter, changing a voice in real time.

Developer: MyShell.ai / MIT
License: MIT
Speed: Medium
Languages: en, zh, ja, ko, fr, es
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 4x
Features: Voice cloning, Style control, Multilingual
Best for: Voice cloning with fine-grained control over voice style, voice conversion

Qwen3 TTS

Standard

Qwen3-TTS is a 1.7 billion parameter text-to-speech model from Alibaba's Qwen team. It supports two modes: preset voices with instruction control (9 speakers), and a custom voice design mode in which you describe the voice you want in natural language. It covers 10 languages with highly expressive output.

Developer: Alibaba (Qwen)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
VRAM: 7GB
Voice cloning: No
Cost per 1K characters: 2x
Features: 9 preset voices, Voice design, Multilingual
Best for: Multilingual content with preset voices or custom-designed voices

VieNeu-TTS-v2

Standard

VieNeu-TTS-v2 is a 300M parameter Vietnamese-first TTS model trained on 10,000+ hours of bilingual speech. It supports 7 preset voices and voice cloning from 3-5 seconds of reference audio. It runs on CPU using GGUF Q4 inference plus an ONNX audio decoder, so no GPU is required, and generations complete in about 7 seconds. It is built on a Qwen3 backbone.

Developer: Phạm Nguyễn Ngọc Bảo
License: Apache 2.0
Speed: Fast
Languages: vi, en
VRAM: CPU
Voice cloning: Yes
Cost per 1K characters: 2x
Features: 7 preset voices (North + South), En-Vi code-switching, Voice cloning (3-5s reference), Podcast / multi-speaker support, CPU-only (no GPU required)
Best for: Vietnamese content and bilingual en-vi material

Sesame CSM

Premium

Sesame CSM (Conversational Speech Model) is a 1 billion parameter model built specifically for generating conversational speech. The model captures the natural patterns of human conversation, including timing, pauses, emotional responses, and the flow of dialogue. CSM produces audio that sounds like a person talking in conversation rather than reading text aloud.

Developer: Sesame
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 8GB
Voice cloning: No
Cost per 1K characters: 4x
Features: Conversational, Natural timing, Pauses, Turn-taking, 1B parameters
Best for: AI assistants, chatbots, conversational AI

Chatterbox Turbo

Standard

Chatterbox Turbo by Resemble AI is a 350M parameter upgrade to Chatterbox that delivers up to 6x real-time speed with sub-200ms latency. It supports paralinguistic tags such as [laugh], [cough], and [chuckle] directly in the text. Perth watermarking is added to all generated audio.

Developer: Resemble AI
License: MIT
Speed: Fast
Languages: en
VRAM: 2GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: 200ms latency, Paralinguistic tags, 6x real-time, Voice cloning
Best for: Real-time voice agents, expressive speech with voice cloning

VoxCPM

Standard

VoxCPM 1.5 by OpenBMB is a tokenizer-free TTS model that operates in a continuous space rather than on discrete tokens. It outputs 44.1kHz audio and clones a voice from 3-10 seconds of reference audio. Cross-language cloning lets an English reference voice speak Chinese text, and the reverse.

Developer: OpenBMB
License: Apache 2.0
Speed: Fast
Languages: en, zh
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: 44.1kHz audio, Tokenizer-free, Cross-language cloning
Best for: High-fidelity audio, audiobooks, long-form content

Kani TTS 2

Free

Kani-TTS-2 by NineNineSix is an ultra-lightweight 400M parameter model built on a Liquid AI LFM2 backbone with NVIDIA NanoCodec. It runs in just 3GB VRAM and produces ~10 seconds of speech in ~2 seconds on an A100 (RTF 0.2). The current public release ships an English-only `kani-tts-2-en` checkpoint and does not expose the speaker-embedding hook needed for voice cloning. Use Chatterbox / IndexTTS2 / F5-TTS for cloning, or Kokoro / MeloTTS for non-English.

Developer: NineNineSix
License: Apache 2.0
Speed: Fast
Languages: en
VRAM: 3GB
Voice cloning: No
Cost per 1K characters: Free
Features: 3GB VRAM, Fast, Lightweight
Best for: Fast English generation on low-VRAM hardware, quick previews

OuteTTS

Free

OuteTTS extends large language models with text-to-speech capabilities while preserving the original architecture. It supports multiple backends including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, VLLM, and even browser inference via Transformers.js. Features zero-shot voice cloning through speaker profiles saved as JSON.

Developer: OuteAI
License: Apache 2.0
Speed: Slow
Languages: en
VRAM: 2GB
Voice cloning: No
Cost per 1K characters: Free
Features: CPU inference, Speaker profiles
Best for: Edge deployment, browser-based TTS, low-resource environments

VibeVoice

Standard

VibeVoice from Microsoft generates long-form speech of up to 90 minutes with up to 4 speakers.

Developer: Microsoft
License: MIT
Speed: Fast
Languages: en, zh
VRAM: 4GB
Voice cloning: No
Cost per 1K characters: 2x
Features: Multi-speaker, Long-form (90 min), Podcast generation, Low latency
Best for: Podcasts, interviews, long-form content, multi-speaker conversations

Pocket TTS

Free

Pocket TTS by Kyutai (creators of Moshi) is a compact 100M parameter text-to-speech model that punches well above its weight. It runs efficiently on CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. The small model size makes it ideal for edge deployment and low-resource environments.

Developer: Kyutai
License: MIT
Speed: Fast
Languages: en, fr
VRAM: 1GB
Voice cloning: Yes
Cost per 1K characters: Free
Features: 100M parameters, CPU inference, Single-sample voice cloning, Edge-ready
Best for: Lightweight deployment, CPU-only environments, quick voice cloning

Kitten TTS

Free

Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.

Developer: KittenML
License: Apache 2.0
Speed: Fast
Languages: en
VRAM: 0GB
Voice cloning: No
Cost per 1K characters: Free
Features: CPU-only inference, Model size up to 80MB, 8 voices, ONNX-based, 24kHz output
Best for: Fast lightweight TTS, edge deployment, low-latency applications

CosyVoice3

Standard

CosyVoice3 is the next-generation model from Alibaba's FunAudioLLM team. It offers streaming synthesis with ~150ms latency, instruction-based control over emotion, speed, and volume, and improved speaker similarity for zero-shot cloning. It supports 9 languages and 18 Chinese dialects. RL tuning is used to further improve the generated speech.

Developer: Alibaba (FunAudioLLM)
License: Apache 2.0
Speed: Fast
Languages: en, zh, ja, ko, de, es, fr, it, ru
VRAM: 4GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Streaming, Emotion control, Instruction following
Best for: Multilingual production TTS, real-time applications, voice cloning

NAMAA Saudi TTS

Standard

NAMAA Saudi TTS is a Saudi Arabic fine-tune of Resemble AI's ChatterboxMultilingual. Trained by NAMAA Space on native Saudi-dialect speech, it delivers natural Saudi pronunciation and prosody. It inherits Chatterbox's zero-shot voice cloning and emotion control. It is the first Saudi TTS model on TTS.ai.

Developer: NAMAA Space
License: MIT
Speed: Medium
Languages: ar
VRAM: 6GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Saudi Arabic, Native dialect, Voice cloning, Emotion control

Darwin TTS

Standard

Darwin-TTS-1.7B-Cross by FINAL-Bench is a research variant of Qwen3-TTS-1.7B in which 84 talker-FFN tensors (8.6%) are blended at α=3% with the corresponding tensors from Qwen3-1.7B-Base. The blend was made without any retraining, and it yields cross-lingual voice cloning across Korean, English, Japanese, and Chinese. It runs in zero-shot voice-clone mode (3-second reference audio).

Developer: FINAL-Bench
License: Apache 2.0
Speed: Medium
Languages: en, ko, ja, zh
VRAM: 7GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Cross-lingual voice cloning, FFN blending, 4 languages, Qwen3 backbone
Best for: Cross-lingual voice cloning between English / Korean / Japanese / Chinese from a single reference voice

MOSS-TTSD

Standard

MOSS-TTSD v1.0 from OpenMOSS is a 7B dialogue text-to-speech model. It supports up to 5 speakers.

Developer: OpenMOSS
License: Apache 2.0
Speed: Medium
Languages: en, zh
VRAM: 12GB
Voice cloning: Yes
Cost per 1K characters: 2x
Features: Multi-speaker, Up to 5 speakers, 60min coherent audio, Podcast-optimised
Best for: Podcasts, audiobooks, dubbed dialogue, conversational content with multiple voices

Ming-Omni TTS

Free

Ming-omni-tts-0.5B by inclusionAI is a compact omni-modal speech model built on the BailingMM dense backbone with a Patch-by-Patch flow-matching audio decoder. Delivers 44.1kHz output (near CD quality), supports zero-shot voice cloning from a 3+ second reference, and includes built-in emotion / dialect / BGM control via JSON instructions.

Developer: inclusionAI
License: Apache 2.0
Speed: Medium
Languages: en, zh
VRAM: 3GB
Voice cloning: Yes
Cost per 1K characters: Free
Features: 44.1kHz output, Voice cloning, Emotion control
Best for: High-fidelity bilingual narration, emotion-controlled voice acting, Chinese audiobook content

MOSS-TTS Nano

Free

MOSS-TTS-Nano-100M is OpenMOSS's compact 100M-parameter variant of the MOSS-TTS family, sharing the delay-transformer architecture. Trades the 8B model's peak quality for ~80x smaller weights and dramatically lower per-request VRAM, making it suitable for free-tier and high-throughput deployments. Same 20-language reach.

Developer: OpenMOSS
License: Apache 2.0
Speed: Fast
Languages: en, zh, de, es, fr, ja, it, ko, ru, ar, pt
VRAM: 2GB
Voice cloning: Yes
Cost per 1K characters: Free
Features: 100M parameters, Fast inference, Multilingual, Voice cloning, MOSS family
Best for: Free-tier TTS, high-volume production, low-latency interactive use

Kokoro

Free

Kokoro is an 82 million parameter text-to-speech model that punches well above its weight class. Despite its tiny size, it produces remarkably natural and expressive speech. Kokoro supports multiple languages including English, Japanese, Chinese, and Korean with a variety of expressive voices. It runs incredibly fast, generating audio nearly 100x faster than real-time on a GPU.

Developer: Hexgrad
License: Apache 2.0
Speed: Fast
Languages: en, ja, zh, fr, it, pt, es, hi
Best for: High-quality TTS with minimal latency, streaming applications

Piper

Free

Piper is a lightweight text-to-speech engine developed by Rhasspy that uses VITS and larynx architectures. It runs entirely on CPU, making it ideal for edge devices, home automation, and applications requiring offline TTS. With over 100 voices across 30+ languages, Piper delivers natural-sounding speech at real-time speeds even on a Raspberry Pi 4.

Developer: Rhasspy
License: MIT
Speed: Fast
Languages: en, de, fr, es, it, pt, nl, pl, ru, zh, ar, cs, da, fi, el, hu, is, ka, kk, ne, no, ro, sk, sr, sv, sw, tr, uk, vi, ca, cy, fa, lv, sl, lb, eu, id, ku, ml, sq, te, ur
Best for: Quick previews, accessibility, and embedded applications

VITS

Free

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. It adopts variational inference augmented with normalizing flows and an adversarial training process, achieving a significant improvement in naturalness.

Developer: Jaehyeon Kim et al.
License: MIT
Speed: Fast
Languages: en, de, es, fr, pt, nl, fi, hu, bg, ja, pl
Best for: General-purpose text-to-speech with natural prosody

MeloTTS

Free

MeloTTS by MyShell.ai is a multilingual TTS library supporting English (American, British, Indian, Australian), Spanish, French, Chinese, Japanese, and Korean. It is extremely fast, processing text at near real-time speed on CPU alone. MeloTTS is designed for production use and supports both CPU and GPU inference.

Developer: MyShell.ai
License: MIT
Speed: Fast
Languages: en, es, fr, zh, ja, ko
Best for: Production applications needing fast, multilingual TTS

Kani TTS 2

Free

Kani-TTS-2 by NineNineSix is an ultra-lightweight 400M parameter model built on a Liquid AI LFM2 backbone with NVIDIA NanoCodec. It runs in just 3GB VRAM and produces ~10 seconds of speech in ~2 seconds on an A100 (RTF 0.2). The current public release ships an English-only `kani-tts-2-en` checkpoint and does not expose the speaker-embedding hook needed for voice cloning. Use Chatterbox / IndexTTS2 / F5-TTS for cloning, or Kokoro / MeloTTS for non-English.

Developer: NineNineSix
License: Apache 2.0
Speed: Fast
Languages: en
Best for: Fast English generation on low-VRAM hardware, quick previews
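The real-time factor (RTF) quoted for Kani-TTS-2 is the ratio of synthesis time to audio duration, so lower is faster. The sketch below reproduces the arithmetic behind "~10 seconds of speech in ~2 seconds (RTF 0.2)"; the function names are illustrative only.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup_over_real_time(synthesis_seconds: float, audio_seconds: float) -> float:
    """How many times faster than real-time playback the generation ran."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


if __name__ == "__main__":
    # Figures from the Kani-TTS-2 description: ~10 s of speech in ~2 s.
    print(real_time_factor(2.0, 10.0))
    print(speedup_over_real_time(2.0, 10.0))
```

By the same arithmetic, a model described as "nearly 100x faster than real-time" has an RTF of roughly 0.01.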

OuteTTS

Free

OuteTTS extends large language models with text-to-speech capabilities while preserving the original architecture. It supports multiple backends including llama.cpp (CPU/GPU), Hugging Face Transformers, ExLlamaV2, VLLM, and even browser inference via Transformers.js. Features zero-shot voice cloning through speaker profiles saved as JSON.

Developer: OuteAI
License: Apache 2.0
Speed: Slow
Languages: en
Best for: Edge deployment, browser-based TTS, low-resource environments

Pocket TTS

Free

Pocket TTS by Kyutai (creators of Moshi) is a compact 100M parameter text-to-speech model that punches well above its weight. It runs efficiently on CPU, supports zero-shot voice cloning from a single audio sample, and produces natural-sounding speech. The small model size makes it ideal for edge deployment and low-resource environments.

Developer: Kyutai
License: MIT
Speed: Fast
Languages: en, fr
Best for: Lightweight deployment, CPU-only environments, quick voice cloning

Kitten TTS

Free

Kitten TTS by KittenML is an ultra-lightweight text-to-speech model built on ONNX. With variants from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU. Features 8 built-in voices, adjustable speech speed, and built-in text preprocessing for numbers, currencies, and units. Ideal for edge deployment and low-latency applications.

Developer: KittenML
License: Apache 2.0
Speed: Fast
Languages: en
Best for: Fast lightweight TTS, edge deployment, low-latency applications

Ming-Omni TTS

Free

Ming-omni-tts-0.5B by inclusionAI is a compact omni-modal speech model built on the BailingMM dense backbone with a Patch-by-Patch flow-matching audio decoder. Delivers 44.1kHz output (near CD quality), supports zero-shot voice cloning from a 3+ second reference, and includes built-in emotion / dialect / BGM control via JSON instructions. Excellent stability: 0.83% WER on Chinese benchmarks.

Developer: inclusionAI
License: Apache 2.0
Speed: Medium
Languages: en, zh
Best for: High-fidelity bilingual narration, emotion-controlled voice acting, Chinese audiobook content

MOSS-TTS Nano

Free

MOSS-TTS-Nano-100M is OpenMOSS's compact 100M-parameter variant of the MOSS-TTS family, sharing the delay-transformer architecture. Trades the 8B model's peak quality for ~80x smaller weights and dramatically lower per-request VRAM, making it suitable for free-tier and high-throughput deployments. Same 20-language reach.

Developer: OpenMOSS
License: Apache 2.0
Speed: Fast
Languages: en, zh, de, es, fr, ja, it, ko, ru, ar, pt
Best for: Free-tier TTS, high-volume production, low-latency interactive use

Bark

Standard

Bark by Suno is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio like music, background noise, and sound effects. It can produce nonverbal communications like laughing, sighing, and crying. Bark supports over 100 speaker presets and 13+ languages.

Developer: Suno
License: MIT
Speed: Slow
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
Voice cloning: No
Features: Sound effects, Laughing/sighing, Music generation, 100+ speakers, Multilingual
Best for: Creative audio content, audiobooks with emotion, sound effects

Bark Small

Standard

Bark Small is a distilled version of the Bark model that trades some audio quality for significantly faster inference speeds and lower memory requirements. It retains Bark's ability to generate speech with emotions, laughter, and multiple languages.

Developer: Suno
License: MIT
Speed: Medium
Languages: en, zh, fr, de, hi, it, ja, ko, pl, pt, ru, es, tr
Voice cloning: No
Features: Lightweight, Faster than full Bark, Emotional speech, Multilingual
Best for: Quick creative audio when full Bark is too slow

CosyVoice 2

Standard

CosyVoice 2 by Alibaba's Tongyi Lab achieves human-comparable speech quality with extremely low latency, making it ideal for real-time applications. It uses a finite scalar quantization approach for streaming synthesis and supports zero-shot voice cloning, cross-lingual synthesis, and fine-grained emotion control. It outperforms many commercial TTS systems in subjective evaluations.

Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, fr, de, it, es
Voice cloning: Yes
Features: Streaming, Zero-shot cloning, Cross-lingual, Emotion control, Human-parity
Best for: Real-time applications, streaming TTS, voice assistants

Dia TTSDia TTS

ቍጽሪ

Dia by Nari Labs is a 1.6B parameter text-to-speech model designed specifically for generating multi-speaker dialogue. It can produce natural-sounding conversations between two speakers with appropriate turn-taking, prosody, and emotional expression. Dia is perfect for creating podcast-style content, audiobook dialogues, and interactive conversational AI.

Author: Nari Labs
License: Apache 2.0
Speed: Medium
Languages: en
Voice cloning: No
Features: Multi-speaker, Dialog generation, Natural turn-taking, Emotional expression, 1.6B parameters
Best for: Podcasts, audiobook dialogues, conversational content

Parler TTS

Standard

Parler TTS is a text-to-speech model that uses natural language voice descriptions to control the generated speech. Instead of selecting from preset voices, you describe the voice you want (e.g., "a warm female voice with a slight British accent, speaking slowly and clearly") and Parler generates speech matching that description. This makes it uniquely flexible for creative applications.

Author: Hugging Face
License: Apache 2.0
Speed: Medium
Languages: en
Voice cloning: No
Features: Voice description, Natural language control, Flexible voice creation, No preset voices needed
Best for: Creative applications where you need custom voice characteristics

Indic Parler TTS

Standard

Indic Parler TTS by AI4Bharat extends the Parler architecture to Indian languages, generating natural speech in Tamil, Bengali, Marathi, Gujarati, Kannada, Punjabi, Odia, Assamese, Hindi, Telugu, Malayalam and English. Like Parler, you describe the voice you want in plain language and the model matches it — no preset voices required. Trained on AI4Bharat speech corpora for authentic pronunciation and prosody across the Indian subcontinent.

Author: AI4Bharat
License: Apache 2.0
Speed: Slow
Languages: ta, bn, mr, gu, kn, pa, or, as, hi, te, ml, en
Voice cloning: No
Features: 11 Indian languages, Voice description, Natural language control, Authentic Indic pronunciation
Best for: Indian-language voiceovers, regional content, multilingual Indic applications

KhanomTan TTS

Standard

KhanomTan TTS is an open Thai text-to-speech model built on the YourTTS multilingual architecture. Trained on CC0 and permissively-licensed Thai corpora (TSync) alongside several other languages, it delivers natural Thai speech with multiple speaker voices. A clean, commercially-usable option for Thai — the language most open TTS models only cover under non-commercial licenses.

Author: Wannaphong Phatthiyaphaibun
License: Apache 2.0
Speed: Fast
Languages: th
Voice cloning: No
Features: Thai TTS, Multiple speakers, YourTTS architecture, Commercial-safe license
Best for: Thai voiceovers, Thai-language content and apps

IndexTTS-2

Standard

IndexTTS-2 is an advanced text-to-speech system that excels at zero-shot voice synthesis with fine-grained emotion control. It can generate speech with specific emotional tones like happy, sad, angry, or fearful without requiring emotion-specific training data. The model uses emotion vectors to precisely control the emotional expression of generated speech.

Author: Index Team
License: Bilibili Model License
Speed: Medium
Languages: en, zh
Voice cloning: Yes
Features: Emotion control, Zero-shot, Emotion vectors, Expressive speech, Fine-grained control
Best for: Emotionally expressive content, audiobooks, virtual assistants

Spark TTS

Standard

Spark TTS by SparkAudio is a text-to-speech model that combines voice cloning with controllable emotion and speaking style. Using just 5 seconds of reference audio, it can clone a voice and then generate speech with different emotions, speeds, and styles while maintaining the cloned voice identity. Spark TTS uses a prompt-based control system.

Author: SparkAudio
License: CC BY-NC-SA 4.0
Speed: Medium
Languages: en, zh
Voice cloning: Yes
Features: Voice cloning, Emotion control, Style control, Prompt-based, 5-second cloning
Best for: Content creation with cloned voices and emotional control

GPT-SoVITS

Standard

GPT-SoVITS combines GPT-style language modeling with SoVITS (named after so-vits-svc, SoftVC VITS Singing Voice Conversion) for powerful few-shot voice cloning. With as little as 5 seconds of reference audio, it can accurately clone a voice and generate new speech while preserving the speaker's unique characteristics. It excels at both speaking and singing voice synthesis.

Author: RVC-Boss
License: MIT
Speed: Slow
Languages: en, zh, ja, ko
Voice cloning: Yes
Features: 5-second cloning, Singing voice, Few-shot learning, High fidelity, Cross-lingual
Best for: Voice cloning, singing synthesis, content creator voice replication

Orpheus

Standard

Orpheus is a large-scale text-to-speech model that achieves human-level emotional expression. Trained on over 100,000 hours of diverse speech data, it excels at generating speech with natural emotions, emphasis, and speaking styles. Orpheus can produce speech that is virtually indistinguishable from human recordings.

Author: Canopy Labs
License: Llama 3.2 Community
Speed: Medium
Languages: en
Voice cloning: No
Features: Human-level emotion, 100K hours training, Natural emphasis, Expressive speech
Best for: High-quality emotional speech, audiobooks, voice acting

Qwen3 TTS

Standard

Qwen3-TTS is a 1.7 billion parameter text-to-speech model from Alibaba's Qwen team. It supports two modes: preset voices with emotion control (9 speakers), and a unique voice design mode where you describe the voice you want in natural language. It covers 10 languages with high expressiveness and natural prosody.

Author: Alibaba (Qwen)
License: Apache 2.0
Speed: Medium
Languages: en, zh, ja, ko, de, fr, ru, pt, es, it
Voice cloning: No
Features: 9 preset voices, Voice design from text, Emotion control, 10 languages
Best for: Multilingual content with preset voices or custom voice design

VieNeu-TTS-v2

Standard

VieNeu-TTS-v2 is a 300M parameter Vietnamese-first TTS model trained on 10,000+ hours of bilingual data. It supports seamless en-vi code-switching, 7 preset voices spanning Northern and Southern accents, and instant voice cloning from 3-5 seconds of reference audio. Runs entirely on CPU via GGUF Q4 inference + ONNX audio decoder — no GPU needed, generations finish in ~7 seconds. Built on a Qwen3 backbone.

Author: Phạm Nguyễn Ngọc Bảo
License: Apache 2.0
Speed: Fast
Languages: vi, en
Voice cloning: Yes
Features: 7 preset voices (North + South accents), En-Vi code-switching, Voice cloning (3-5s reference), Podcast / multi-speaker support, CPU-only (no GPU required)
Best for: Vietnamese content and bilingual en-vi narration

Chatterbox Turbo

Standard

Chatterbox Turbo by Resemble AI is a 350M parameter upgrade to Chatterbox, delivering up to 6x real-time speed with sub-200ms latency. It supports paralinguistic tags like [laugh], [cough], and [chuckle] directly in text. Includes Perth watermarking on all generated audio for provenance tracking.

Author: Resemble AI
License: MIT
Speed: Fast
Languages: en
Voice cloning: Yes
Features: Sub-200ms latency, Paralinguistic tags, 6x real-time, Voice cloning, Watermarking
Best for: Real-time voice agents, expressive speech with natural sounds

VoxCPM

Standard

VoxCPM 1.5 by OpenBMB is a novel tokenizer-free TTS model that operates in continuous space rather than discrete tokens. It produces high-fidelity 44.1kHz audio, supports zero-shot voice cloning from 3-10 seconds, and maintains consistency across paragraphs. Cross-language cloning lets you apply an English voice to Chinese speech and vice versa.

Author: OpenBMB
License: Apache 2.0
Speed: Fast
Languages: en, zh
Voice cloning: Yes
Features: 44.1kHz audio, Tokenizer-free, Cross-lingual cloning, Context-aware, LoRA fine-tuning
Best for: High-fidelity audio, audiobooks, long-form content with voice consistency

VibeVoice

Standard

VibeVoice from Microsoft generates long-form speech up to 90 minutes with support for 4 simultaneous speakers, making it ideal for podcasts and dialogues. The Realtime 0.5B variant achieves ~300ms latency for interactive use. Supports speaker tags for multi-turn dialogue generation.

Author: Microsoft
License: MIT
Speed: Fast
Languages: en, zh
Voice cloning: No
Features: Multi-speaker, Long-form (90 min), Podcast generation, Dialogue, Low latency
Best for: Podcasts, dialogues, long-form narration, multi-speaker content

CosyVoice3

Standard

CosyVoice3 is the latest evolution from Alibaba's FunAudioLLM team. It features bi-streaming inference with ~150ms latency, instruction-based control for emotion/speed/volume, and improved speaker similarity for zero-shot cloning. Supports 9 languages plus 18 Chinese dialects. RL-tuned variant delivers state-of-the-art prosody.

Author: Alibaba (FunAudioLLM)
License: Apache 2.0
Speed: Fast
Languages: en, zh, ja, ko, de, es, fr, it, ru
Voice cloning: Yes
Features: Bi-streaming, Emotion control, Voice cloning, Speed/volume control, Instruction following
Best for: Multilingual production TTS, real-time applications, voice cloning

NAMAA Saudi TTS

Standard

NAMAA Saudi TTS is a Saudi Arabic fine-tune of Resemble AI's Chatterbox Multilingual. Trained by NAMAA Space on authentic Saudi-dialect speech, it produces natural Modern Standard Arabic and Saudi colloquial pronunciation that generic multilingual models cannot match. It inherits Chatterbox's zero-shot voice cloning and emotion control via reference audio prompts. It is the first open-weights Arabic TTS deployed on TTS.ai.

Author: NAMAA Space
License: MIT
Speed: Medium
Languages: ar
Voice cloning: Yes
Features: Saudi Arabic dialect, Modern Standard Arabic, Zero-shot voice cloning, Emotion control, Native pronunciation
Best for: Arabic content for Saudi audiences, MSA narration, Khaleeji-dialect voice agents, Arabic audiobooks

Darwin TTS

Standard

Darwin-TTS-1.7B-Cross by FINAL-Bench is a research variant of Qwen3-TTS-1.7B where 84 talker-FFN tensors (8.6%) are blended at α=3% with the matching tensors from Qwen3-1.7B-Base. The blend is built without retraining and produces noticeably crisper cross-lingual voice cloning across Korean, English, Japanese, and Chinese. Operates in zero-shot voice-clone mode (3 seconds reference audio).

Author: FINAL-Bench
License: Apache 2.0
Speed: Medium
Languages: en, ko, ja, zh
Voice cloning: Yes
Features: Voice cloning, Cross-lingual, FFN-blended, 4 core languages, Qwen3 backbone
Best for: Cross-lingual voice cloning between English / Korean / Japanese / Chinese with a single reference voice

MOSS-TTSD

Standard

MOSS-TTSD v1.0 from OpenMOSS is a 7B dialogue text-to-speech model that continues conversations from a short audio prompt. Supports up to 5 simultaneous speakers via [S1]/[S2] tags, zero-shot voice cloning from 3-10s reference audio, and up to 60 minutes of coherent multi-turn dialogue across 20 languages. Distinct from MOSS-TTS — TTSD is specialized for podcast/audiobook/dubbing workflows.

Author: OpenMOSS
License: Apache 2.0
Speed: Medium
Languages: en, zh
Voice cloning: Yes
Features: Multi-speaker dialogue, Up to 5 speakers, 60min coherent audio, Voice cloning, Podcast-optimised
Best for: Podcasts, audiobooks, dubbed dialogue, conversational content with multiple voices

Chatterbox

Premium

Chatterbox by Resemble AI is a cutting-edge zero-shot voice cloning model. It can replicate any voice from a single audio sample with remarkable accuracy, capturing not just the timbre but also the speaking style and emotional nuances. Chatterbox also features fine-grained emotion control, allowing you to adjust the emotional tone of the generated speech independently from the voice identity.

Author: Resemble AI
License: MIT
Speed: Medium
Languages: en
Voice cloning: Yes
VRAM: 4GB
Cost per 1K characters: 4x
Features: Zero-shot cloning, Emotion control, High fidelity, Style transfer, Single sample cloning
Best for: Professional voice cloning with emotional control, content creation

Tortoise TTS

Premium

Tortoise TTS is an autoregressive multi-voice text-to-speech system that prioritizes audio quality over speed. It uses a DALL-E-inspired architecture to generate highly natural speech with excellent prosody and speaker similarity. While slower than many alternatives, Tortoise produces some of the most realistic synthetic speech available in the open-source ecosystem.

Author: James Betker
License: Apache 2.0
Speed: Slow
Languages: en
Voice cloning: Yes
VRAM: 8GB
Cost per 1K characters: 4x
Features: Highest quality, Multi-voice, DALL-E architecture, Voice cloning, Autoregressive
Best for: Audiobooks, premium content, quality-first applications

StyleTTS 2

Premium

StyleTTS 2 achieves human-level TTS synthesis by combining style diffusion with adversarial training using large speech language models. It generates the most natural sounding speech among single-speaker models, rivaling human recordings. StyleTTS 2 uses diffusion-based style modeling to capture the full range of human speech variation.

Author: Columbia University
License: MIT
Speed: Medium
Languages: en
Voice cloning: No
VRAM: 4GB
Cost per 1K characters: 4x
Features: Human-level, Style diffusion, Adversarial training, Natural variation, High fidelity
Best for: Studio-quality single-speaker synthesis, professional narration

OpenVoice

Premium

OpenVoice by MyShell.ai enables instant voice cloning with granular control over voice style, emotion, accent, rhythm, pauses, and intonation. It can clone a voice from a short audio clip and generate speech in multiple languages while maintaining the speaker identity. OpenVoice also functions as a voice converter, allowing real-time voice transformation.

Author: MyShell.ai / MIT
License: MIT
Speed: Medium
Languages: en, zh, ja, ko, fr, es
Voice cloning: Yes
VRAM: 4GB
Cost per 1K characters: 4x
Features: Instant cloning, Voice conversion, Emotion control, Accent control, Multilingual
Best for: Voice cloning with fine-grained style control, voice conversion

Sesame CSM

Premium

Sesame CSM (Conversational Speech Model) is a 1 billion parameter model designed specifically for generating conversational speech. It models the natural patterns of human conversation including turn-taking timing, backchannel responses, emotional reactions, and conversational flow. CSM generates audio that sounds like a natural human conversation rather than synthetic speech.

Author: Sesame
License: Apache 2.0
Speed: Slow
Languages: en
Voice cloning: No
VRAM: 8GB
Cost per 1K characters: 4x
Features: Conversational, Natural timing, Turn-taking, Backchannel, 1B parameters
Best for: AI assistants, chatbots, conversational AI applications

Model Comparison

Model | Author | Tier | Speed | Languages | VRAM | License | Cost per 1K characters
Kokoro | Hexgrad | Free | Fast | 8 | 1.5GB | Apache 2.0 | Free
Piper | Rhasspy | Free | Fast | 42 | 0 (CPU only) | MIT | Free
VITS | Jaehyeon Kim et al. | Free | Fast | 11 | 1GB | MIT | Free
MeloTTS | MyShell.ai | Free | Fast | 6 | 0.5GB (GPU optional) | MIT | Free
Bark | Suno | Standard | Slow | 13 | 5GB | MIT | 2x
Bark Small | Suno | Standard | Medium | 13 | 2GB | MIT | 2x
CosyVoice 2 | Alibaba (Tongyi Lab) | Standard | Medium | 8 | 4GB | Apache 2.0 | 2x
Dia TTS | Nari Labs | Standard | Medium | 1 | 4GB | Apache 2.0 | 2x
Parler TTS | Hugging Face | Standard | Medium | 1 | 4GB | Apache 2.0 | 2x
Indic Parler TTS | AI4Bharat | Standard | Slow | 12 | 8GB | Apache 2.0 | 2x
KhanomTan TTS | Wannaphong Phatthiyaphaibun | Standard | Fast | 1 | 2GB | Apache 2.0 | 2x
IndexTTS-2 | Index Team | Standard | Medium | 2 | 4GB | Bilibili Model License | 2x
Spark TTS | SparkAudio | Standard | Medium | 2 | 4GB | CC BY-NC-SA 4.0 | 2x
GPT-SoVITS | RVC-Boss | Standard | Slow | 4 | 6GB | MIT | 2x
Orpheus | Canopy Labs | Standard | Medium | 1 | 4GB | Llama 3.2 Community | 2x
Chatterbox | Resemble AI | Premium | Medium | 1 | 4GB | MIT | 4x
Tortoise TTS | James Betker | Premium | Slow | 1 | 8GB | Apache 2.0 | 4x
StyleTTS 2 | Columbia University | Premium | Medium | 1 | 4GB | MIT | 4x
OpenVoice | MyShell.ai / MIT | Premium | Medium | 6 | 4GB | MIT | 4x
Qwen3 TTS | Alibaba (Qwen) | Standard | Medium | 10 | 7GB | Apache 2.0 | 2x
VieNeu-TTS-v2 | Phạm Nguyễn Ngọc Bảo | Standard | Fast | 2 | CPU | Apache 2.0 | 2x
Sesame CSM | Sesame | Premium | Slow | 1 | 8GB | Apache 2.0 | 4x
Chatterbox Turbo | Resemble AI | Standard | Fast | 1 | 2GB | MIT | 2x
VoxCPM | OpenBMB | Standard | Fast | 2 | 4GB | Apache 2.0 | 2x
Kani TTS 2 | NineNineSix | Free | Fast | 1 | 3GB | Apache 2.0 | Free
OuteTTS | OuteAI | Free | Slow | 1 | 2GB | Apache 2.0 | Free
VibeVoice | Microsoft | Standard | Fast | 2 | 4GB | MIT | 2x
Pocket TTS | Kyutai | Free | Fast | 2 | 1GB | MIT | Free
Kitten TTS | KittenML | Free | Fast | 1 | 0GB | Apache 2.0 | Free
CosyVoice3 | Alibaba (FunAudioLLM) | Standard | Fast | 9 | 4GB | Apache 2.0 | 2x
NAMAA Saudi TTS | NAMAA Space | Standard | Medium | 1 | 6GB | MIT | 2x
Darwin TTS | FINAL-Bench | Standard | Medium | 4 | 7GB | Apache 2.0 | 2x
MOSS-TTSD | OpenMOSS | Standard | Medium | 2 | 12GB | Apache 2.0 | 2x
Ming-Omni TTS | inclusionAI | Free | Medium | 2 | 3GB | Apache 2.0 | Free
MOSS-TTS Nano | OpenMOSS | Free | Fast | 11 | 2GB | Apache 2.0 | Free
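
The comparison table above can also be filtered in code. The following is a minimal sketch and not part of TTS.ai: it copies a handful of rows into a hypothetical `ModelInfo` record and returns the models that fit a VRAM budget and a set of tiers. The class, field, and function names are illustrative assumptions. The VRAM figures come from the table, with CPU-only models recorded as 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """One row of the comparison table (hypothetical record type)."""
    name: str
    tier: str        # "Free", "Standard", or "Premium"
    speed: str       # "Fast", "Medium", or "Slow"
    languages: int   # number of supported languages
    vram_gb: float   # 0 means the model runs on CPU


# A few rows copied from the table; extend the list as needed.
MODELS = [
    ModelInfo("Kokoro", "Free", "Fast", 8, 1.5),
    ModelInfo("Piper", "Free", "Fast", 42, 0),
    ModelInfo("Bark", "Standard", "Slow", 13, 5),
    ModelInfo("CosyVoice3", "Standard", "Fast", 9, 4),
    ModelInfo("Chatterbox", "Premium", "Medium", 1, 4),
    ModelInfo("MOSS-TTSD", "Standard", "Medium", 2, 12),
]


def pick_models(models, max_vram_gb, tiers=("Free", "Standard", "Premium")):
    """Return names of models within the VRAM budget and allowed tiers,
    ordered by language count (most languages first)."""
    fits = [m for m in models if m.vram_gb <= max_vram_gb and m.tier in tiers]
    ranked = sorted(fits, key=lambda m: m.languages, reverse=True)
    return [m.name for m in ranked]
```

Calling `pick_models(MODELS, 4)` keeps only the models that need 4GB of VRAM or less. Passing `tiers=("Free",)` narrows the result to the free tier.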

The Leading Text-to-Speech Platform

What Does TTS.ai Do?

TTS.ai brings the world's leading open-source text-to-speech models together in one easy-to-use web application, rather than tying you to proprietary tools built around a single voice engine. It gives you access to 20+ models from leading research labs and organizations, including Coqui, MyShell, Amphion, NVIDIA, Suno, Hugging Face, Tsinghua University, and others.

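Most models are released under MIT, Apache 2.0, or a similarly permissive license, which allows commercial use of the generated audio in your projects. A few carry other terms, such as Spark TTS (CC BY-NC-SA 4.0), IndexTTS-2 (Bilibili Model License), and Orpheus (Llama 3.2 Community), so check the license listed on each model card. No single model suits every task.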

Free Models, No Account Required

Start right away with three free TTS models: Piper (ultra-fast, lightweight), VITS (high-quality neural synthesis), and MeloTTS (multi-language support). No sign-up, no credit card, no limits on generations. The free models support English and other languages, with natural-sounding output that suits most applications.

GPU-Accelerated Inference

All TTS models run on dedicated NVIDIA GPUs for consistently fast, reliable generation times. Free models generate audio in under 2 seconds. Standard models such as Kokoro, CosyVoice 2, and Bark generate audio in 3-5 seconds. Premium models such as Tortoise and Chatterbox take 5-15 seconds, depending on the length of the text.

30+ Languages Supported

Generate speech in more than 30 languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more. Language coverage varies by model; the language list on each model card shows what that model supports.

Developer API

Integrate TTS.ai into your applications with a simple REST API. One endpoint for 20+ models. Python, JavaScript, cURL, and Go SDKs. Streaming support for long-running applications. Batch processing for bulk generation. Webhooks for async notifications.
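
As an illustration of the single-endpoint idea, here is a minimal sketch that builds, but does not send, a POST request carrying the text, model, and voice. The endpoint URL, the JSON field names, and the Bearer authorization header are all assumptions made for this example. None of them are documented on this page, so check the official API reference before using them.

```python
import json
import urllib.request

# Placeholder endpoint: the real URL is not given in this text.
API_URL = "https://api.example.invalid/v1/tts"


def build_tts_request(api_key, text, model, voice, audio_format="mp3"):
    """Build (but do not send) a POST request for one synthesis job.

    The JSON field names and the Bearer header are assumptions made for
    illustration, not the documented TTS.ai API.
    """
    payload = {
        "text": text,
        "model": model,
        "voice": voice,
        "format": audio_format,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("YOUR_API_KEY", "Hello, world.", "kokoro", "default")
    print(req.get_method(), req.full_url)
    # Sending the request would be: urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes the payload easy to inspect and test without network access.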

Frequently Asked Questions

Text to speech (TTS) is an AI technology that converts written text into audible speech. Modern TTS models such as Kokoro, Chatterbox, and CosyVoice 2 use deep learning to generate human-sounding speech with natural intonation, emotion, and rhythm.

It depends on your needs. For fast results, use Piper or MeloTTS (free, fast). For higher quality, use Kokoro or CosyVoice 2 (standard). For voice cloning, use Chatterbox or GPT-SoVITS (premium). For dialogue and podcasts, use Dia TTS. Each model has its own strengths, so experiment to find the best fit.

TTS.ai offers free text to speech with the Kokoro, Piper, VITS, and MeloTTS models. Without registering, you can use up to 500 characters and 3 generations per week. Registering for a free account gives you 15,000 characters and access to all models.

TTS.ai supports 30+ languages.

Which voices does TTS.ai offer?

TTS.ai supports MP3, WAV, OGG, and FLAC output formats. MP3 is the default output format; WAV is available when you need uncompressed audio. You can convert between formats with our audio converter tool.

Voice cloning uses AI to reproduce a voice from a short reference sample (5-30 seconds). Upload a reference recording, and models such as Chatterbox, GPT-SoVITS, or OpenVoice generate speech in that voice. Quality improves with a longer reference sample.
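
Before uploading a reference clip, it can help to check that its length falls inside the 5-30 second window quoted above. The sketch below does this for WAV files using only Python's standard `wave` module. The function names are illustrative, and the silent-clip helper exists only so the example runs without a real recording.

```python
import io
import wave


def wav_duration_seconds(source):
    """Return the duration of a WAV file (path or file-like object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(source, min_s=5.0, max_s=30.0):
    """True if the clip length falls inside the 5-30 second window."""
    return min_s <= wav_duration_seconds(source) <= max_s


def make_silent_wav(seconds, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf
```

For a real recording, pass the file path instead, for example `is_usable_reference("my_voice.wav")`, where the file name is a placeholder.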

Free users can generate up to 500 characters per request. Registered users can generate up to 5,000 characters per request. Longer texts are split into chunks, and the audio is generated chunk by chunk and delivered in sequence.
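
If you prefer to split long text yourself before sending it, the idea can be sketched as follows. This is an illustrative client-side helper, not TTS.ai's own chunking logic. It assumes sentences end in `.`, `!`, or `?`, and it hard-splits any single sentence that is longer than the limit.

```python
import re


def chunk_text(text, max_chars=5000):
    """Split text into chunks of at most max_chars, preferring sentence ends.

    Sentences are detected with a simple punctuation rule; a single sentence
    longer than max_chars is hard-split so no chunk exceeds the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split a sentence that cannot fit in one chunk by itself.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate          # sentence still fits in this chunk
        else:
            chunks.append(current)       # close the chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Use `max_chars=500` to match the free-tier limit or the default of 5,000 for registered accounts.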

SSML (Speech Synthesis Markup Language) support varies by model. Piper and some other models support basic SSML tags for pauses, emphasis, and prosody control. For models without SSML support, you can use punctuation and line breaks to shape the pacing instead.
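
A small helper can produce the same `<speak><prosody rate="slow">` shape used in this page's SSML example, and strip the markup again for models that do not accept SSML. This is a sketch using only the Python standard library, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def ssml_prosody(text, rate="slow"):
    """Wrap plain text in <speak><prosody rate=...>, escaping XML characters."""
    return (
        f"<speak><prosody rate={quoteattr(rate)}>"
        f"{escape(text)}"
        "</prosody></speak>"
    )


def strip_ssml(ssml):
    """Return only the text content of an SSML string, for non-SSML models."""
    return "".join(ET.fromstring(ssml).itertext())
```

Escaping matters because characters such as `&` or `<` in the input text would otherwise produce invalid markup.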

Which type of voice is used?

Which format is it?

From your account dashboard, generate an API key. Then send a POST request to our REST API endpoint with your text, model, and voice.

Start Converting Text to Speech