Open Source Text to Speech Models

MIT and Apache 2.0 licensed: no restrictive terms, no vendor lock-in, no license fees. Use the models through our API, or deploy them on your own infrastructure with full control.



Free with Kokoro, Piper, VITS, and MeloTTS.

Open Source TTS Benefits

Why open-source models matter for your project

All Open-Source Licensed

Every model on TTS.ai uses a permissive open-source license. No strings attached, no vendor lock-in, no license fees.

MIT / Apache 2.0

Models are licensed under MIT or Apache 2.0, the most permissive open-source licenses. Use them commercially, modify them, and redistribute them; the main obligation is to keep the original copyright and license notices.

Self-Hostable

Download any model and run it on your own hardware. You get full control over your data, latency, and infrastructure, with no cloud dependency.

GPU Optimized

Models are optimized for NVIDIA GPUs with CUDA support. Piper runs on CPU alone. Most models need 2-8GB of VRAM for good inference performance.

Active Contributors

The models are developed and maintained by active open-source contributors.

Commercial Use OK

Every model permits commercial use under its license. Ship products, sell services, and create commercial content without royalties or usage fees.

Open Source Model Catalog

Every model, its license, and what it does best

Kokoro

Free

Lightweight 82M parameter model delivering studio-quality speech with blazing-fast inference.

Fast 5/5

Best for: Apache 2.0 - top-rated free model, 82M params, easy to self-host


Piper

Free

A fast, local neural text to speech system optimized for Raspberry Pi and embedded devices.

Fast 3/5

Best for: MIT - CPU-only, ideal for edge devices and embedded self-hosting


VITS

Free

Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech.

Fast 3/5

Best for: MIT - foundational architecture used by many downstream models


Bark

Standard

Transformer-based text-to-audio model that generates realistic speech, music, and sound effects.

Slow 4/5

Best for: MIT - custom sound generation beyond standard TTS


Tortoise TTS

Premium

Multi-voice text-to-speech focused on quality with autoregressive architecture.

Slow 5/5 Voice Cloning

Best for: Apache 2.0 - top output quality, thoroughly proven in development


OpenVoice

Premium

Instant voice cloning with granular control over style, emotion, and accent.

Medium 4/5 Voice Cloning

Best for: MIT - zero-shot voice cloning with granular style control

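To compare the cards above programmatically, the catalog can be held as plain data and filtered. This is a minimal sketch: the values are transcribed from the cards, reading the "N/5" score as an overall rating is an assumption, and `shortlist` is a hypothetical helper, not part of any TTS.ai API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelCard:
    name: str
    license: str
    speed: str           # "Fast", "Medium", or "Slow", as shown on each card
    rating: int          # the "N/5" score (assumed to be an overall rating)
    voice_cloning: bool

# Values transcribed from the catalog cards.
CATALOG = [
    ModelCard("Kokoro", "Apache 2.0", "Fast", 5, False),
    ModelCard("Piper", "MIT", "Fast", 3, False),
    ModelCard("VITS", "MIT", "Fast", 3, False),
    ModelCard("Bark", "MIT", "Slow", 4, False),
    ModelCard("Tortoise TTS", "Apache 2.0", "Slow", 5, True),
    ModelCard("OpenVoice", "MIT", "Medium", 4, True),
]

SPEED_ORDER = {"Fast": 0, "Medium": 1, "Slow": 2}

def shortlist(need_cloning: bool = False, max_speed: str = "Slow") -> list[str]:
    """Hypothetical helper: keep models that meet the constraints,
    best rating first, faster models first among equal ratings."""
    limit = SPEED_ORDER[max_speed]
    picks = [
        m for m in CATALOG
        if (m.voice_cloning or not need_cloning) and SPEED_ORDER[m.speed] <= limit
    ]
    picks.sort(key=lambda m: (-m.rating, SPEED_ORDER[m.speed]))
    return [m.name for m in picks]

print(shortlist(need_cloning=True))  # ['Tortoise TTS', 'OpenVoice']
print(shortlist(max_speed="Fast"))   # ['Kokoro', 'Piper', 'VITS']
```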

How to Use Open-Source TTS

Use our hosted API or run the models yourself

1. Explore Open-Source Models

Browse our catalog of 20+ open-source TTS models. Each model page shows its license, architecture, capabilities, and self-hosting requirements.

2. Try Models in Your Browser

Test any model directly on TTS.ai without installing anything. Our GPU workers run inference, so you can evaluate quality before committing to self-hosting.

3. Self-Host or Use Our API

Clone model repos from GitHub and run them locally, or use our hosted API for production. Self-hosting gives you full control; our API provides managed infrastructure.

4. Integrate Into Your Application

Integrate TTS into your product with self-hosted models or our REST API. Every model can be used without license fees or royalties.

License Reference

Every model on TTS.ai uses a license that permits commercial use

Model         License     Commercial Use  Modification  Self-Host  Attribution
Kokoro        Apache 2.0  Yes             Yes           Yes        Required
Piper         MIT         Yes             Yes           Yes        Optional
VITS          MIT         Yes             Yes           Yes        Optional
MeloTTS       MIT         Yes             Yes           Yes        Optional
Chatterbox    MIT         Yes             Yes           Yes        Optional
Tortoise TTS  Apache 2.0  Yes             Yes           Yes        Required
StyleTTS 2    MIT         Yes             Yes           Yes        Optional
OpenVoice     MIT         Yes             Yes           Yes        Optional
Sesame CSM    Apache 2.0  Yes             Yes           Yes        Required
Orpheus       Llama 3.2   Yes             Yes           Yes        "Built with Llama"
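When shipping a product that bundles several of these models, the table can double as a release checklist. A minimal sketch: the license and attribution values are transcribed from the table above, and `attribution_notes` is a hypothetical helper.

```python
# (license, attribution) pairs transcribed from the license table.
LICENSES = {
    "Kokoro": ("Apache 2.0", "Required"),
    "Piper": ("MIT", "Optional"),
    "VITS": ("MIT", "Optional"),
    "MeloTTS": ("MIT", "Optional"),
    "Chatterbox": ("MIT", "Optional"),
    "Tortoise TTS": ("Apache 2.0", "Required"),
    "StyleTTS 2": ("MIT", "Optional"),
    "OpenVoice": ("MIT", "Optional"),
    "Sesame CSM": ("Apache 2.0", "Required"),
    "Orpheus": ("Llama 3.2", '"Built with Llama"'),
}

def attribution_notes(models):
    """Hypothetical helper: return the models whose attribution column is not
    "Optional", mapped to their license and attribution requirement."""
    notes = {}
    for name in models:
        license_name, attribution = LICENSES[name]
        if attribution != "Optional":
            notes[name] = f"{license_name}: {attribution}"
    return notes

print(attribution_notes(["Kokoro", "Piper", "Orpheus"]))
# {'Kokoro': 'Apache 2.0: Required', 'Orpheus': 'Llama 3.2: "Built with Llama"'}
```

Whatever the attribution column says, MIT and Apache 2.0 both expect the license text and copyright notice to accompany redistributed copies.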

Self-Hosting vs Hosted API

Run the models yourself or let us manage the infrastructure

Self-Host on Your Hardware

Every model on TTS.ai is available as open source on GitHub or Hugging Face. Download the weights, install the dependencies, and run inference on your own GPUs. You have full control over latency, privacy, and scaling.

  • Full data privacy: audio never leaves your servers
  • No per-request costs after initial setup
  • Fine-tune models on your own data
  • GPU required for most models (NVIDIA recommended)
  • You handle updates, scaling, and maintenance

Use the TTS.ai Hosted API

Get instant access to all 20+ models through a single REST API. We handle GPU provisioning, model updates, queue management, and scaling. One API key gives you access to every model, with no separate deployments to manage.

  • No GPU hardware required
  • All 20+ models through one API
  • Automatic scaling and model updates
  • 99.9% uptime on redundant infrastructure
  • Pay only for what you use

Quick Start

Use our hosted API, or install Kokoro locally in minutes

Option 1: TTS.ai Hosted API (Easiest)
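import requests

# Request speech from the hosted TTS.ai endpoint
response = requests.post(
    "https://api.tts.ai/v1/tts",
    json={
        "text": "Open source TTS with a simple API.",
        "model": "kokoro",
        "voice": "af_heart",
        "format": "wav",
    },
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=60,
)
response.raise_for_status()  # fail loudly instead of saving an error body as audio

# Write the returned audio bytes to disk
with open("output.wav", "wb") as f:
    f.write(response.content)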
Option 2: Self-Host with pip (Full Control)
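# Install Kokoro locally (run in a shell)
pip install kokoro soundfile

# Generate speech on your own GPU (Python)
from kokoro import KPipeline
import soundfile as sf

pipeline = KPipeline(lang_code="a")  # "a" = American English
generator = pipeline("Hello from your own server!", voice="af_heart")

# The pipeline yields (graphemes, phonemes, audio) for each text segment
for i, (gs, ps, audio) in enumerate(generator):
    sf.write(f"output_{i}.wav", audio, 24000)  # Kokoro outputs 24 kHz audio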

Open Source, Simple Plans

We host the API so you can use open-source TTS without managing GPUs.

Free

$0

15,000 characters on signup

  • 4 free open-source models
  • No signup required for basic use
  • Commercial use allowed

Starter

$9

500,000 characters/month

  • All 20+ open-source models
  • Voice customization
  • API access

Pro

$29

2,000,000 characters/month

  • Priority GPU processing
  • All models
  • Enterprise support
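The effective price per character of the paid tiers follows directly from the figures above. A minimal sketch, assuming the full monthly allowance is used; the helper names are hypothetical, and the free tier's one-time 15,000-character signup credit is left out because it is not a monthly allowance.

```python
# (USD per month, characters per month) from the paid pricing tiers.
PLANS = {"$9 plan": (9, 500_000), "Pro": (29, 2_000_000)}

def cost_per_million(price_usd: float, chars: int) -> float:
    """Effective price per 1M characters if the whole allowance is used."""
    return price_usd / chars * 1_000_000

def cheapest_plan(monthly_chars: int):
    """Hypothetical helper: cheapest listed plan whose allowance covers
    the monthly volume, or None if no plan is large enough."""
    fitting = [(price, name) for name, (price, chars) in PLANS.items() if chars >= monthly_chars]
    return min(fitting)[1] if fitting else None

for name, (price, chars) in PLANS.items():
    print(f"{name}: ${cost_per_million(price, chars):.2f} per 1M characters")
# $9 plan: $18.00 per 1M characters
# Pro: $14.50 per 1M characters
```

For example, `cheapest_plan(1_000_000)` returns `"Pro"`, since 1M characters exceeds the $9 plan's 500,000-character allowance.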

Frequently Asked Questions

Common questions about open-source text to speech

Yes. Every model on TTS.ai uses a permissive open-source license: MIT or Apache 2.0. We exclude models with restrictive licenses (such as Coqui CPML or non-commercial CC-BY-NC terms). You can check each model's license in its GitHub repository.

Both are permissive open-source licenses that allow commercial use, modification, and redistribution. Apache 2.0 adds an explicit patent grant and requires modified files to carry a notice stating that you changed them. MIT is shorter and imposes only minimal conditions. Both are business-friendly.

Yes. Every model can be self-hosted. Clone the model repository from GitHub, install the dependencies, download the model weights, and run inference. We document each model's self-hosting requirements, including GPU, RAM, and Python version.

Requirements vary by model. Piper needs no GPU (it runs on CPU only). Kokoro and MeloTTS need 1-2GB of VRAM. Some standard models need 4GB of VRAM. Tortoise and Sesame CSM need 8GB. An NVIDIA RTX 3060 (12GB) can comfortably run most models.
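A quick way to check which of these models fit on a given card is to compare the figures above against the GPU's memory. A minimal sketch: the VRAM values are the upper ends of the ranges quoted above, while the 1GB headroom for the CUDA context and activations and the `models_that_fit` helper are assumptions for illustration.

```python
# Approximate VRAM needs in GB, from the requirements above (upper ends of
# the quoted ranges). Piper runs on CPU, so it needs no VRAM.
VRAM_GB = {
    "Piper": 0,
    "Kokoro": 2,
    "MeloTTS": 2,
    "Tortoise TTS": 8,
    "Sesame CSM": 8,
}

def models_that_fit(gpu_vram_gb: float, headroom_gb: float = 1.0) -> list[str]:
    """Hypothetical helper: models whose stated VRAM need fits on one GPU,
    keeping `headroom_gb` free (assumed value) for runtime overhead."""
    usable = gpu_vram_gb - headroom_gb
    return sorted(name for name, need in VRAM_GB.items() if need <= usable)

print(models_that_fit(12))  # ['Kokoro', 'MeloTTS', 'Piper', 'Sesame CSM', 'Tortoise TTS']
print(models_that_fit(4))   # ['Kokoro', 'MeloTTS', 'Piper']
```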

Yes. Open-source licenses allow modification, including fine-tuning. Models such as GPT-SoVITS and Bark support fine-tuning natively. You can train a model on your own voice data to create a custom voice or to improve performance for specific languages.

Top open-source models (Kokoro, StyleTTS 2, Chatterbox) now match or exceed commercial services such as ElevenLabs and Google TTS on some benchmarks. The main advantage of commercial services is convenience and support, not voice quality.

We left them out. XTTS/XTTS-v2 (Coqui's CPML, non-commercial), F5-TTS (CC-BY-NC, non-commercial), and Higgs-v2 (Boson License, restrictive) are all excluded. Every model on TTS.ai is vetted to ensure it can be used commercially.

Yes. Most models welcome community contributions on GitHub. You can submit bug reports, voice recordings for new languages, code improvements, and documentation. Check each model's GitHub repository for contribution guidelines and open issues.

Load models on demand and unload when idle to share GPU memory. Our GPU server runs 20+ models on 4x Tesla P40 (96GB total VRAM) using dynamic loading. For self-hosting, a single 24GB GPU can serve 3-5 models concurrently.
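The on-demand loading described above can be sketched as a small least-recently-used cache with a VRAM budget. This is a minimal sketch, not TTS.ai's server code: the `ModelCache` class, the `load_fn`/`unload_fn` hooks, and the model sizes in the usage example are hypothetical, and a real server would also need locking around concurrent requests.

```python
import time
from collections import OrderedDict

class ModelCache:
    """Keep models loaded up to a VRAM budget (GB). When a new model does not
    fit, evict the least recently used one; unload_idle() frees models unused
    for longer than idle_seconds. load_fn/unload_fn are hypothetical hooks
    that would move weights onto and off the GPU in a real server."""

    def __init__(self, budget_gb, sizes_gb, load_fn, unload_fn,
                 idle_seconds=600.0, clock=time.monotonic):
        self.budget_gb = budget_gb
        self.sizes_gb = sizes_gb
        self.load_fn = load_fn
        self.unload_fn = unload_fn
        self.idle_seconds = idle_seconds
        self.clock = clock
        self.loaded = OrderedDict()  # name -> (model, last_used), oldest first

    def used_gb(self):
        return sum(self.sizes_gb[name] for name in self.loaded)

    def _evict(self, name):
        model, _ = self.loaded.pop(name)
        self.unload_fn(name, model)

    def get(self, name):
        now = self.clock()
        if name in self.loaded:
            model, _ = self.loaded.pop(name)
        else:
            if self.sizes_gb[name] > self.budget_gb:
                raise ValueError(f"{name} needs more VRAM than the whole budget")
            # Evict least recently used models until the new one fits.
            while self.used_gb() + self.sizes_gb[name] > self.budget_gb:
                self._evict(next(iter(self.loaded)))
            model = self.load_fn(name)
        self.loaded[name] = (model, now)  # (re)insert as most recently used
        return model

    def unload_idle(self):
        """Call periodically, e.g. from a background task."""
        now = self.clock()
        idle = [n for n, (_, used) in self.loaded.items() if now - used > self.idle_seconds]
        for name in idle:
            self._evict(name)

# Usage with stand-in hooks and made-up sizes on a 24GB card.
events = []

def fake_load(name):
    events.append(("load", name))
    return f"<{name} weights>"

def fake_unload(name, model):
    events.append(("unload", name))

cache = ModelCache(24, {"kokoro": 2, "tortoise": 8, "sesame": 8, "bark": 12},
                   fake_load, fake_unload)
for name in ["kokoro", "tortoise", "sesame", "kokoro", "bark"]:
    cache.get(name)
print(list(cache.loaded))  # ['sesame', 'kokoro', 'bark']
```

Loading "bark" pushes usage past 24GB, so the least recently used model ("tortoise") is unloaded first.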

Many models ship Docker images or Dockerfiles. To run several models together, you can build a custom Docker image and use the NVIDIA Container Toolkit for GPU access. Our API server architecture can serve as a reference implementation.
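A minimal sketch of launching a GPU-enabled TTS container from Python. `docker run --gpus all` is the standard way to expose host GPUs once the NVIDIA Container Toolkit is installed; the `docker_run_args` helper, the image name, the port, and the volume path are hypothetical placeholders, not a published TTS.ai image.

```python
import shlex

def docker_run_args(image, port=8000, gpus="all", model_cache="/models"):
    """Hypothetical helper: build a `docker run` command for a TTS server image.
    `--gpus` only works when the NVIDIA Container Toolkit is installed."""
    return [
        "docker", "run", "--rm",
        "--gpus", gpus,                        # expose host GPUs to the container
        "-p", f"{port}:{port}",                # publish the server's API port
        "-v", f"{model_cache}:{model_cache}",  # keep downloaded weights between runs
        image,
    ]

print(shlex.join(docker_run_args("my-tts-server:latest")))
# docker run --rm --gpus all -p 8000:8000 -v /models:/models my-tts-server:latest
```

Passing the list to `subprocess.run(args, check=True)` would start the container; keeping the arguments as a list avoids shell-quoting issues.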

Most models need Python 3.10-3.12. Coqui TTS (VITS) requires Python 3.11. We recommend Python 3.12 for most models. Check each model's requirements.txt for the exact versions.
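A small guard at the top of a setup script can catch an unsupported interpreter before installing dependencies. A minimal sketch: the 3.10-3.12 range comes from the answer above, reading "requires Python 3.11" for Coqui TTS (VITS) as "3.11 only" is an assumption, and `python_supported` is a hypothetical helper.

```python
import sys

# Inclusive (major, minor) range from the answer above.
DEFAULT_RANGE = ((3, 10), (3, 12))
# Assumption: "requires Python 3.11" is read as 3.11 only.
MODEL_OVERRIDES = {"Coqui TTS (VITS)": ((3, 11), (3, 11))}

def python_supported(version=None, supported=DEFAULT_RANGE):
    """Return True if (major, minor) lies inside the inclusive range.
    Defaults to the running interpreter's version."""
    if version is None:
        version = sys.version_info[:2]
    low, high = supported
    return low <= tuple(version[:2]) <= high

if not python_supported():
    major, minor = sys.version_info[:2]
    print(f"Warning: Python {major}.{minor} is outside 3.10-3.12", file=sys.stderr)
```

A model's own requirements.txt remains the authority; the override table is only a place to record stricter constraints once they are known.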

Yes. MIT and Apache 2.0 licenses permit commercial use. You can build SaaS products, mobile apps, games, and services with these models without license fees or royalties. Both licenses require you to keep the original copyright and license notices; beyond that, attribution is appreciated but not required.

Start Using Open-Source TTS Today

20+ open-source models, all licensed for commercial use. Use our API or self-host: the choice is yours.