API Documentation

Integrate TTS.ai into your services with our REST API. The format is OpenAI-compatible for easy migration.

REST API · OpenAI-compatible · JSON responses · Streaming support

Overview

The TTS.ai API provides programmatic access to every platform feature: text-to-speech generation, speech-to-text transcription, voice cloning, audio enhancement, and more. The API follows standard REST conventions with JSON request and response bodies.

API Key

Get your API key from Account Settings. Available on Pro and Enterprise plans.

Base URL

https://api.tts.ai/v1/

Authentication

Bearer token in the Authorization header

Authentication

Free tier: no key required. Anonymous POST requests to /v1/tts/ work without authentication, up to 5,000 characters per day per IP, using any of our free models (piper, vits, melotts, kokoro). Sign up for a free account to get 15,000 bonus characters and access to premium models.

For premium models and higher rate limits, authenticate with a Bearer token in the Authorization header.

HTTP header
Authorization: Bearer sk-tts-your-api-key-here
Keep your API key secret. Never expose it in client-side code, public repositories, or logs. Rotate keys regularly from your account settings.

SDKs

Official SDKs make it easy to integrate TTS.ai into your application. All of them are open source and available on GitHub.

Python

pip install ttsai
from tts_ai import TTSClient

client = TTSClient(api_key="sk-tts-...")
audio = client.generate(
    text="Hello world!",
    model="kokoro"
)
client.save(audio, "output.wav")

JavaScript / Node.js

npm install @ttsainpm/ttsai
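const { TTSClient } = require('@ttsainpm/ttsai');

const client = new TTSClient({
  apiKey: 'sk-tts-...'
});

// CommonJS modules do not support top-level await, so run the calls inside an async function.
async function main() {
  const audio = await client.generate({
    input: 'Hello world!',
    model: 'kokoro'
  });
  await client.saveToFile(audio, 'output.wav');
}

main().catch(console.error);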

Base URL

Base URL: https://api.tts.ai/v1/

All endpoints are relative to this base URL. For example, the TTS endpoint is:

POST https://api.tts.ai/v1/tts/

Rate Limits

API rate limits vary by plan:

Plan | Requests/min | Concurrent | Max text length
Free | 10 | 2 | 500 characters
Starter | 30 | 3 | 1,000,000 characters
Pro | 60 | 5 | 1,000,000 characters
Enterprise | 300 | 20 | 50,000 characters

Every response includes rate-limit headers: X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset.
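A client can use these headers to back off before it gets throttled. The following is a minimal sketch that assumes X-RateLimit-Reset holds a Unix timestamp in seconds. The section does not say whether it is a timestamp or a number of seconds to wait. `seconds_until_reset` is a hypothetical helper and is not part of the SDK. It accepts any header mapping, including `resp.headers` from `requests`.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request, based on X-RateLimit-* headers.

    Assumption (not documented): X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {k.lower(): v for k, v in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0  # quota is left in the current window
    reset_at = float(lowered.get("x-ratelimit-reset", "0"))
    now = time.time() if now is None else now
    return max(0.0, reset_at - now)


# Headers as they might appear on a throttled response
headers = {"X-RateLimit-Limit": "60", "X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000030"}
print(seconds_until_reset(headers, now=1700000000))  # 30.0
```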

Credit Costs

Service | Cost | Unit
TTS (free models: Piper, VITS, MeloTTS) | 1,000 characters | per 1,000 characters
TTS (standard models: Kokoro, CosyVoice 2, etc.) | 2,000 characters | per 1,000 characters
TTS (premium models: Tortoise, Chatterbox, etc.) | 4,000 characters | per 1,000 characters
Speech to Text | 2,000 characters | per hour of audio
Voice Cloning | 4,000 characters | per 1,000 characters
Voice Changer | 3,000 characters | per hour of audio
Audio Enhancement | 2,000 characters | per hour of audio
Vocal Removal / Stem Separation | 3,000-4,000 characters | per hour of audio
Speech Translation | 5,000 characters | per hour of audio
Voice Chat | 3,000 characters | per turn
Key & BPM Finder | Free | --
Audio Converter | Free | --
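You can estimate the TTS cost of a text from the per-1,000-character rates in the table. The following is a minimal sketch that assumes charges are prorated linearly and rounded up to a whole unit. The actual rounding rule is not documented here. The tier names are labels chosen for this example.

```python
import math

# Cost per 1,000 characters of input, from the TTS rows of the credit table.
TTS_RATES = {
    "free": 1000,      # Piper, VITS, MeloTTS
    "standard": 2000,  # Kokoro, CosyVoice 2, ...
    "premium": 4000,   # Tortoise, Chatterbox, ...
}


def estimate_tts_cost(text: str, tier: str) -> int:
    """Estimate the cost of synthesizing `text` with a model of the given tier.

    Assumption: cost is prorated per character and rounded up.
    """
    rate = TTS_RATES[tier]
    return math.ceil(len(text) * rate / 1000)


print(estimate_tts_cost("Hello world!", "standard"))  # 24 (12 characters at 2x)
```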
Umguquli womsindo Ikhululekile --

Text to Speech

POST /v1/tts/

Convert text into spoken audio. Returns an audio file in the requested format.

Request body

Parameter | Type | Required | Description
model | string | No | Model ID (e.g., kokoro, chatterbox, piper). If omitted, we automatically select a model that supports the requested language: kokoro for en/ja/zh/ko/fr/de/it/pt/es/hi/ru, and piper for the other supported languages (ar/pl/nl/cs/da/fi/el/hu/tr/uk/vi/etc.).
text | string | Yes | Text to convert to speech. Per-request limit: 500 characters (anonymous), 5,000 (free account), 1,000,000 (paid account). Longer inputs are split automatically on the server side.
voice | string | Yes | Voice ID (use /v1/voices/ to list available voices)
format | string | No | Output format: mp3 (default), wav, flac, ogg
speed | float | No | Speech speed multiplier. Default: 1.0. Range: 0.5 to 2.0
language | string | No | Language code (e.g., en, es). Auto-detected if omitted.
instructions | string | No | Style and delivery cues (≤500 chars). e.g. \
pronunciations | object or array | No | Pronunciation overrides. Either {\
stream | boolean | No | Enable a streaming response. Default: false
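You can check the documented constraints on the client before sending a request. The following is a minimal sketch. `build_tts_payload` is a hypothetical helper, not part of the official SDK. It enforces the listed format values and the speed range. It also leaves unset optional fields out of the body so the server can apply its defaults.

```python
from typing import Optional

ALLOWED_FORMATS = {"mp3", "wav", "flac", "ogg"}


def build_tts_payload(text: str, voice: str, *, model: Optional[str] = None,
                      audio_format: str = "mp3", speed: float = 1.0,
                      language: Optional[str] = None) -> dict:
    """Build a JSON body for POST /v1/tts/, checking the documented limits first."""
    if not text:
        raise ValueError("text is required")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    payload = {"text": text, "voice": voice, "format": audio_format, "speed": speed}
    # Optional fields are omitted when unset so the server applies its own defaults
    # (automatic model selection and language detection).
    if model is not None:
        payload["model"] = model
    if language is not None:
        payload["language"] = language
    return payload


print(build_tts_payload("Hello!", "af_bella", model="kokoro"))
```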

Example request

cURL
curl -X POST https://api.tts.ai/v1/tts/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "text": "Hello from TTS.ai! This is a test.",
    "voice": "af_bella",
    "format": "mp3"
  }' \
  --output output.mp3

SSML Tags

<say-as>: numbers, dates, currency, phone numbers, and acronyms

interpret-as | Input | Spoken as
cardinal | 1234 | one thousand two hundred thirty-four
ordinal | 21 | twenty-first
date | 1999-12-31 | December thirty-first, nineteen ninety-nine
time | 14:30 | two thirty PM
telephone | +1-555-867-5309 | plus one five five five eight six seven…
currency | $1,234.56 | one thousand two hundred thirty-four dollars and fifty-six cents
spell-out | NASA | N A S A

Dates default to mdy order for English and dmy elsewhere; override with format=\

Example
{
  "model": "kokoro",
  "voice": "af_bella",
  "text": "Your appointment is on <say-as interpret-as=\"date\">2026-04-26</say-as> at <say-as interpret-as=\"time\">14:30</say-as>. Please call <say-as interpret-as=\"telephone\">+1-555-867-5309</say-as> if you need to reschedule."
}
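If you build SSML text in code, the JSON encoder escapes the double quotes inside the `<say-as>` tags for you. The following is a minimal sketch. `say_as` is a hypothetical helper. XML-escaping the wrapped value is a precaution assumed here, not a documented requirement.

```python
import json
from xml.sax.saxutils import escape


def say_as(value: str, interpret_as: str) -> str:
    """Wrap a value in a <say-as> tag, XML-escaping the value itself."""
    return f'<say-as interpret-as="{interpret_as}">{escape(value)}</say-as>'


text = (
    f"Your appointment is on {say_as('2026-04-26', 'date')} "
    f"at {say_as('14:30', 'time')}."
)
body = {"model": "kokoro", "voice": "af_bella", "text": text}
# json.dumps turns the inner double quotes into \" automatically.
print(json.dumps(body))
```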

Response

The TTS endpoint queues your request and returns a JSON response with a job UUID. You then poll for the result.

Step 1: Submit request

Response (JSON)
{
  "uuid": "77b71db532874ce98e84a69a2d740d4c",
  "job_id": "f21316bb-aefa-480d-8523-701d1e3184ce",
  "status": "queued",
  "credits_used": 11,
  "credits_remaining": 15000
}

Step 2: Poll for result

GET /v1/speech/results/?uuid=<job_uuid>

Poll this endpoint every 1-2 seconds until status is completed or failed.

Polling response (completed)
{
  "status": "completed",
  "result_url": "https://api.tts.ai/static/downloads/77b71db5.../output.mp3"
}
Polling response (still processing)
{
  "status": "processing"
}
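You can wrap the polling loop in a helper with an overall timeout so that a stuck job never blocks forever. The following is a minimal sketch. `wait_for_job` is hypothetical. `fetch_status` stands in for whatever function performs the GET /v1/speech/results/ request, for example with `requests`.

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], dict], interval: float = 1.5,
                 timeout: float = 120.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until status is 'completed' or 'failed', or until `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        if result.get("status") == "completed":
            return result
        if result.get("status") == "failed":
            raise RuntimeError(result.get("error", "Generation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)  # the docs suggest polling every 1-2 seconds


# Simulated responses: processing twice, then completed
responses = iter([{"status": "processing"}, {"status": "processing"},
                  {"status": "completed", "result_url": "https://example.com/output.mp3"}])
print(wait_for_job(lambda: next(responses), sleep=lambda s: None))
```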

Step 3: Download audio

Fetch the result_url from the completed response to download the audio file.

Full example

Python
import time

import requests

API_KEY = "sk-tts-your-key"
BASE = "https://api.tts.ai"

# 1. Submit the TTS request; the JSON response contains a job UUID
resp = requests.post(f"{BASE}/v1/tts/", json={
    "model": "kokoro",
    "text": "Hello from TTS.ai!",
    "voice": "af_bella"
}, headers={"Authorization": f"Bearer {API_KEY}"}, timeout=30)
resp.raise_for_status()  # surface HTTP errors (401, 429, ...) immediately
uuid = resp.json()["uuid"]

# 2. Poll every 1.5 seconds until the job completes or fails
while True:
    result = requests.get(f"{BASE}/v1/speech/results/",
        params={"uuid": uuid}, timeout=30).json()
    if result["status"] == "completed":
        # 3. Download the finished audio file
        audio = requests.get(result["result_url"], timeout=60)
        audio.raise_for_status()
        with open("output.mp3", "wb") as f:
            f.write(audio.content)
        break
    elif result["status"] == "failed":
        raise RuntimeError(result.get("error", "Generation failed"))
    time.sleep(1.5)

Streaming alternative: For supported models (Kokoro, MeloTTS), use POST /v1/tts/stream/ for real-time Server-Sent Events (SSE) streaming — no polling needed.
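Server-Sent Events arrive as text lines. Each `data:` line carries a payload, and a blank line ends an event. The following is a minimal line-level parser that follows the SSE format. This section does not document what each event's payload contains (for example, base64 audio chunks or JSON), so the sketch yields the raw data strings. With `requests`, the lines could come from `resp.iter_lines(decode_unicode=True)` on a `stream=True` response.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event accumulated so far.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[5:]
            # Per the SSE format, one leading space after the colon is dropped.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An event not terminated by a blank line is discarded, as the SSE format specifies.
```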

Speech to Text

POST /v1/stt/

Transcribe audio to text. Supports 99 languages with automatic language detection.

Request body (multipart/form-data)

Parameter | Type | Required | Description
file | file | Yes | Audio file (MP3, WAV, FLAC, OGG, M4A, MP4, WebM). Max 100MB.
model | string | No | STT model: whisper (default), faster-whisper, sensevoice
language | string | No | Language code. Use auto for automatic detection (default).
timestamps | boolean | No | Include word-level timestamps. Default: false
diarize | boolean | No | Enable speaker diarization. Default: false

Response

Response (JSON)
{
  "text": "Hello, this is a transcription test.",
  "language": "en",
  "duration": 3.5,
  "segments": [
    {
      "start": 0.0,
      "end": 1.8,
      "text": "Hello, this is",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 1.8,
      "end": 3.5,
      "text": "a transcription test.",
      "speaker": "SPEAKER_00"
    }
  ]
}
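The `segments` array maps directly onto subtitle cues. The following is a minimal sketch that turns an STT response into SRT text. It assumes `start` and `end` are in seconds, as the sample suggests. `segments_to_srt` and `format_srt_time` are hypothetical helpers.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Convert STT segments into numbered SRT cues, prefixing the speaker label if present."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        text = seg["text"]
        if "speaker" in seg:
            text = f"[{seg['speaker']}] {text}"
        start, end = format_srt_time(seg["start"]), format_srt_time(seg["end"])
        cues.append(f"{i}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)  # a blank line separates consecutive cues


response = {"segments": [
    {"start": 0.0, "end": 1.8, "text": "Hello, this is", "speaker": "SPEAKER_00"},
    {"start": 1.8, "end": 3.5, "text": "a transcription test.", "speaker": "SPEAKER_00"},
]}
print(segments_to_srt(response["segments"]))
```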

Voice Cloning

POST /v1/tts/clone/

Generate speech in a cloned voice. Upload a reference audio clip together with the text.

Request body (multipart/form-data)

Parameter | Type | Required | Description
reference_audio | file | Yes | Reference audio clip (10-30 seconds recommended). Max 20MB.
text | string | Yes | Text to speak in the cloned voice.
model | string | No | Cloning model: chatterbox (default), cosyvoice2, gpt-sovits
format | string | No | Output format: mp3 (default), wav, flac
language | string | No | Target language code. Must be supported by the selected model.

Response

Returns the audio file as binary data, the same as the TTS endpoint.
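You can check a reference clip locally before uploading it. The following is a minimal sketch for WAV files that uses only the standard library and enforces the 20MB limit from the table. `check_reference_wav` is hypothetical. MP3 or FLAC files would need a third-party decoder. For example, `check_reference_wav("reference.wav")` returns the clip length, which you can compare against the recommended duration before calling POST /v1/tts/clone/.

```python
import os
import wave

# "Max 20MB" from the table; this sketch assumes MB means 1024 * 1024 bytes.
MAX_REFERENCE_BYTES = 20 * 1024 * 1024


def check_reference_wav(path: str) -> float:
    """Return the duration of a WAV reference clip in seconds, raising if it is over 20MB."""
    size = os.path.getsize(path)
    if size > MAX_REFERENCE_BYTES:
        raise ValueError(f"reference audio is {size} bytes; the upload limit is 20MB")
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```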

Voice Changer

POST /v1/voice-convert/

Convert audio into a different voice. Upload the source audio and choose the target voice.

Request body (multipart/form-data)

Parameter | Type | Required | Description
file | file | Yes | Source audio file (MP3, WAV, FLAC). Max 50MB.
target_voice | string | Yes | ID of the target voice (use /v1/voices/ to list available voices)
model | string | No | Voice conversion model: openvoice (default), knn-vc
format | string | No | Output format: wav (default), mp3, flac

Example request

cURL
curl -X POST https://api.tts.ai/v1/voice-convert/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "file=@source_audio.mp3" \
  -F "target_voice=af_bella" \
  -F "model=openvoice" \
  -o converted.wav

Response

Returns the converted audio file as binary data.
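You can also call the multipart endpoints, such as voice conversion, without third-party packages. The following is a minimal sketch using `urllib`. It assumes the field names from the table above. `encode_multipart` and `convert_voice` are hypothetical helpers, and a fresh random boundary is generated for each request.

```python
import urllib.request
import uuid


def encode_multipart(fields: dict, files: dict) -> tuple[bytes, str]:
    """Encode form fields and (filename, bytes) files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f'--{boundary}\r\nContent-Disposition: form-data; '
                     f'name="{name}"\r\n\r\n{value}\r\n'.encode())
    for name, (filename, data) in files.items():
        header = (f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
                  f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n')
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def convert_voice(api_key: str, audio: bytes, target_voice: str) -> bytes:
    """POST source audio to /v1/voice-convert/ and return the converted audio bytes."""
    body, content_type = encode_multipart(
        {"target_voice": target_voice, "model": "openvoice"},
        {"file": ("source_audio.mp3", audio)},
    )
    request = urllib.request.Request(
        "https://api.tts.ai/v1/voice-convert/", data=body, method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```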

Speech Translation

POST /v1/speech-translate/

Translate spoken audio from one language to another. Combines speech-to-text, translation, and text-to-speech in a single call.

Request body (multipart/form-data)

Parameter | Type | Required | Description
file | file | Yes | Source audio in the original language. Max 100MB.
target_language | string | Yes | Target language code (e.g., es, fr, de, ja)
voice | string | No | Voice for the translated output. Auto-selected if omitted.
preserve_voice | boolean | No | Attempt to preserve the original speaker's voice characteristics. Default: false

Response

Response (JSON)
{
  "original_text": "Hello, how are you?",
  "translated_text": "Hola, como estas?",
  "source_language": "en",
  "target_language": "es",
  "audio_url": "https://api.tts.ai/v1/results/translate_abc123.mp3",
  "credits_used": 5
}

Speech to Speech

POST /v1/speech-to-speech/

Change the speaking style, emotion, or delivery while preserving the content. Useful for audio editing and adjusting expressive delivery.

Request body (multipart/form-data)

Parameter | Type | Required | Description
file | file | Yes | Source audio file. Max 50MB.
voice | string | Yes | ID of the voice to use for the output speech
model | string | No | Model: openvoice (default), chatterbox
emotion | string | No | Target emotion: neutral, happy, sad, angry, excited
speed | float | No | Speed adjustment. Default: 1.0. Range: 0.5 to 2.0

Response

Returns the converted audio file as binary data.

Audio Tools

Audio processing endpoints for enhancement, noise removal, stem separation, and more.

POST /v1/audio/enhance/

Improve audio quality: remove noise, enhance clarity, and apply super-resolution.

file | file | Audio file to enhance
denoise | boolean | Enable denoising (default: true)
enhance_clarity | boolean | Enhance speech clarity (default: true)
super_resolution | boolean | Upscale audio quality (default: false)
strength | integer | 1-3 (light, medium, strong). Default: 2
POST /v1/audio/separate/

Separate vocals from the accompaniment (vocal removal) or split audio into stems.

file | file | Audio file to separate
model | string | demucs (default) or spleeter
stems | integer | Number of stems: 2, 4, 5, or 6 (default: 2)
format | string | Output format: wav, mp3, flac
POST /v1/audio/dereverb/

Remove echo and reverb from audio recordings.

file | file | Audio file to process
type | string | echo or reverb (default: both)
intensity | integer | 1-5 (default: 3)

POST /v1/audio/analyze/ (free)

Analyze audio to detect its key, BPM, and time signature.

file | file | Audio file to analyze

Response
{
  "key": "C",
  "scale": "Major",
  "bpm": 120.0,
  "time_signature": "4/4",
  "camelot": "8B",
  "compatible_keys": ["C Major", "G Major", "F Major", "A Minor"]
}
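The `compatible_keys` list in this sample matches a common harmonic-mixing rule. The rule includes the key itself, the keys a fifth above (dominant) and below (subdominant), and the relative minor or major. The following sketch reproduces the sample under that assumption. The service may compute the list differently, and this version spells every accidental as a sharp.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def compatible_keys(key: str, scale: str) -> list[str]:
    """Return the key itself, its dominant, its subdominant, and its relative key."""
    i = NOTES.index(key)
    dominant = NOTES[(i + 7) % 12]     # a perfect fifth up
    subdominant = NOTES[(i + 5) % 12]  # a perfect fifth down (a fourth up)
    if scale == "Major":
        relative = f"{NOTES[(i - 3) % 12]} Minor"  # relative minor: 3 semitones down
    else:
        relative = f"{NOTES[(i + 3) % 12]} Major"  # relative major: 3 semitones up
    return [f"{key} {scale}", f"{dominant} {scale}", f"{subdominant} {scale}", relative]


print(compatible_keys("C", "Major"))  # ['C Major', 'G Major', 'F Major', 'A Minor']
```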
POST /v1/audio/convert/ (free)

Convert audio between formats.

file | file | Audio file to convert
format | string | Target format: mp3, wav, flac, ogg, m4a, aac
bitrate | integer | Output bitrate in kbps: 64, 128, 192, 256, 320
sample_rate | integer | Sample rate in Hz: 22050, 44100, 48000
channels | string | mono or stereo

Voice Chat

POST /v1/voice-chat/

Send audio or text and get an AI reply spoken in a synthesized voice.

Request body (multipart/form-data or JSON)

Parameter | Type | Required | Description
audio | file | No* | Audio input (either audio or text is required)
text | string | No* | Text input (either audio or text is required)
voice | string | No | Voice for the AI reply. Default: af_bella
tts_model | string | No | TTS model for the reply. Default: kokoro
system_prompt | string | No | Custom system prompt for the AI
conversation_id | string | No | Continue an existing conversation

Response

Response (JSON)
{
  "conversation_id": "conv_abc123",
  "user_text": "What is the capital of France?",
  "ai_text": "The capital of France is Paris.",
  "audio_url": "https://api.tts.ai/v1/audio/tmp/resp_xyz.mp3",
  "credits_used": 3
}
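To hold a multi-turn conversation, send the `conversation_id` from each response back with the next request. The following is a minimal sketch of that bookkeeping. `VoiceChatSession` is hypothetical. `send` stands in for the actual POST /v1/voice-chat/ call, for example `lambda p: requests.post(url, json=p, headers=auth).json()`.

```python
from typing import Callable, Optional


class VoiceChatSession:
    """Tracks conversation_id across POST /v1/voice-chat/ turns."""

    def __init__(self, send: Callable[[dict], dict], voice: str = "af_bella",
                 system_prompt: Optional[str] = None):
        self.send = send
        self.voice = voice
        self.system_prompt = system_prompt
        self.conversation_id: Optional[str] = None

    def ask(self, text: str) -> dict:
        """Send one text turn and remember the conversation_id for the next one."""
        payload = {"text": text, "voice": self.voice}
        if self.system_prompt:
            payload["system_prompt"] = self.system_prompt
        if self.conversation_id:
            payload["conversation_id"] = self.conversation_id  # continue the same conversation
        response = self.send(payload)
        self.conversation_id = response.get("conversation_id", self.conversation_id)
        return response
```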

Batch TTS

POST /v1/tts/batch/

Submit multiple texts for TTS generation in one request. Optionally, receive a webhook notification when all jobs complete.

Parameters

Parameter | Type | Description
texts | array | Array of objects: {text, model, voice}. Max 50 items.
webhook_url | string | Optional URL to POST results to when the batch completes.

Response

Response (JSON)
{
  "batch_id": "abc123",
  "total": 3,
  "completed": 0,
  "status": "processing"
}

Poll for progress with GET /v1/tts/batch/result/?batch_id=abc123
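A batch accepts at most 50 items, so longer lists have to be split across several batch requests. The following is a minimal sketch. `make_batches` is a hypothetical helper. You would send each returned dict as the JSON body of POST /v1/tts/batch/ and poll each resulting batch_id separately.

```python
from typing import Optional

MAX_BATCH_ITEMS = 50


def make_batches(items: list[dict], webhook_url: Optional[str] = None) -> list[dict]:
    """Split {text, model, voice} items into request bodies of at most 50 items each."""
    bodies = []
    for start in range(0, len(items), MAX_BATCH_ITEMS):
        body = {"texts": items[start:start + MAX_BATCH_ITEMS]}
        if webhook_url:
            body["webhook_url"] = webhook_url
        bodies.append(body)
    return bodies


items = [{"text": f"Line {n}", "model": "kokoro", "voice": "af_bella"} for n in range(120)]
print([len(b["texts"]) for b in make_batches(items)])  # [50, 50, 20]
```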

Voice Embedding

POST /v1/voice-embed/

Precompute a voice embedding from reference audio. Use the returned embed_id in subsequent voice-cloning requests.

Parameters

Parameter | Type | Description
file | file | Reference audio file (WAV, MP3, FLAC).
model | string | Cloning model (default: chatterbox). Supported: chatterbox, cosyvoice2, openvoice, gpt-sovits, spark, indextts2, qwen3-tts.

Response

Response (JSON)
{
  "embed_id": "emb_abc123",
  "model": "chatterbox",
  "duration_ms": 450
}

Health Check

GET /v1/health/

Check the GPU server status, loaded models, and queue size. No authentication required. Results are cached for 30 seconds.

Response

Response (JSON)
{
  "status": "online",
  "latency_ms": 45,
  "queue_size": 3,
  "models_loaded": ["kokoro", "chatterbox", "cosyvoice2"]
}
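Because /v1/health/ needs no key, it can serve as a cheap pre-flight check. The following is a minimal sketch that uses the fields in the sample response. The `max_queue` threshold is an arbitrary assumption, not a documented limit. A model missing from `models_loaded` may simply not be loaded at that moment, so a `False` result is a hint, not proof that the model is unavailable.

```python
def model_ready(health: dict, model: str, max_queue: int = 10) -> bool:
    """Return True if the service is online, the model is loaded, and the queue is short."""
    return (
        health.get("status") == "online"
        and model in health.get("models_loaded", [])
        and health.get("queue_size", 0) <= max_queue
    )


health = {"status": "online", "latency_ms": 45, "queue_size": 3,
          "models_loaded": ["kokoro", "chatterbox", "cosyvoice2"]}
print(model_ready(health, "kokoro"))  # True
print(model_ready(health, "piper"))   # False
```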

List Models

GET /v1/models/

Returns a list of all available models and their capabilities.

Response

Response (JSON)
{
  "models": [
    {
      "id": "kokoro",
      "name": "Kokoro",
      "type": "tts",
      "tier": "standard",
      "languages": ["en", "ja", "ko", "zh", "fr"],
      "supports_cloning": false,
      "supports_streaming": true,
      "credits_per_1k_chars": 2
    },
    {
      "id": "chatterbox",
      "name": "Chatterbox",
      "type": "tts",
      "tier": "premium",
      "languages": ["en"],
      "supports_cloning": true,
      "supports_streaming": true,
      "credits_per_1k_chars": 4
    }
  ]
}
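The capability flags let you choose a model in code. The following is a minimal sketch over the sample response. It picks the cheapest model that supports a given language and the required features. `pick_model` is a hypothetical helper.

```python
from typing import Optional


def pick_model(models: list[dict], language: str, cloning: bool = False,
               streaming: bool = False) -> Optional[str]:
    """Return the id of the cheapest model matching the language and required features."""
    candidates = [
        m for m in models
        if language in m["languages"]
        and (not cloning or m["supports_cloning"])
        and (not streaming or m["supports_streaming"])
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m["credits_per_1k_chars"])["id"]


models = [
    {"id": "kokoro", "languages": ["en", "ja", "ko", "zh", "fr"], "supports_cloning": False,
     "supports_streaming": True, "credits_per_1k_chars": 2},
    {"id": "chatterbox", "languages": ["en"], "supports_cloning": True,
     "supports_streaming": True, "credits_per_1k_chars": 4},
]
print(pick_model(models, "en"))                # kokoro
print(pick_model(models, "en", cloning=True))  # chatterbox
```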

List Voices

GET /v1/voices/

Returns a list of all available voices, filterable by model or language.

Query parameters

Parameter | Type | Description
model | string | Filter by model ID (e.g., kokoro)
language | string | Filter by language code (e.g., en)
gender | string | Filter by gender: male, female, neutral

Response

Response (JSON)
{
  "voices": [
    {
      "id": "af_bella",
      "name": "Bella",
      "model": "kokoro",
      "language": "en",
      "gender": "female",
      "preview_url": "https://api.tts.ai/v1/voices/preview/af_bella.mp3"
    }
  ],
  "total": 142
}
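The filters are ordinary query parameters. The following is a minimal sketch using only the standard library. It builds the request URL and extracts the voice IDs. It assumes you can combine several filters in one request, which the table implies but does not state. It also sends the API key, which this endpoint may not require.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://api.tts.ai/v1/voices/"


def voices_url(**filters: Optional[str]) -> str:
    """Build a GET /v1/voices/ URL, dropping filters that are None or empty."""
    query = {key: value for key, value in filters.items() if value}
    return BASE_URL + ("?" + urllib.parse.urlencode(query) if query else "")


def list_voice_ids(api_key: str, **filters: Optional[str]) -> list[str]:
    """Fetch voices matching the filters and return their IDs."""
    request = urllib.request.Request(
        voices_url(**filters), headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request) as response:
        return [voice["id"] for voice in json.load(response)["voices"]]


print(voices_url(model="kokoro", language="en", gender="female"))
# https://api.tts.ai/v1/voices/?model=kokoro&language=en&gender=female
```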

Subtitles (SRT / VTT) [New]

GET /v1/speech/subtitles/?uuid=<job_uuid>&format=srt|vtt&download=1

Generate synchronized subtitles for any completed TTS job. The endpoint runs Whisper alignment over the audio and returns SRT or WebVTT. Results are cached on disk, so repeat requests for the same uuid are served from disk.

Query parameters

Parameter | Required | Description
uuid | Yes | Job UUID returned by /v1/tts/ or /v1/voice-clone/.
format | No | srt (default) or vtt.
download | No | Set to 1 to send a Content-Disposition: attachment header so the browser saves the file.
language | No | Language hint for the alignment model (auto-detected if omitted).
cURL
curl "https://api.tts.ai/v1/speech/subtitles/?uuid=$UUID&format=srt&download=1" -o subtitles.srt

Pronunciation Dictionary [New]

GET / POST / DELETE /api/v1/pronunciations/

Control how the TTS engine pronounces specific words. Saved entries are applied automatically to every TTS request you make. Limit: 200 entries per account.

Request body (POST)

Parameter | Type | Description
word | string | The word to override (e.g., GIF, Anthropic). Matched on word boundaries.
replacement | string | How the model should say it (e.g., jiff, ann THROP ick).
language | string | Optional ISO language code. Empty = applies to all languages.
case_sensitive | boolean | Default: false. When true, the word's case must match exactly.
cURL
# Save an entry
curl -X POST https://tts.ai/api/v1/pronunciations/ \
  -H "Authorization: Bearer sk-tts-..." \
  -H "Content-Type: application/json" \
  -d '{"word": "GIF", "replacement": "jiff"}'

# List your entries
curl https://tts.ai/api/v1/pronunciations/ -H "Authorization: Bearer sk-tts-..."

# Delete entry by id
curl -X DELETE "https://tts.ai/api/v1/pronunciations/?id=42" -H "Authorization: Bearer sk-tts-..."

You can also pass per-request overrides without saving them: include pronunciations as an object or array in any /v1/tts/ request (see the TTS endpoint parameters).
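To preview the effect of dictionary entries before saving them, you can approximate the substitution locally. The following is a minimal sketch. It assumes the server replaces whole words (the table mentions word-boundary matching) and ignores case unless `case_sensitive` is true. The real engine may behave differently. `apply_pronunciations` is hypothetical.

```python
import re


def apply_pronunciations(text: str, entries: list[dict]) -> str:
    """Replace each entry's word with its replacement, matching whole words only."""
    for entry in entries:
        flags = 0 if entry.get("case_sensitive", False) else re.IGNORECASE
        pattern = r"\b" + re.escape(entry["word"]) + r"\b"
        # A function replacement keeps backslashes in the replacement text literal.
        text = re.sub(pattern, lambda _m, r=entry["replacement"]: r, text, flags=flags)
    return text


entries = [{"word": "GIF", "replacement": "jiff"},
           {"word": "Anthropic", "replacement": "ann THROP ick"}]
print(apply_pronunciations("Anthropic made a gif, not GIFs.", entries))
# ann THROP ick made a jiff, not GIFs.
```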

Caption Writer [New]

Lahla i-