API Documentation

Integrate TTS.ai into your applications with our REST API. The OpenAI-compatible format makes migration easy.

REST API | OpenAI-compatible | JSON responses | Streaming support

Overview

The TTS.ai API provides programmatic access to all platform features: text-to-speech generation, speech-to-text transcription, voice cloning, audio enhancement, and more. The API uses standard REST conventions with JSON request and response bodies.

API Key

Get your API key from Account Settings. Available on Pro and Enterprise plans.

Base URL

https://api.tts.ai/v1/

Authentication

Bearer token in the Authorization header

Authentication

All API requests require authentication with a Bearer token in the Authorization header.

HTTP Header
Authorization: Bearer sk-tts-your-api-key-here
Keep your API key secret. Do not expose it in client-side code, public repositories, or logs. Rotate keys regularly from your account settings.
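One way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named TTS_AI_API_KEY, which is a hypothetical name chosen for illustration and not something the API defines. Only the Authorization: Bearer header format comes from the documentation above.

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header for a TTS.ai request.

    Falls back to the TTS_AI_API_KEY environment variable (a hypothetical
    name chosen for this sketch) when no key is passed explicitly.
    """
    key = api_key or os.environ.get("TTS_AI_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass one or set TTS_AI_API_KEY")
    return {"Authorization": f"Bearer {key}"}


def redact(api_key):
    """Return a log-safe form of the key that keeps only its last 4 characters."""
    return "sk-tts-..." + api_key[-4:]
```

Passing the result of auth_headers() as the headers argument of an HTTP client, and logging only redact(key), keeps the full key out of both code and logs.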

SDKs

Official SDKs make it easy to integrate TTS.ai into your application. All of them are open source and available on GitHub.

Python

pip install ttsai
from tts_ai import TTSClient

client = TTSClient(api_key="sk-tts-...")
audio = client.generate(
    text="Hello world!",
    model="kokoro"
)
client.save(audio, "output.wav")
GitHub

JavaScript / Node.js

npm install @ttsainpm/ttsai
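const { TTSClient } = require('@ttsainpm/ttsai');

// Top-level await is not available in CommonJS modules,
// so the calls are wrapped in an async function.
async function main() {
  const client = new TTSClient({
    apiKey: 'sk-tts-...'
  });
  const audio = await client.generate({
    input: 'Hello world!',
    model: 'kokoro'
  });
  await client.saveToFile(audio, 'output.wav');
}

main().catch(console.error);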
GitHub

Base URL

Base URL: https://api.tts.ai/v1/

All endpoints are relative to this base URL. For example, the TTS endpoint is:

POST https://api.tts.ai/v1/tts/

Rate Limits

API rate limits vary by plan:

Plan | Requests/min | Concurrent | Max text length
Free | 10 | 2 | 500 characters
Starter | 30 | 3 | 100,000 characters
Pro | 60 | 5 | 100,000 characters
Enterprise | 300 | 20 | 50,000 characters

Rate limit headers are included in every response: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
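A client can use these headers to pause before it hits the limit. The following is a minimal sketch that assumes X-RateLimit-Reset is a Unix timestamp in seconds. The documentation does not state the format, so verify this against a real response before relying on it.

```python
import time


def wait_time_from_headers(headers, now=None):
    """Return how many seconds to pause before sending the next request.

    Assumes X-RateLimit-Reset is a Unix timestamp in seconds (an
    assumption of this sketch, not a documented fact).
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0  # quota left in this window, no need to wait
    now = time.time() if now is None else now
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)
```

Calling time.sleep(wait_time_from_headers(response.headers)) after each response is one simple way to stay under the limit.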

Credit Costs

Service | Cost | Unit
TTS (Free models: Piper, VITS, MeloTTS) | 1,000 characters | per 1,000 characters
TTS (Standard models: Kokoro, CosyVoice 2, etc.) | 2,000 characters | per 1,000 characters
TTS (Premium models: Tortoise, Chatterbox, etc.) | 4,000 characters | per 1,000 characters
Speech to Text | 2,000 characters | per hour of audio
Voice Cloning | 4,000 characters | per 1,000 characters
Voice Changer | 3,000 characters | per hour of audio
Audio Enhancement | 2,000 characters | per hour of audio
Vocal Removal / Stem Separation | 3,000-4,000 characters | per hour of audio
Speech Translation | 5,000 characters | per hour of audio
Voice Chat | 3,000 characters | per turn
Key & BPM Finder | Free | --
Audio Converter | Free | --
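To budget a job before sending it, the TTS rows of the table can be turned into a small estimator. This sketch assumes the cost scales linearly with text length and is rounded up to a whole unit. The actual billing granularity is not documented, and the tier name "free" is a label chosen here for the free-model row.

```python
import math

# Cost per 1,000 characters of input text, taken from the table above.
TTS_COST_PER_1K = {"free": 1000, "standard": 2000, "premium": 4000}


def estimate_tts_cost(text, tier):
    """Estimate the cost of one TTS request for the given model tier.

    Assumes linear scaling with text length, rounded up (an assumption
    of this sketch, not documented billing behavior).
    """
    return math.ceil(len(text) * TTS_COST_PER_1K[tier] / 1000)
```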

Text to Speech

POST /v1/tts/

Convert text to speech audio. Returns the audio file in the requested format.

Request Body

Parameter | Type | Required | Description
model | string | Yes | Model ID (e.g., kokoro, chatterbox, piper)
text | string | Yes | The text to convert to speech (max 100,000 characters per request)
voice | string | Yes | Voice ID (use /v1/voices/ to list available voices)
format | string | No | Output format: mp3 (default), wav, flac, ogg
speed | float | No | Speech speed multiplier. Default: 1.0. Range: 0.5 to 2.0
language | string | No | Language code (e.g., en, es). Auto-detected if omitted.
stream | boolean | No | Enable streaming response. Default: false
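Checking the parameters on the client side avoids spending a request on a body the API would reject. The helper below is a sketch that applies only the limits listed in the table. The function name and the empty-text check are illustrative choices, not part of the API.

```python
ALLOWED_FORMATS = {"mp3", "wav", "flac", "ogg"}
MAX_TEXT_CHARS = 100_000


def build_tts_payload(model, text, voice, format="mp3", speed=1.0,
                      language=None, stream=False):
    """Validate arguments against the documented limits and return the JSON body."""
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {format}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    payload = {"model": model, "text": text, "voice": voice,
               "format": format, "speed": speed, "stream": stream}
    if language is not None:
        payload["language"] = language  # omit to let the API auto-detect
    return payload
```

The returned dictionary can be passed directly as the json argument of requests.post.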

Example Request

cURL
curl -X POST https://api.tts.ai/v1/tts/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "text": "Hello from TTS.ai! This is a test.",
    "voice": "af_bella",
    "format": "mp3"
  }' \
  --output output.mp3

Response

Returns the audio file as binary data with the appropriate Content-Type (audio/mpeg, audio/wav, etc.).

Response Headers
Content-Type: audio/mpeg
Content-Length: 48256
X-Credits-Used: 2
X-Credits-Remaining: 498
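These headers are enough to choose a file extension and track usage without inspecting the audio itself. In the sketch below, audio/mpeg and audio/wav come from the documentation, while the FLAC and OGG MIME types are assumptions based on common usage.

```python
# audio/mpeg and audio/wav are documented; the FLAC and OGG entries
# are assumptions based on common MIME type usage.
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/flac": ".flac",
    "audio/ogg": ".ogg",
}


def describe_audio_response(headers):
    """Extract the file extension, size, and credit usage from response headers."""
    content_type = headers.get("Content-Type", "").split(";")[0].strip()
    return {
        "extension": EXTENSIONS.get(content_type, ".bin"),
        "size_bytes": int(headers.get("Content-Length", 0)),
        "credits_used": int(headers.get("X-Credits-Used", 0)),
        "credits_remaining": int(headers.get("X-Credits-Remaining", 0)),
    }
```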

Speech to Text

POST /v1/stt/

Transcribe audio to text. Supports 99 languages with auto-detection.

Request Body (multipart/form-data)

Parameter | Type | Required | Description
file | file | Yes | Audio file (MP3, WAV, FLAC, OGG, M4A, MP4, WebM). Max 100MB.
model | string | No | STT model: whisper (default), faster-whisper, sensevoice
language | string | No | Language code. Use auto for automatic detection (default).
timestamps | boolean | No | Include word-level timestamps. Default: false
diarize | boolean | No | Enable speaker diarization. Default: false

Response

JSON Response
{
  "text": "Hello, this is a transcription test.",
  "language": "en",
  "duration": 3.5,
  "segments": [
    {
      "start": 0.0,
      "end": 1.8,
      "text": "Hello, this is",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 1.8,
      "end": 3.5,
      "text": "a transcription test.",
      "speaker": "SPEAKER_00"
    }
  ]
}
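The segments array can be rendered as a readable transcript. This sketch relies only on the fields shown in the sample response. Whether speaker is present when diarize is false is not documented, so the code falls back to a placeholder label.

```python
def format_transcript(result):
    """Render STT segments as one '[start-end] speaker: text' line each."""
    lines = []
    for seg in result.get("segments", []):
        # "speaker" appears in the sample response; fall back if it is absent.
        speaker = seg.get("speaker", "UNKNOWN")
        lines.append(f"[{seg['start']:.1f}-{seg['end']:.1f}] {speaker}: {seg['text']}")
    return "\n".join(lines)
```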

Voice Cloning

POST /v1/tts/clone/

Generate speech in a cloned voice. Upload a reference audio sample together with the text to speak.

Request Body (multipart/form-data)

Parameter | Type | Required | Description
reference_audio | file | Yes | Reference audio sample (10-30 seconds recommended). Max 20MB.
text | string | Yes | Text to speak in the cloned voice.
model | string | No | Cloning model: chatterbox (default), cosyvoice2, gpt-sovits
format | string | No | Output format: mp3 (default), wav, flac
language | string | No | Target language code. Must be supported by the selected model.

Response

Returns the audio file as binary data, the same as the TTS endpoint.

Voice Changer

POST /v1/voice-convert/

Convert audio to a different voice. Upload the source audio and choose a target voice.

Request Body (multipart/form-data)

Parameter | Type | Required | Description
file | file | Yes | Source audio file (MP3, WAV, FLAC). Max 50MB.
target_voice | string | Yes | Target voice ID (use /v1/voices/ to list available voices)
model | string | No | Voice conversion model: openvoice (default), knn-vc
format | string | No | Output format: wav (default), mp3, flac

Example Request

cURL
curl -X POST https://api.tts.ai/v1/voice-convert/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "file=@source_audio.mp3" \
  -F "target_voice=af_bella" \
  -F "model=openvoice" \
  -o converted.wav

Response

Returns the converted audio file as binary data.

Speech Translation

POST /v1/speech-translate/

Translate spoken audio from one language to another. Combines speech-to-text, translation, and text-to-speech in a single request.

Request Body (multipart/form-data)

Parameter | Type | Required | Description
file | file | Yes | Source audio file in the original language. Max 100MB.
target_language | string | Yes | Target language code (e.g., es, fr, de, ja)
voice | string | No | Voice for the translated output. Selected automatically if omitted.
preserve_voice | boolean | No | Attempt to preserve the original speaker's voice characteristics. Default: false

Response

JSON Response
{
  "original_text": "Hello, how are you?",
  "translated_text": "Hola, como estas?",
  "source_language": "en",
  "target_language": "es",
  "audio_url": "https://api.tts.ai/v1/results/translate_abc123.mp3",
  "credits_used": 5
}

Speech to Speech

POST /v1/speech-to-speech/

Change the speaking style, emotion, or delivery of a recording while preserving its content. Intended for audio editing and production work.

Request Body (multipart/form-data)

Parameter | Type | Required | Description
file | file | Yes | Source audio file. Max 50MB.
voice | string | Yes | Target voice ID for the output speech
model | string | No | Model: openvoice (default), chatterbox
emotion | string | No | Target emotion: neutral, happy, sad, angry, excited
speed | float | No | Speed adjustment. Default: 1.0. Range: 0.5 to 2.0

Response

Returns the transformed audio file as binary data.

Audio Tools

Audio processing endpoints for enhancement, vocal removal, stem separation, and more.

POST /v1/audio/enhance/

Improve audio quality: remove noise, enhance clarity, and apply super resolution.

Parameter | Type | Description
file | file | Audio file to enhance
denoise | boolean | Enable denoising (default: true)
enhance_clarity | boolean | Enhance speech clarity (default: true)
super_resolution | boolean | Upscale audio quality (default: false)
strength | integer | 1-3 (light, medium, strong). Default: 2

POST /v1/audio/separate/

Extract vocals from instrumentals (vocal removal) or separate a track into stems.

Parameter | Type | Description
file | file | Audio file to separate
model | string | demucs (default) or spleeter
stems | integer | Number of stems: 2, 4, 5, or 6 (default: 2)
format | string | Output format: wav, mp3, flac

POST /v1/audio/dereverb/

Remove echo and reverb from audio recordings.

Parameter | Type | Description
file | file | Audio file to process
type | string | echo or reverb (default: both)
intensity | integer | 1-5 (default: 3)

POST /v1/audio/analyze/ (Free)

Analyze audio to detect its key, BPM, and time signature.

Parameter | Type | Description
file | file | Audio file to analyze

Response
{
  "key": "C",
  "scale": "Major",
  "bpm": 120.0,
  "time_signature": "4/4",
  "camelot": "8B",
  "compatible_keys": ["C Major", "G Major", "F Major", "A Minor"]
}
POST /v1/audio/convert/ (Free)

Convert audio between formats.

Parameter | Type | Description
file | file | Audio file to convert
format | string | Target format: mp3, wav, flac, ogg, m4a, aac
bitrate | integer | Output bitrate in kbps: 64, 128, 192, 256, 320
sample_rate | integer | Sample rate in Hz: 22050, 44100, 48000
channels | string | mono or stereo

Voice Chat

POST /v1/voice-chat/

Send audio or text and receive an AI response with synthesized speech.

Request Body (multipart/form-data or JSON)

Parameter | Type | Required | Description
audio | file | No* | Audio input (either audio or text is required)
text | string | No* | Text input (either audio or text is required)
voice | string | No | Voice for the AI response. Default: af_bella
tts_model | string | No | TTS model for the response. Default: kokoro
system_prompt | string | No | Custom system prompt for the AI
conversation_id | string | No | Continue an existing conversation

Response

JSON Response
{
  "conversation_id": "conv_abc123",
  "user_text": "What is the capital of France?",
  "ai_text": "The capital of France is Paris.",
  "audio_url": "https://api.tts.ai/v1/audio/tmp/resp_xyz.mp3",
  "credits_used": 3
}
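A multi-turn conversation works by sending the conversation_id from one response back with the next request. The helpers below sketch this for text-only turns sent as JSON. The function names are illustrative, and the defaults mirror the parameter table.

```python
def build_chat_payload(text, conversation_id=None, voice="af_bella",
                       tts_model="kokoro", system_prompt=None):
    """Build a JSON body for a text-only /v1/voice-chat/ turn."""
    if not text:
        raise ValueError("text is required when no audio is sent")
    payload = {"text": text, "voice": voice, "tts_model": tts_model}
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    if conversation_id is not None:
        payload["conversation_id"] = conversation_id  # continue an existing chat
    return payload


def next_turn(previous_response, text):
    """Reuse the conversation_id from the last response for a follow-up turn."""
    return build_chat_payload(text, conversation_id=previous_response["conversation_id"])
```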

Batch TTS

POST /v1/tts/batch/

Submit multiple texts for TTS generation. Optionally, receive a webhook notification when all jobs are complete.

Parameters

Parameter | Type | Description
texts | array | Array of objects: {text, model, voice}. Max 50 items.
webhook_url | string | Optional URL to POST results to when the batch completes.

Response

JSON Response
{
  "batch_id": "abc123",
  "total": 3,
  "completed": 0,
  "status": "processing"
}

Poll progress with GET /v1/tts/batch/result/?batch_id=abc123
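Because a batch accepts at most 50 items, larger jobs need to be split across several requests. The sketch below builds one request body per chunk and computes progress from a status response. It assumes the polling endpoint returns the same total and completed fields as the submit response, which the documentation does not confirm.

```python
MAX_BATCH_ITEMS = 50  # documented limit per batch request


def build_batches(items, webhook_url=None):
    """Split {text, model, voice} objects into request bodies of at most 50 items."""
    batches = []
    for start in range(0, len(items), MAX_BATCH_ITEMS):
        body = {"texts": items[start:start + MAX_BATCH_ITEMS]}
        if webhook_url:
            body["webhook_url"] = webhook_url
        batches.append(body)
    return batches


def progress(status):
    """Return completion as a fraction, given a batch status response."""
    return status["completed"] / status["total"] if status["total"] else 1.0
```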

Voice Embedding

POST /v1/voice-embed/

Pre-compute a voice embedding from reference audio. Use the returned embed_id in subsequent voice cloning requests to speed up generation.

Parameters

Parameter | Type | Description
file | file | Reference audio file (WAV, MP3, FLAC).
model | string | Cloning model (default: chatterbox). Supported: chatterbox, cosyvoice2, openvoice, gpt-sovits, spark, indextts2, qwen3-tts.

Response

JSON Response
{
  "embed_id": "emb_abc123",
  "model": "chatterbox",
  "duration_ms": 450
}

Health Check

GET /v1/health/

Check GPU server status, loaded models, and queue size. No authentication required. Responses are cached for 30 seconds.

Response

JSON Response
{
  "status": "online",
  "latency_ms": 45,
  "queue_size": 3,
  "models_loaded": ["kokoro", "chatterbox", "cosyvoice2"]
}

List Models

GET /v1/models/

Returns a list of all available models and their capabilities.

Response

JSON Response
{
  "models": [
    {
      "id": "kokoro",
      "name": "Kokoro",
      "type": "tts",
      "tier": "standard",
      "languages": ["en", "ja", "ko", "zh", "fr"],
      "supports_cloning": false,
      "supports_streaming": true,
      "credits_per_1k_chars": 2
    },
    {
      "id": "chatterbox",
      "name": "Chatterbox",
      "type": "tts",
      "tier": "premium",
      "languages": ["en"],
      "supports_cloning": true,
      "supports_streaming": true,
      "credits_per_1k_chars": 4
    }
  ]
}
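The capability fields make it possible to choose a model programmatically. As a sketch, the helper below picks the cheapest model that matches optional cloning and language requirements, using only the fields shown in the sample response. The function name is illustrative.

```python
def cheapest_model(models, cloning=None, language=None):
    """Return the ID of the cheapest model matching the filters, or None."""
    candidates = [
        m for m in models
        if (cloning is None or m["supports_cloning"] == cloning)
        and (language is None or language in m["languages"])
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m["credits_per_1k_chars"])["id"]
```

Pass the models array from the response, for example cheapest_model(response.json()["models"], cloning=True).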

List Voices

GET /v1/voices/

Returns a list of all available voices, optionally filtered by model, language, or gender.

Query Parameters

Parameter | Type | Description
model | string | Filter by model ID (e.g., kokoro)
language | string | Filter by language code (e.g., en)
gender | string | Filter by gender: male, female, neutral

Response

JSON Response
{
  "voices": [
    {
      "id": "af_bella",
      "name": "Bella",
      "model": "kokoro",
      "language": "en",
      "gender": "female",
      "preview_url": "https://api.tts.ai/v1/voices/preview/af_bella.mp3"
    }
  ],
  "total": 142
}
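The filters are passed as ordinary query parameters. A minimal sketch of building the URL with the standard library, including only the filters that were actually supplied:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.tts.ai/v1/"


def voices_url(model=None, language=None, gender=None):
    """Build the /v1/voices/ URL, including only the filters that were given."""
    filters = {"model": model, "language": language, "gender": gender}
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = BASE_URL + "voices/"
    return f"{url}?{query}" if query else url
```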

Code Examples

Text to Speech

Python - requests
import requests

API_KEY = "sk-tts-your-key"

# Text to Speech
response = requests.post(
    "https://api.tts.ai/v1/tts/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "kokoro",
        "text": "Hello from TTS.ai!",
        "voice": "af_bella",
        "format": "mp3"
    }
)

with open("output.mp3", "wb") as f:
    f.write(response.content)

print(f"Credits used: {response.headers.get('X-Credits-Used')}")

Speech to Text

Python - requests
# Speech to Text
import requests

API_KEY = "sk-tts-your-key"

# Upload the recording as multipart/form-data
with open("recording.mp3", "rb") as f:
    response = requests.post(
        "https://api.tts.ai/v1/stt/",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"model": "faster-whisper", "timestamps": "true"}
    )

result = response.json()
print(result["text"])

Voice Cloning

Python - requests
# Voice Cloning
import requests

API_KEY = "sk-tts-your-key"

# Upload the reference sample together with the text to speak
with open("reference.wav", "rb") as ref:
    response = requests.post(
        "https://api.tts.ai/v1/tts/clone/",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"reference_audio": ref},
        data={
            "text": "This speech uses a cloned voice.",
            "model": "chatterbox"
        }
    )

# The response body is the generated audio (mp3 by default)
with open("cloned_output.mp3", "wb") as f:
    f.write(response.content)

Text to Speech

JavaScript - fetch
const API_KEY = 'sk-tts-your-key';

// Text to Speech
const response = await fetch('https://api.tts.ai/v1/tts/', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'kokoro',
    text: 'Hello from TTS.ai!',
    voice: 'af_bella',
    format: 'mp3'
  })
});

const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();

Speech to Text

JavaScript - fetch
// Speech to Text
// audioFile is a File or Blob, for example from an <input type="file"> element.
// API_KEY is defined as in the Text to Speech example.
const formData = new FormData();
formData.append('file', audioFile);
formData.append('model', 'faster-whisper');

const response = await fetch('https://api.tts.ai/v1/stt/', {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${API_KEY}` },
  body: formData
});

const result = await response.json();
console.log(result.text);

Text to Speech

cURL
# Text to Speech
curl -X POST https://api.tts.ai/v1/tts/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"kokoro","text":"Hello!","voice":"af_bella","format":"mp3"}' \
  -o output.mp3

Speech to Text

cURL
# Speech to Text
curl -X POST https://api.tts.ai/v1/stt/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "file=@recording.mp3" \
  -F "model=faster-whisper" \
  -F "timestamps=true"

Voice Cloning

cURL
# Voice Cloning
curl -X POST https://api.tts.ai/v1/tts/clone/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "reference_audio=@reference.wav" \
  -F "text=This uses a cloned voice." \
  -F "model=chatterbox" \
  -o cloned.mp3

Audio Enhancement

cURL
# Audio Enhancement
curl -X POST https://api.tts.ai/v1/audio/enhance/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "file=@noisy_audio.mp3" \
  -F "denoise=true" \
  -F "enhance_clarity=true" \
  -o enhanced.mp3

Error Codes

All errors return a JSON response with an error object.

Error Response Format
{
  "error": {
    "code": "insufficient_credits",
    "message": "You do not have enough characters for this request.",
    "characters_required": 4000,
    "characters_available": 2000
  }
}
HTTP Status | Error Code | Description
400 | bad_request | Invalid request parameters. Check the error message for details.
401 | unauthorized | Missing or invalid API key.
402 | insufficient_credits | Not enough characters. Purchase more at /pricing/.
403 | forbidden | API access is not included in your plan.
404 | not_found | Model or voice not found.
413 | file_too_large | Uploaded file exceeds the size limit.
429 | rate_limited | Too many requests. Check the rate limit headers.
500 | internal_error | Server error. Try again later.
503 | model_loading | The model is loading. Try again in a few minutes.
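Some of these errors are temporary and some are not, so a client usually retries only a subset. The sketch below treats 429, 500, and 503 as retryable, based on the descriptions in the table. That grouping is a judgment made for this example, not a documented policy.

```python
# Status codes from the table above that describe temporary conditions.
RETRYABLE = {429, 500, 503}


def should_retry(status_code):
    """Return True for errors that may succeed if the request is sent again."""
    return status_code in RETRYABLE


def error_summary(body):
    """Turn a documented error body into a single log-friendly line."""
    err = body.get("error", {})
    return f"{err.get('code', 'unknown')}: {err.get('message', '')}"
```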

Webhooks

For long-running tasks (stem separation, batch TTS), you can provide a webhook_url parameter. When the task completes, we will POST the results to your URL.

Webhook Payload
{
  "event": "task.completed",
  "task_id": "task_abc123",
  "status": "success",
  "result_url": "https://api.tts.ai/v1/results/task_abc123",
  "credits_used": 12,
  "created_at": "2025-01-15T10:30:00Z",
  "completed_at": "2025-01-15T10:30:45Z"
}
Webhook results remain available for download for 24 hours after completion. Make sure to download them promptly.
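A receiver can derive the download deadline directly from the payload. This sketch parses the timestamps shown in the sample and adds the 24-hour window. It handles only the task.completed event, since no other event types are documented.

```python
from datetime import datetime, timedelta


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def handle_webhook(payload):
    """Summarize a task.completed payload: result URL, run time, download deadline."""
    if payload.get("event") != "task.completed":
        return None  # other event types are not documented; ignore them
    completed = parse_timestamp(payload["completed_at"])
    started = parse_timestamp(payload["created_at"])
    return {
        "task_id": payload["task_id"],
        "ok": payload["status"] == "success",
        "result_url": payload["result_url"],
        "seconds": (completed - started).total_seconds(),
        "download_by": completed + timedelta(hours=24),
    }
```

In a web framework, the parsed JSON body of the incoming POST request would be passed to handle_webhook, and the result file fetched before download_by.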

Ready to build?

Get your API key and start integrating TTS.ai into your applications.