API Documentation

Integrate TTS.ai into your applications with our REST API. Its OpenAI-compatible format makes migration easy.

REST API | OpenAI Compatible | JSON Responses | Streaming Support

Overview

The TTS.ai API provides programmatic access to all platform features: text-to-speech synthesis, speech-to-text transcription, voice cloning, audio enhancement, and more. The API uses standard REST conventions with JSON request/response bodies.

API Key

Get your API key from Account Settings. API access is available on Pro and Enterprise plans.

Base URL

https://api.tts.ai/v1/

Auth

Bearer token via Authorization header

Authentication

All API requests require authentication via a Bearer token in the Authorization header.

HTTP Header
Authorization: Bearer sk-tts-your-api-key-here
Keep your API key secret. Never embed it in client-side code, commit it to public repositories, or write it to logs. Rotate keys regularly from your account settings.
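One way to keep the key out of source code is to read it from an environment variable at runtime. The sketch below assumes a variable named TTS_API_KEY; that name is hypothetical, so substitute whatever your deployment or secret manager provides.

```python
import os


def auth_headers(env_var: str = "TTS_API_KEY") -> dict:
    """Build the Authorization header from an environment variable.

    TTS_API_KEY is a hypothetical variable name, not one TTS.ai requires.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early instead of sending an unauthenticated request (401)
        raise RuntimeError(f"Set {env_var} to your TTS.ai API key")
    return {"Authorization": f"Bearer {api_key}"}
```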

Base URL

Base URL: https://api.tts.ai/v1/

The endpoint paths in this reference (such as /v1/tts/) already include the /v1/ prefix of the base URL. For example, the full TTS endpoint is:

POST https://api.tts.ai/v1/tts/

Rate Limits

API rate limits vary by plan:

Plan Requests/min Concurrent Max Text Length
Pro 60 5 5,000 chars
Enterprise 300 20 50,000 chars

Rate limit headers are included in every response: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
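These headers can drive client-side throttling. This minimal sketch assumes X-RateLimit-Reset is a Unix timestamp in seconds, which this page does not specify; if the API reports seconds-until-reset instead, use that value directly. It accepts any mapping, so a plain dict works for testing and the case-insensitive response.headers from requests works in practice.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request.

    Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left in the current window, no need to wait
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    now = time.time() if now is None else now
    return max(0.0, reset_at - now)  # never return a negative wait
```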

Credit Costs

Service Cost Unit
TTS (Free models: Piper, VITS, MeloTTS) 1 credit per 1,000 characters
TTS (Standard models: Kokoro, CosyVoice 2, etc.) 2 credits per 1,000 characters
TTS (Premium models: Tortoise, Chatterbox, etc.) 4 credits per 1,000 characters
Speech to Text 2 credits per minute of audio
Voice Cloning 4 credits per 1,000 characters
Voice Changer 3 credits per minute of audio
Audio Enhancement 2 credits per minute of audio
Vocal Removal / Stem Splitting 3-4 credits per minute of audio
Speech Translation 5 credits per minute of audio
Voice Chat 3 credits per turn
Key & BPM Finder Free --
Audio Converter Free --
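For budgeting, character-based TTS costs can be estimated from this table. The rounding rule is not documented; this sketch assumes every started block of 1,000 characters is billed in full, which appears consistent with the Text to Speech example below (a 34-character Kokoro request alongside X-Credits-Used: 2). Treat the result as an estimate and rely on the X-Credits-Used response header for actual charges.

```python
import math

# Credits per 1,000 characters, taken from the Credit Costs table
TTS_RATES = {"free": 1, "standard": 2, "premium": 4, "clone": 4}


def estimate_tts_credits(text: str, tier: str) -> int:
    """Estimate credits for a TTS or voice-cloning request.

    Assumption: each started block of 1,000 characters is billed in full.
    """
    blocks = math.ceil(len(text) / 1000)
    return blocks * TTS_RATES[tier]
```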

Text to Speech

POST /v1/tts/

Convert text to speech audio. Returns an audio file in the requested format.

Request Body

Parameter Type Required Description
model string Yes Model ID (e.g., kokoro, chatterbox, piper)
text string Yes Text to convert to speech (max 5,000 chars for Pro, 50,000 for Enterprise)
voice string Yes Voice ID (use /v1/voices/ to list available voices)
format string No Output format: mp3 (default), wav, flac, ogg
speed float No Speaking speed multiplier. Default: 1.0. Range: 0.5 to 2.0
language string No Language code (e.g., en, es). Auto-detected if omitted.
stream boolean No Enable streaming response. Default: false
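Checking parameters locally against the documented limits saves a round trip that would end in a 400 error. This sketch encodes only the constraints in the table above (per-plan text limits, output formats, and the 0.5 to 2.0 speed range); build_tts_payload is a hypothetical helper, not part of an official SDK.

```python
from typing import Optional

MAX_TEXT = {"pro": 5_000, "enterprise": 50_000}  # characters per request
FORMATS = {"mp3", "wav", "flac", "ogg"}


def build_tts_payload(model: str, text: str, voice: str, fmt: str = "mp3",
                      speed: float = 1.0, language: Optional[str] = None,
                      stream: bool = False, plan: str = "pro") -> dict:
    """Validate against the documented limits and return the JSON body."""
    if len(text) > MAX_TEXT[plan]:
        raise ValueError(f"text exceeds {MAX_TEXT[plan]} characters on the {plan} plan")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    payload = {"model": model, "text": text, "voice": voice,
               "format": fmt, "speed": speed, "stream": stream}
    if language is not None:
        payload["language"] = language  # omitted so the API auto-detects
    return payload
```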

Example Request

cURL
curl -X POST https://api.tts.ai/v1/tts/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "text": "Hello from TTS.ai! This is a test.",
    "voice": "af_bella",
    "format": "mp3"
  }' \
  --output output.mp3

Response

Returns the audio file as binary data with the appropriate Content-Type header (audio/mpeg, audio/wav, etc.).

Response Headers
Content-Type: audio/mpeg
Content-Length: 48256
X-Credits-Used: 2
X-Credits-Remaining: 498
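Because a successful call returns raw audio while a failed one returns JSON (see Error Codes), a client should check the status and Content-Type before writing bytes to disk. A minimal sketch, with audio_or_error as a hypothetical helper name:

```python
import json


def audio_or_error(status: int, content_type: str, body: bytes) -> bytes:
    """Return the audio bytes, or raise with the API's error code and message."""
    if 200 <= status < 300 and content_type.startswith("audio/"):
        return body
    try:
        # Documented error shape: {"error": {"code": ..., "message": ...}}
        err = json.loads(body)["error"]
        detail = f"{err['code']}: {err['message']}"
    except (ValueError, KeyError, TypeError):
        # Body was not the documented JSON error; show a short raw excerpt
        detail = body[:200].decode("utf-8", errors="replace")
    raise RuntimeError(f"HTTP {status} {detail}")
```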

Speech to Text

POST /v1/stt/

Transcribe audio to text. Supports 99 languages with auto-detection.

Request Body (multipart/form-data)

Parameter Type Required Description
file file Yes Audio file (MP3, WAV, FLAC, OGG, M4A, MP4, WebM). Max 100MB.
model string No STT model: whisper (default), faster-whisper, sensevoice
language string No Language code. auto for auto-detection (default).
timestamps boolean No Include word-level timestamps. Default: false
diarize boolean No Enable speaker diarization. Default: false
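The upload endpoints take multipart/form-data. The Python examples under Code Examples use requests, which builds this body for you; to avoid that dependency, the standard library is enough, as in this sketch of an RFC 7578 encoder. Form fields are always strings, so booleans such as timestamps and diarize are sent as "true" or "false", matching the spelling used in the Code Examples section.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str],
                     files: Dict[str, Tuple[str, bytes, str]]) -> Tuple[bytes, str]:
    """Encode text fields and files as multipart/form-data.

    files maps a field name to (filename, content, mime_type).
    Returns (body, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex  # random, so it will not appear in the payload
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + value.encode() + b"\r\n")
    for name, (filename, content, mime) in files.items():
        header = (f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
                  f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n')
        parts.append(header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Usage with urllib.request (not sent here):
# body, ctype = encode_multipart({"model": "faster-whisper", "timestamps": "true"},
#                                {"file": ("recording.mp3", audio_bytes, "audio/mpeg")})
# req = urllib.request.Request("https://api.tts.ai/v1/stt/", data=body, method="POST",
#                              headers={"Authorization": f"Bearer {key}", "Content-Type": ctype})
```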

Response

JSON Response
{
  "text": "Hello, this is a transcription test.",
  "language": "en",
  "duration": 3.5,
  "segments": [
    {
      "start": 0.0,
      "end": 1.8,
      "text": "Hello, this is",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 1.8,
      "end": 3.5,
      "text": "a transcription test.",
      "speaker": "SPEAKER_00"
    }
  ]
}
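The segments array maps directly onto subtitle formats. This sketch converts it to SubRip (SRT), prefixing each cue with the speaker label when diarization supplies one; the prefix is a formatting choice of this sketch, not something the API produces.

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Turn the STT response's segments into numbered SRT cues."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        text = seg["text"]
        if "speaker" in seg:
            text = f"[{seg['speaker']}] {text}"
        start, end = to_srt_timestamp(seg["start"]), to_srt_timestamp(seg["end"])
        blocks.append(f"{i}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)  # a blank line separates cues
```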

Voice Cloning

POST /v1/tts/clone/

Generate speech in a cloned voice. Upload a reference audio clip along with the text to speak.

Request Body (multipart/form-data)

Parameter Type Required Description
reference_audio file Yes Reference voice audio (10-30 seconds recommended). Max 20MB.
text string Yes Text to speak in the cloned voice.
model string No Clone model: chatterbox (default), cosyvoice2, gpt-sovits
format string No Output format: mp3 (default), wav, flac
language string No Target language code. Must be supported by the chosen model.

Response

Returns the audio file as binary data, same as the TTS endpoint.
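Since reference clips of 10 to 30 seconds are recommended and uploads are capped at 20MB, it can help to check a clip before uploading. This sketch handles uncompressed PCM WAV files only (Python's wave module cannot read MP3) and assumes the 20MB limit means 20 * 1024 * 1024 bytes.

```python
import os
import wave

# Assumption: "Max 20MB" is interpreted as 20 MiB
MAX_REFERENCE_BYTES = 20 * 1024 * 1024


def check_reference_wav(path: str) -> float:
    """Return the clip duration in seconds; warn if outside 10-30 s."""
    if os.path.getsize(path) > MAX_REFERENCE_BYTES:
        raise ValueError("reference audio exceeds the 20MB upload limit")
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if not 10 <= duration <= 30:
        print(f"warning: {duration:.1f}s reference clip; 10-30 s is recommended")
    return duration
```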

Voice Changer

POST /v1/voice-convert/

Convert audio to sound like a different voice. Upload source audio and choose a target voice.

Request Body (multipart/form-data)

Parameter Type Required Description
file file Yes Source audio file (MP3, WAV, FLAC). Max 50MB.
target_voice string Yes Target voice ID to convert to (use /v1/voices/ to list available voices)
model string No Voice conversion model: openvoice (default), knn-vc
format string No Output format: wav (default), mp3, flac

Example Request

cURL
curl -X POST https://api.tts.ai/v1/voice-convert/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "file=@source_audio.mp3" \
  -F "target_voice=af_bella" \
  -F "model=openvoice" \
  -o converted.wav

Response

Returns the converted audio file as binary data.

Speech Translation

POST /v1/speech-translate/

Translate spoken audio from one language to another. Combines speech-to-text, translation, and text-to-speech in a single call.

Request Body (multipart/form-data)

Parameter Type Required Description
file file Yes Source audio file in the original language. Max 100MB.
target_language string Yes Target language code (e.g., es, fr, de, ja)
voice string No Voice for translated output. Auto-selected if omitted.
preserve_voice boolean No Attempt to preserve the original speaker's voice in the translated audio

Response

JSON Response
{
  "original_text": "Hello, how are you?",
  "translated_text": "Hola, ¿cómo estás?",
  "source_language": "en",
  "target_language": "es",
  "audio_url": "https://api.tts.ai/v1/results/translate_abc123.mp3",
  "credits_used": 5
}
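Unlike the TTS endpoint, this one returns JSON with an audio_url rather than the audio itself, so retrieving the result takes a second request. The docs do not say whether result URLs require authentication; this sketch assumes the same Bearer token is accepted, and it only builds the request, leaving the download to urllib.request.urlopen.

```python
import json
import os
import urllib.request
from urllib.parse import urlparse


def translation_download_request(result_json: str, api_key: str):
    """Return (request, filename) for fetching the translated audio.

    Assumption: audio_url accepts the same Bearer token as the API.
    """
    result = json.loads(result_json)
    url = result["audio_url"]
    filename = os.path.basename(urlparse(url).path)  # e.g. translate_abc123.mp3
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    return req, filename


# Usage (performs the download):
# req, name = translation_download_request(response_text, api_key)
# with urllib.request.urlopen(req) as resp, open(name, "wb") as f:
#     f.write(resp.read())
```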

Speech to Speech

POST /v1/speech-to-speech/

Transform speech style, emotion, or delivery while keeping the content. Useful for adjusting tone, pacing, and expressiveness.

Request Body (multipart/form-data)

Parameter Type Required Description
file file Yes Source speech audio file. Max 50MB.
voice string Yes Target voice ID for the output speech
model string No Model: openvoice (default), chatterbox
emotion string No Target emotion: neutral, happy, sad, angry, excited
speed float No Speed adjustment. Default: 1.0. Range: 0.5 to 2.0

Response

Returns the transformed audio file as binary data.

Audio Tools

Audio processing endpoints for enhancement, vocal removal, stem splitting, and more.

POST /v1/audio/enhance/

Enhance audio quality with denoising, clarity enhancement, and super resolution.

Parameter Type Description
file file Audio file to enhance
denoise boolean Enable denoising (default: true)
enhance_clarity boolean Enhance speech clarity (default: true)
super_resolution boolean Upscale audio quality (default: false)
strength integer 1-3 (light, medium, strong). Default: 2

POST /v1/audio/separate/

Separate vocals from instrumentals (vocal removal) or split into stems.

Parameter Type Description
file file Audio file to separate
model string Separation model: demucs (default) or spleeter
stems integer Number of stems: 2, 4, 5, or 6 (default: 2)
format string Output format: wav, mp3, flac

POST /v1/audio/dereverb/

Remove echo and reverb from audio recordings.

Parameter Type Description
file file Audio file to process
type string What to remove: echo or reverb (default: both)
intensity integer 1-5 (default: 3)

POST /v1/audio/analyze/ Free

Analyze audio to detect key, BPM, and time signature.

Parameter Type Description
file file Audio file to analyze

Response
{
  "key": "C",
  "scale": "Major",
  "bpm": 120.0,
  "time_signature": "4/4",
  "camelot": "8B",
  "compatible_keys": ["C Major", "G Major", "F Major", "A Minor"]
}
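The camelot and compatible_keys fields follow the usual Camelot wheel rule for harmonic mixing: a key is compatible with itself, with the neighboring numbers on the same ring (wrapping from 12 to 1), and with the same number on the other ring (its relative major or minor). The sketch below implements that rule; that TTS.ai computes compatible_keys the same way is an assumption, though it reproduces the sample above (8B, 9B, 7B, and 8A are C Major, G Major, F Major, and A Minor).

```python
import re


def camelot_neighbors(code: str) -> list:
    """Return harmonically compatible Camelot codes for a code like '8B'."""
    match = re.fullmatch(r"(1[0-2]|[1-9])([AB])", code)
    if not match:
        raise ValueError(f"not a Camelot code: {code}")
    number, letter = int(match.group(1)), match.group(2)
    up = number % 12 + 1          # one step clockwise, 12 wraps to 1
    down = (number - 2) % 12 + 1  # one step counterclockwise, 1 wraps to 12
    other = "A" if letter == "B" else "B"  # relative minor/major
    return [code, f"{up}{letter}", f"{down}{letter}", f"{number}{other}"]
```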
POST /v1/audio/convert/ Free

Convert audio between formats.

Parameter Type Description
file file Audio file to convert
format string Target format: mp3, wav, flac, ogg, m4a, aac
bitrate integer Output bitrate in kbps: 64, 128, 192, 256, 320
sample_rate integer Sample rate in Hz: 22050, 44100, 48000
channels string mono or stereo

Voice Chat

POST /v1/voice-chat/

Send audio or text and receive an AI response with synthesized speech.

Request Body (multipart/form-data or JSON)

Parameter Type Required Description
audio file No* Audio input (either audio or text required)
text string No* Text input (either audio or text required)
voice string No Voice for AI response. Default: af_bella
tts_model string No TTS model for response. Default: kokoro
system_prompt string No Custom system prompt for the AI
conversation_id string No Continue an existing conversation

Response

JSON Response
{
  "conversation_id": "conv_abc123",
  "user_text": "What is the capital of France?",
  "ai_text": "The capital of France is Paris.",
  "audio_url": "https://api.tts.ai/v1/audio/tmp/resp_xyz.mp3",
  "credits_used": 3
}
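To hold a multi-turn conversation, send the conversation_id from each response with the next request. A minimal state holder might look like this; VoiceChatSession is a hypothetical name, it handles text turns only, and re-sending system_prompt on every turn is an assumption, since the docs do not say whether it persists with the conversation.

```python
from typing import Optional


class VoiceChatSession:
    """Tracks conversation_id so each turn continues the same conversation."""

    def __init__(self, voice: str = "af_bella", system_prompt: Optional[str] = None):
        self.voice = voice
        self.system_prompt = system_prompt
        self.conversation_id: Optional[str] = None  # unknown until the first reply

    def next_request(self, text: str) -> dict:
        """Build the JSON body for the next text turn."""
        body = {"text": text, "voice": self.voice}
        if self.system_prompt:
            body["system_prompt"] = self.system_prompt  # assumption: resent each turn
        if self.conversation_id:
            body["conversation_id"] = self.conversation_id
        return body

    def record_response(self, response: dict) -> str:
        """Remember the conversation_id and return the AI's reply text."""
        self.conversation_id = response["conversation_id"]
        return response["ai_text"]
```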

List Models

GET /v1/models/

Returns a list of all available models with their capabilities.

Response

JSON Response
{
  "models": [
    {
      "id": "kokoro",
      "name": "Kokoro",
      "type": "tts",
      "tier": "standard",
      "languages": ["en", "ja", "ko", "zh", "fr"],
      "supports_cloning": false,
      "supports_streaming": true,
      "credits_per_1k_chars": 2
    },
    {
      "id": "chatterbox",
      "name": "Chatterbox",
      "type": "tts",
      "tier": "premium",
      "languages": ["en"],
      "supports_cloning": true,
      "supports_streaming": true,
      "credits_per_1k_chars": 4
    }
  ]
}
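A client can use this response to choose a model by capability instead of hard-coding an ID. This sketch filters on the fields shown in the sample and sorts by credits_per_1k_chars; it assumes non-TTS entries use a type value other than "tts", since only TTS entries appear above.

```python
def find_models(models_response: dict, language: str,
                need_cloning: bool = False, need_streaming: bool = False) -> list:
    """Return IDs of matching TTS models, cheapest first."""
    matches = [
        m for m in models_response["models"]
        if m["type"] == "tts"  # checked first, so other entries' fields are not read
        and language in m["languages"]
        and (m["supports_cloning"] or not need_cloning)
        and (m["supports_streaming"] or not need_streaming)
    ]
    matches.sort(key=lambda m: m["credits_per_1k_chars"])
    return [m["id"] for m in matches]
```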

List Voices

GET /v1/voices/

Returns a list of all available voices, optionally filtered by model or language.

Query Parameters

Parameter Type Description
model string Filter by model ID (e.g., kokoro)
language string Filter by language code (e.g., en)
gender string Filter by gender: male, female, neutral

Response

JSON Response
{
  "voices": [
    {
      "id": "af_bella",
      "name": "Bella",
      "model": "kokoro",
      "language": "en",
      "gender": "female",
      "preview_url": "https://api.tts.ai/v1/voices/preview/af_bella.mp3"
    }
  ],
  "total": 142
}
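Filters are passed as query parameters. This sketch builds the URL with urllib.parse.urlencode and leaves out filters that are not set; voices_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.tts.ai/v1/"


def voices_url(model: Optional[str] = None, language: Optional[str] = None,
               gender: Optional[str] = None) -> str:
    """Build a /v1/voices/ URL containing only the filters that were given."""
    filters = {"model": model, "language": language, "gender": gender}
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}voices/" + (f"?{query}" if query else "")
```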

Code Examples

Text to Speech

Python - requests
import requests

API_KEY = "sk-tts-your-key"

# Text to Speech
response = requests.post(
    "https://api.tts.ai/v1/tts/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "kokoro",
        "text": "Hello from TTS.ai!",
        "voice": "af_bella",
        "format": "mp3"
    },
    timeout=60,
)
# Errors come back as JSON; stop here rather than saving them as output.mp3
response.raise_for_status()

with open("output.mp3", "wb") as f:
    f.write(response.content)

print(f"Credits used: {response.headers.get('X-Credits-Used')}")

Speech to Text

Python - requests
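# Speech to Text (reuses requests and API_KEY from the example above)
with open("recording.mp3", "rb") as f:
    response = requests.post(
        "https://api.tts.ai/v1/stt/",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        # Form fields are strings, so booleans are sent as "true"/"false"
        data={"model": "faster-whisper", "timestamps": "true"},
        timeout=300,
    )

response.raise_for_status()
result = response.json()
print(result["text"])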

Voice Cloning

Python - requests
# Voice Cloning (reuses requests and API_KEY from the first example)
with open("reference.wav", "rb") as ref:
    response = requests.post(
        "https://api.tts.ai/v1/tts/clone/",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"reference_audio": ref},
        data={
            "text": "This speech uses a cloned voice.",
            "model": "chatterbox"
        },
        timeout=120,
    )

# Fail on JSON errors instead of writing them into the audio file
response.raise_for_status()
with open("cloned_output.mp3", "wb") as f:
    f.write(response.content)

Text to Speech

JavaScript - fetch
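const API_KEY = 'sk-tts-your-key';

// Text to Speech. Browser playback is shown for brevity; in production, send this
// request from your backend so the API key never reaches client-side code.
const response = await fetch('https://api.tts.ai/v1/tts/', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'kokoro',
    text: 'Hello from TTS.ai!',
    voice: 'af_bella',
    format: 'mp3'
  })
});

// Errors are returned as JSON with an "error" object instead of audio
if (!response.ok) {
  const { error } = await response.json();
  throw new Error(`${response.status} ${error.code}: ${error.message}`);
}

const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();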

Speech to Text

JavaScript - fetch
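// Speech to Text
// audioFile is a File or Blob, e.g. taken from an <input type="file"> element
const formData = new FormData();
formData.append('file', audioFile);
formData.append('model', 'faster-whisper');

const response = await fetch('https://api.tts.ai/v1/stt/', {
  method: 'POST',
  // Do not set Content-Type here; fetch adds the multipart boundary itself
  headers: { 'Authorization': `Bearer ${API_KEY}` },
  body: formData
});

if (!response.ok) {
  throw new Error(`Transcription failed with HTTP ${response.status}`);
}

const result = await response.json();
console.log(result.text);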

Text to Speech

cURL
# Text to Speech
curl -X POST https://api.tts.ai/v1/tts/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"kokoro","text":"Hello!","voice":"af_bella","format":"mp3"}' \
  -o output.mp3

Speech to Text

cURL
# Speech to Text
curl -X POST https://api.tts.ai/v1/stt/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "file=@recording.mp3" \
  -F "model=faster-whisper" \
  -F "timestamps=true"

Voice Cloning

cURL
# Voice Cloning
curl -X POST https://api.tts.ai/v1/tts/clone/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "reference_audio=@reference.wav" \
  -F "text=This uses a cloned voice." \
  -F "model=chatterbox" \
  -o cloned.mp3

Audio Enhancement

cURL
# Audio Enhancement
curl -X POST https://api.tts.ai/v1/audio/enhance/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "file=@noisy_audio.mp3" \
  -F "denoise=true" \
  -F "enhance_clarity=true" \
  -o enhanced.mp3

Error Codes

All errors return a JSON body with an error object containing a machine-readable code and a human-readable message.

Error Response Format
{
  "error": {
    "code": "insufficient_credits",
    "message": "You do not have enough credits for this request.",
    "credits_required": 4,
    "credits_available": 2
  }
}
HTTP Status Error Code Description
400 bad_request Invalid request parameters. Check the error message for details.
401 unauthorized Missing or invalid API key.
402 insufficient_credits Not enough credits. Purchase more at /pricing/.
403 forbidden API access not available on your plan.
404 not_found Model or voice not found.
413 file_too_large Uploaded file exceeds the size limit.
429 rate_limited Too many requests. Check rate limit headers.
500 internal_error Server error. Try again later.
503 model_loading Model is loading. Retry in a few seconds.
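The table separates errors that a retry can fix (429, 500, and 503, whose descriptions point to rate-limit headers, trying again later, or retrying in a few seconds) from those that need a change to the request, key, plan, or credit balance. The sketch below encodes that split using exponential backoff with full jitter, a common retry strategy rather than a documented TTS.ai policy; reset_in is meant to come from the rate-limit headers, and the base and cap values are arbitrary choices.

```python
import random
from typing import Optional

# Statuses whose descriptions in the table suggest waiting and retrying
RETRYABLE = {429, 500, 503}


def retry_delay(status: int, attempt: int, reset_in: Optional[float] = None,
                base: float = 1.0, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retry number `attempt`, or None to give up."""
    if status not in RETRYABLE:
        return None  # 400, 401, 402, 403, 404, 413 need a change, not a retry
    if status == 429 and reset_in is not None:
        return reset_in  # the rate-limit window tells us exactly how long
    # Full jitter: uniform wait between 0 and the capped exponential bound
    return random.uniform(0, min(cap, base * 2 ** attempt))
```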

Webhooks

For long-running tasks (stem splitting, batch TTS), you can provide a webhook_url parameter. When the task completes, TTS.ai sends a POST request with the result payload to that URL.

Webhook Payload
{
  "event": "task.completed",
  "task_id": "task_abc123",
  "status": "success",
  "result_url": "https://api.tts.ai/v1/results/task_abc123",
  "credits_used": 12,
  "created_at": "2025-01-15T10:30:00Z",
  "completed_at": "2025-01-15T10:30:45Z"
}
Webhook results are available for download for 24 hours after completion. Make sure to download them promptly.
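A webhook receiver should acknowledge quickly and download the result within the 24-hour window. This sketch uses Python's built-in http.server for illustration only. The docs do not describe request signing, so it does not authenticate the sender, and it treats any status other than "success" as a failure, which is an assumption because other status values are not listed.

```python
import json
from datetime import datetime, timedelta
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook(body: bytes) -> dict:
    """Extract the fields a client typically needs from a webhook payload."""
    event = json.loads(body)
    # fromisoformat() does not accept a trailing "Z" before Python 3.11
    created = datetime.fromisoformat(event["created_at"].replace("Z", "+00:00"))
    completed = datetime.fromisoformat(event["completed_at"].replace("Z", "+00:00"))
    return {
        "task_id": event["task_id"],
        "ok": event["status"] == "success",
        "result_url": event.get("result_url"),
        "processing_seconds": (completed - created).total_seconds(),
        # Results stay downloadable for 24 hours after completion
        "download_before": completed + timedelta(hours=24),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        info = parse_webhook(self.rfile.read(length))
        print(f"task {info['task_id']} ok={info['ok']}, download before {info['download_before']}")
        self.send_response(200)  # acknowledge first; fetch result_url separately
        self.end_headers()


# To run locally (the port is arbitrary):
# HTTPServer(("", 8080), WebhookHandler).serve_forever()
```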
