Text to Speech API for Developers

Build voice-enabled applications with our REST API. Add natural text-to-speech, voice cloning, speech-to-text, and audio processing to your apps, chatbots, voice assistants, and SaaS products. OpenAI-compatible format, 24+ models, simple integration.

REST API Chatbots Voice Apps SaaS Products Automation

Try It Now

Free with Kokoro, Piper, VITS, MeloTTS

API Features for Developers

Everything you need to build voice-enabled applications

Simple REST API

One POST request to generate speech. JSON request, audio response. Works with any programming language that supports HTTP.

OpenAI-Compatible

Drop-in replacement for OpenAI TTS API. Switch your base_url and API key — existing code works immediately.

24+ Models Available

Access every model through a single API. Switch models by changing one parameter. Compare quality, speed, and cost.

Sub-Second Latency

Kokoro generates audio in under 1 second. Perfect for real-time chatbots, voice assistants, and interactive applications.

Voice Cloning API

Clone any voice from a short audio sample via the API. Use cloned voices for all subsequent generations.
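A minimal sketch of the clone-then-speak flow described above, using the `requests` library as in the Quick Start examples. The `/v1/voices/clone` endpoint path, the `audio` upload field, and the `voice_id` response field are assumptions for illustration — check the API reference for the exact names; `sk-tts-xxx` is a placeholder key.

```python
import requests

API_BASE = "https://api.tts.ai/v1"

def clone_payload(voice_id, text, model="cosyvoice-2", fmt="mp3"):
    """Build a /v1/tts request body that uses a cloned voice ID."""
    return {"text": text, "model": model, "voice": voice_id, "format": fmt}

def clone_and_speak(sample_path, text, api_key):
    headers = {"Authorization": f"Bearer {api_key}"}
    # 1. Upload a 5-30 second reference sample (endpoint name assumed).
    with open(sample_path, "rb") as f:
        resp = requests.post(f"{API_BASE}/voices/clone",
                             headers=headers, files={"audio": f})
    resp.raise_for_status()
    voice_id = resp.json()["voice_id"]
    # 2. Use the cloned voice ID in a normal TTS request.
    resp = requests.post(f"{API_BASE}/tts", headers=headers,
                         json=clone_payload(voice_id, text))
    resp.raise_for_status()
    return resp.content  # audio bytes
```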

Multiple Formats

Output as WAV, MP3, OGG, or FLAC. Choose sample rate and bit depth. Streaming audio support for real-time apps.
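A sketch of format selection plus streaming download, assuming the `/v1/tts` request shape shown in the Quick Start section below and `requests`' standard `stream=True` behavior; `sk-tts-xxx` is a placeholder key.

```python
import requests

SUPPORTED_FORMATS = {"wav", "mp3", "ogg", "flac"}

def output_filename(base, fmt):
    """Pick the output filename, validating the requested format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{base}.{fmt}"

def stream_tts_to_file(text, base_name, fmt="mp3", api_key="sk-tts-xxx"):
    """Write audio chunks to disk as they arrive instead of buffering
    the whole response in memory."""
    path = output_filename(base_name, fmt)
    with requests.post(
        "https://api.tts.ai/v1/tts",
        json={"text": text, "model": "kokoro",
              "voice": "af_heart", "format": fmt},
        headers={"Authorization": f"Bearer {api_key}"},
        stream=True,
    ) as resp:
        resp.raise_for_status()
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)
    return path
```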

Best Models for Developer Integration

Choose the right model for your application's speed, quality, and cost requirements

Kokoro

Free

Lightweight 82M parameter model delivering studio-quality speech with blazing-fast inference.

Fast 5/5

Best for: Fastest model — sub-second latency, ideal for real-time apps and chatbots

Try Kokoro

CosyVoice 2

Standard

Alibaba's scalable streaming TTS with human-parity naturalness and near-zero latency.

Medium 5/5 Voice cloning

Best for: Streaming TTS with voice cloning for voice assistant applications

Try CosyVoice 2

Sesame CSM

Premium

Conversational speech model generating natural dialogue with appropriate timing and emotion.

Slow 5/5

Best for: Conversational AI with natural timing for chatbot and assistant voices

Try Sesame CSM

Piper

Free

A fast, local neural text-to-speech system optimized for Raspberry Pi and embedded devices.

Fast 3/5

Best for: Free, CPU-only model for high-volume applications with zero credit cost

Try Piper

Bark

Standard

Transformer-based text-to-audio model that generates realistic speech, music, and sound effects.

Slow 4/5

Best for: Audio generation with sound effects for creative and entertainment apps

Try Bark

How to Integrate the TTS API

From signup to first API call in under 5 minutes

1

Get Your API Key

Sign up for free and generate an API key from your account dashboard. 50 credits included.

2

Make Your First Call

POST to /v1/tts with text, model, and voice. Get audio bytes back. Under 5 lines of code.

3

Choose Your Model

Test different models for your use case. Compare speed, quality, and cost per generation.

4

Ship to Production

Scale with pay-as-you-go credits. No rate limits on paid plans. Monitor usage in your dashboard.

Quick Start Code Examples

Integrate TTS.ai in any language with our REST API

Python Popular
import requests

response = requests.post(
    "https://api.tts.ai/v1/tts",
    json={
        "text": "Hello from my app!",
        "model": "kokoro",
        "voice": "af_heart",
        "format": "mp3"
    },
    headers={
        "Authorization": "Bearer sk-tts-xxx"
    }
)

with open("output.mp3", "wb") as f:
    f.write(response.content)
JavaScript (Node.js)
const response = await fetch(
    "https://api.tts.ai/v1/tts",
    {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            "Authorization": "Bearer sk-tts-xxx"
        },
        body: JSON.stringify({
            text: "Hello from my app!",
            model: "kokoro",
            voice: "af_heart",
            format: "mp3"
        })
    }
);

const audio = await response.blob();
cURL Universal
curl -X POST https://api.tts.ai/v1/tts \
  -H "Authorization: Bearer sk-tts-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from my app!",
    "model": "kokoro",
    "voice": "af_heart",
    "format": "mp3"
  }' \
  --output output.mp3
OpenAI-Compatible Format Drop-in
# Works with OpenAI client library
from openai import OpenAI

client = OpenAI(
    api_key="sk-tts-xxx",
    base_url="https://api.tts.ai/v1"
)

response = client.audio.speech.create(
    model="kokoro",
    voice="af_heart",
    input="Hello from my app!"
)

response.stream_to_file("output.mp3")

What Developers Build with TTS.ai

Common integration patterns and applications

AI Chatbots & Assistants

Add voice output to your chatbot or AI assistant. Pipe LLM responses through TTS for voice-enabled interfaces. Kokoro delivers sub-second latency for real-time conversations. Sesame CSM generates conversational speech with natural timing.

  • LLM response to speech pipeline
  • Sub-second latency with Kokoro
  • Conversational speech with Sesame CSM
  • Streaming audio output
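
The LLM-to-speech pipeline above can be sketched in a few lines. The `/v1/tts` request shape follows the Quick Start examples; the `llm` callable stands in for whatever chat model you use, and `sk-tts-xxx` is a placeholder key.

```python
import requests

def synthesize(text, api_key, model="kokoro"):
    """Turn one LLM reply into audio bytes via POST /v1/tts."""
    resp = requests.post(
        "https://api.tts.ai/v1/tts",
        json={"text": text, "model": model,
              "voice": "af_heart", "format": "mp3"},
        headers={"Authorization": f"Bearer {api_key}"},
    )
    resp.raise_for_status()
    return resp.content

def voice_reply(user_message, llm, tts=None, api_key="sk-tts-xxx"):
    """Pipe an LLM reply through TTS. `llm` is any callable mapping a
    prompt string to a reply string; `tts` defaults to `synthesize`."""
    tts = tts or (lambda text: synthesize(text, api_key))
    reply = llm(user_message)
    return reply, tts(reply)
```

Keeping the synthesis step injectable (`tts=`) makes the pipeline easy to unit-test without network calls.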

Mobile & Voice Apps

Build voice-enabled mobile apps, accessibility tools, reading apps, and language learning platforms. Our REST API works with any mobile framework. Download audio files or stream directly to the client.

  • React Native, Flutter, Swift, Kotlin
  • Accessibility and reading apps
  • Language learning platforms
  • Audio content generation

SaaS Products

White-label voice capabilities in your SaaS product. Add TTS, STT, voice cloning, and audio processing as features in your platform. Use our API as your voice backend without managing GPU infrastructure.

  • White-label voice features
  • No GPU infrastructure needed
  • Pay-per-use pricing
  • 24+ models to offer your users

Automation Pipelines

Integrate voice generation into CI/CD pipelines, content automation, and batch processing workflows. Generate thousands of audio files from spreadsheet data, automate podcast production, or build content localization pipelines.

  • Batch processing via API
  • Content localization pipelines
  • CI/CD integration
  • Spreadsheet to audio automation
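
A sketch of the spreadsheet-to-audio pattern: one audio file per CSV row, using the free Piper model so the batch costs zero credits. The `id`/`text` column names are an illustrative assumption, as is the `sk-tts-xxx` placeholder key.

```python
import csv
import pathlib
import requests

def row_to_payload(row, model="piper", fmt="wav"):
    """Map one spreadsheet row (columns `id`, `text`) to a TTS request body."""
    return {"text": row["text"], "model": model,
            "voice": "af_heart", "format": fmt}

def batch_from_csv(csv_path, out_dir, api_key, model="piper"):
    """Generate <out_dir>/<id>.wav for every row of the CSV."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            resp = requests.post(
                "https://api.tts.ai/v1/tts",
                json=row_to_payload(row, model=model),
                headers={"Authorization": f"Bearer {api_key}"},
            )
            resp.raise_for_status()
            (out / f"{row['id']}.wav").write_bytes(resp.content)
```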

API Specifications

Built for production applications

24+

TTS Models

100+

Voices

30+

Languages

<1s

Latency (Kokoro)

Frequently Asked Questions

Common questions about the TTS.ai developer API

Is the API compatible with the OpenAI TTS API?

Yes. Our API follows the OpenAI audio speech format. If you are using the OpenAI Python or JavaScript client library, you can switch to TTS.ai by changing the base_url and api_key parameters. Your existing code works without modification.

How fast is the API?

Kokoro generates audio in under 1 second for typical sentences. CosyVoice 2 supports streaming output for even lower perceived latency. For chatbots and voice assistants, total round-trip time is typically 1-3 seconds depending on text length and model choice.

How much does the API cost?

Free models (Kokoro, Piper, VITS, MeloTTS) cost zero credits. Standard models cost 2 credits per 1,000 characters. Premium models cost 4 credits per 1,000 characters. Sign up free with 50 credits. Plans start at $9/month for 500 credits.

Does the API support voice cloning?

Yes. Upload a reference audio sample (5-30 seconds) to the voice cloning endpoint, then use the cloned voice ID in subsequent TTS requests. Models that support cloning include CosyVoice 2, Chatterbox, Fish Speech, and GPT-SoVITS.

What are the rate limits?

The free tier has basic rate limiting (3 requests per hour without an account). Paid plans have generous rate limits suitable for production applications. Contact us for enterprise-level throughput requirements.

Which audio formats are supported?

WAV (uncompressed, highest quality), MP3 (compressed, smaller files), OGG (open format), and FLAC (lossless compression). Specify the format in your request. Default is WAV at the model's native sample rate.

Can I build a voice assistant with this API?

Yes. Combine our TTS API with a speech-to-text model and an LLM to build a complete voice assistant pipeline. Kokoro provides sub-second latency ideal for real-time conversation. CosyVoice 2 supports streaming output for even lower perceived response times.

Does the API support streaming audio?

CosyVoice 2 and Kokoro support streaming audio output where audio chunks are delivered as they are generated. This reduces time-to-first-byte for real-time applications like voice assistants and interactive experiences.

How should I handle API errors?

The API returns standard HTTP status codes. Implement exponential backoff for 5xx errors and rate limit responses. For mission-critical applications, add a queue with retry logic. Our API has high uptime, but resilient error handling is always recommended.
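
A minimal retry sketch along these lines, retrying 429 and 5xx responses with exponential backoff. The request shape follows the Quick Start examples; the retryable status set and attempt limit are illustrative choices, and `sk-tts-xxx` is a placeholder key.

```python
import time
import requests

RETRYABLE = {429, 500, 502, 503, 504}

def backoff_delay(attempt):
    """Exponential backoff: 1s, 2s, 4s, 8s, ..."""
    return 2 ** attempt

def tts_with_retry(payload, api_key, max_attempts=5):
    """POST to /v1/tts, retrying rate-limit and server errors."""
    last_status = None
    for attempt in range(max_attempts):
        resp = requests.post(
            "https://api.tts.ai/v1/tts",
            json=payload,
            headers={"Authorization": f"Bearer {api_key}"},
        )
        if resp.status_code not in RETRYABLE:
            resp.raise_for_status()  # surface non-retryable 4xx errors
            return resp.content
        last_status = resp.status_code
        time.sleep(backoff_delay(attempt))
    raise RuntimeError(
        f"gave up after {max_attempts} attempts (last status {last_status})")
```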

Can I list available voices and models programmatically?

Yes. The /v1/voices and /v1/models endpoints return JSON lists of all available voices and models with their metadata (language support, quality ratings, speed ratings, and pricing tier). Use these to build dynamic model selectors in your application.
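
A sketch of querying /v1/voices and filtering by language. The exact metadata field names (here `languages`) are assumptions — inspect the JSON the endpoint actually returns; `sk-tts-xxx` is a placeholder key.

```python
import requests

def filter_by_language(voices, language):
    """Keep only voices whose metadata lists the given language."""
    return [v for v in voices if language in v.get("languages", [])]

def list_voices(api_key, language=None):
    """Fetch voice metadata from /v1/voices, optionally filtered."""
    resp = requests.get(
        "https://api.tts.ai/v1/voices",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    resp.raise_for_status()
    voices = resp.json()
    return filter_by_language(voices, language) if language else voices
```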

Is there a test environment or sandbox?

Free models (Kokoro, Piper, VITS, MeloTTS) serve as an effective sandbox since they cost zero credits. Test your integration with free models, then switch to premium models in production by changing the model parameter. No separate test environment is needed.

Can I self-host the models instead of using the API?

Most of our models are open-source and can be self-hosted. However, self-hosting requires significant GPU resources (we use 4x NVIDIA Tesla P40 with 96GB VRAM total). The API provides a cost-effective alternative without infrastructure management.

Ready to Build with Voice AI?

Get your free API key and start building. 50 credits on signup, free models available, comprehensive documentation.