Sesame CSM

Speaker 0

Licensed English Neutral Sesame CSM

Speaker 0 is a neutral AI voice powered by the Sesame CSM text-to-speech model. This premium-tier voice speaks English and delivers studio-quality speech synthesis. Generation is slower but high-fidelity, and with a quality rating of 5/5, Speaker 0 is well-suited for AI assistants, chatbots, and conversational AI applications. The Sesame CSM engine is developed by Sesame under the Apache 2.0 license, which permits commercial use. Key capabilities include conversational speech, natural timing, turn-taking, and backchannel responses. The model has 1B parameters.

Sesame CSM Model Details

Model: Sesame CSM
Developer: Sesame
Quality: 5/5
Speed: Slow
License: Apache 2.0
Voice Cloning: Not available
Tier: Premium (4 credits/1K characters)
Parameters: 1B
Architecture: Llama Backbone + Audio Codec
Year: 2025

Best Use Cases for Speaker 0

Recommended applications for this voice

Audiobooks & Narration

Use Speaker 0 to narrate long-form content with natural prosody and expression.

Video Voiceovers

Add professional narration to YouTube videos, ads, and social media content.

Podcasts & Broadcasting

Studio-quality output suitable for podcasts, radio, and professional broadcasting.

Games & Interactive Media

Premium quality for game dialogue, interactive stories, and immersive experiences.

More Sesame CSM Voices

Other voices from the same TTS model

Speaker 1

English Neutral

Frequently Asked Questions

Sesame CSM (Conversational Speech Model) is a 1-billion-parameter model designed specifically for generating conversational speech. It models the natural patterns of human conversation, including turn-taking timing, backchannel responses, emotional reactions, and conversational flow. CSM generates audio that sounds like a natural human conversation rather than synthetic speech.

Sesame CSM was developed by Sesame and is released under the Apache 2.0 license, which permits commercial use of generated audio.

Sesame CSM supports 1 language: English.

Sesame CSM is in the Premium tier, which costs 4 credits per 1,000 characters. You can preview any Sesame CSM voice for free before generating full audio.
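To make the Premium-tier rate concrete, here is a minimal Python sketch that estimates the credit cost of a piece of text. The rate of 4 credits per 1,000 characters comes from this page. The function name `estimate_credits` and the `round_up_to_block` option are hypothetical, and this page does not say whether billing is proportional or rounded up to whole 1,000-character blocks, so the sketch shows both as assumptions.

```python
import math

# Premium-tier rate stated on this page: 4 credits per 1,000 characters.
CREDITS_PER_1K_CHARS = 4


def estimate_credits(text: str, round_up_to_block: bool = False) -> float:
    """Estimate the credits needed to synthesize `text` (hypothetical helper).

    Assumption: by default the cost scales proportionally with the character
    count. With round_up_to_block=True, the text is billed in whole
    1,000-character blocks instead. The actual billing rule may differ.
    """
    chars = len(text)
    if round_up_to_block:
        return math.ceil(chars / 1000) * CREDITS_PER_1K_CHARS
    return chars * CREDITS_PER_1K_CHARS / 1000


if __name__ == "__main__":
    script = "Hello, and welcome to the show. " * 50
    print(len(script), "characters")
    print("proportional estimate:", estimate_credits(script))
    print("block-rounded estimate:", estimate_credits(script, round_up_to_block=True))
```

Comparing the two estimates gives a rough lower and upper bound when budgeting credits for longer scripts.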

Sesame CSM has a slower generation speed because it prioritizes quality. Each generation takes longer but produces higher-fidelity output.

Sesame CSM is rated 5/5 for audio quality on TTS.ai. It delivers studio-grade, human-like speech.

Sesame CSM does not support voice cloning; it uses a fixed set of built-in voices. For voice cloning, try models like CosyVoice 2, GPT-SoVITS, or Chatterbox.

Sesame CSM is specifically recommended for AI assistants, chatbots, and conversational AI applications. Its conversational style, natural timing, and turn-taking capabilities make it an excellent choice for these use cases.

Sesame CSM is licensed under Apache 2.0, which allows commercial use. Audio generated with Sesame CSM voices can be used in videos, podcasts, apps, games, and any other commercial project.

All voices on TTS.ai use commercially licensed open-source models (MIT, Apache 2.0). The generated audio is yours to use in videos, podcasts, apps, games, and any other commercial application.

To use this voice through the API, send a POST request to /api/v1/tts/ with the model name and voice ID. See our API Documentation page for code examples in Python, JavaScript, Go, and cURL.
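As a rough illustration of such a request, the Python sketch below builds (but does not send) a POST to the /api/v1/tts/ path using only the standard library. Only the path and the fact that the request carries a model name and voice ID come from this page. The host `https://tts.ai`, the JSON field names `model`, `voice`, and `text`, the identifiers `sesame-csm` and `speaker_0`, and the Bearer-token header are all assumptions made for illustration; the API Documentation page is the authority on the real schema.

```python
import json
import urllib.request

BASE_URL = "https://tts.ai"  # assumed host; not stated on this page
ENDPOINT = "/api/v1/tts/"    # path stated on this page


def build_tts_request(text, model="sesame-csm", voice="speaker_0", api_key=None):
    """Build a POST request for the TTS endpoint (hypothetical helper).

    The field names and default identifiers are assumptions for
    illustration only; check the API documentation for the real ones.
    """
    payload = {"model": model, "voice": voice, "text": text}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed auth scheme; the real API may authenticate differently.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("Hello from Speaker 0.")
    # Inspect the request without touching the network.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect and adjust once the real field names are confirmed.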

To hear a sample, click the play button on this page. You can also type custom text on the Text to Speech page and generate a free preview with any voice.

Try Speaker 0 Now

Type some text and hear Speaker 0 speak it. Free to use.