
Sesame CSM TTS

A 1B conversational speech model that captures natural dialogue timing, turn-taking, and backchannel responses.

Limit: 500 characters per generation (sign up to raise it to 5,000).

SSML mode (Speech Synthesis Markup Language, for fine-grained control)

Wrap the text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
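As an illustration of the tag wrapping above, here is a minimal Python sketch that escapes plain text and wraps it in the same `<speak>`/`<prosody>` structure. The helper name `build_ssml` is hypothetical and not part of TTS.ai; the accepted `rate` values follow the SSML 1.1 specification, and this page does not say which tags or values the selected model actually honors.

```python
import re
from xml.sax.saxutils import escape

# Named rates defined by SSML 1.1 for <prosody rate="...">.
NAMED_RATES = {"x-slow", "slow", "medium", "fast", "x-fast", "default"}
# SSML 1.1 also accepts a non-negative percentage such as "80%" or "150%".
PERCENT_RATE = re.compile(r"\d+(\.\d+)?%")


def build_ssml(text: str, rate: str = "medium") -> str:
    """Wrap plain text in <speak><prosody> tags (hypothetical helper)."""
    if rate not in NAMED_RATES and not PERCENT_RATE.fullmatch(rate):
        raise ValueError(f"unsupported prosody rate: {rate!r}")
    # Escape &, < and > so user text cannot break the markup.
    body = escape(text)
    # The rate was validated above, so it contains no quote characters.
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


print(build_ssml("Slow speech", "slow"))
print(build_ssml("Tom & Jerry", "120%"))
```

Escaping matters because a bare `&` or `<` in user text would otherwise produce invalid markup.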

Emotion / style tags

Tags understood by the selected model. Click a tag to insert it into the text at the cursor position:

Pronunciation dictionary

Define custom pronunciations (word = pronunciation):
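The page gives only the entry format, `word = pronunciation`; it does not say how TTS.ai applies the entries. Assuming simple whole-word, case-sensitive text substitution before synthesis, a minimal Python sketch might look like this. Both function names are hypothetical, and the example entries are made up for illustration.

```python
import re


def parse_dictionary(lines: list[str]) -> dict[str, str]:
    """Parse 'word = pronunciation' lines into a dict (hypothetical handling)."""
    entries: dict[str, str] = {}
    for line in lines:
        if "=" not in line:
            continue  # skip blank or malformed lines
        word, _, spoken = line.partition("=")
        word, spoken = word.strip(), spoken.strip()
        if word and spoken:
            entries[word] = spoken
    return entries


def apply_dictionary(text: str, entries: dict[str, str]) -> str:
    """Replace whole-word matches with their respelling (assumes plain-word keys)."""
    if not entries:
        return text
    # Longest keys first so a longer entry wins over a shorter one it contains.
    words = sorted(entries, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    return pattern.sub(lambda m: entries[m.group(1)], text)


entries = parse_dictionary(["SQL = sequel", "TTS = tee tee ess"])
print(apply_dictionary("SQL and TTS, not SQLite.", entries))
```

The `\b` word boundaries keep "SQL" from matching inside "SQLite"; keys that start or end with punctuation (such as "C++") would need a different boundary rule.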

Pitch: 0 (range -12 to +12)

AI model

Voice

Language

Output format

Speed: 1.0x (range 0.5x to 2.0x)

Free with Piper, VITS, and MeloTTS

Your generated audio will appear here. Choose a model, enter text, and click Generate.

About Sesame CSM

Sesame CSM (Conversational Speech Model) is a 1-billion-parameter model from Sesame designed specifically for the rhythms of human conversation. Built on a Llama backbone paired with an audio codec, it models turn-taking timing, backchannel responses (the small acknowledgements people make while listening), emotional reactions, and overall conversational flow. The result sounds less like text read aloud and more like a real spoken exchange. It is a natural fit for AI assistants, chatbots, and conversational interfaces where speech should feel responsive and human. CSM is released under Apache 2.0; using it on TTS.ai requires a Hugging Face token at the model level.

Best for: AI assistants, chatbots, conversational AI applications

At a glance

Developer: Sesame
License: Apache 2.0
Tier: premium
Speed: slow
Voice cloning: No
Languages: English
Maximum characters: 500

Sesame CSM voices

Speaker 0: English, Premium, Neutral

Speaker 1: English, Premium, Neutral

Sesame CSM TTS FAQ

What is Sesame CSM optimized for?

Conversational speech. It models the natural patterns of dialogue (turn-taking timing, backchannel responses, and emotional reactions), so generated audio sounds like a real conversation rather than synthetic narration.

How large is the Sesame CSM model?

It is a 1-billion-parameter model built on a Llama backbone with an audio codec for waveform generation.

What is Sesame CSM best used for?

AI assistants, chatbots, and other conversational applications where responsive, human-sounding speech matters more than long-form narration.