Spark TTS

Voice cloning from five seconds of audio combined with prompt-based control over emotion, speed, and speaking style.

SSML mode (Speech Synthesis Markup Language for fine control)

Wrap your text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
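As a minimal sketch of producing markup like the line above programmatically, the Python snippet below wraps plain text in `<speak>` and `<prosody>` elements and escapes XML special characters. The helper name `wrap_prosody` is hypothetical and not part of any TTS.ai or Spark TTS API, and only the `rate` attribute shown in the example above is assumed to be accepted.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_prosody(text: str, rate: str = "slow") -> str:
    """Wrap plain text in <speak><prosody rate=...> markup (hypothetical helper)."""
    # Escape &, < and > so user text cannot break the markup.
    body = escape(text)
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"


print(wrap_prosody("Slow speech"))
```

Escaping matters because a stray `&` or `<` in the input would otherwise produce markup that is not well-formed.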

Emotion / style tags

Tags the selected model understands. Click one to insert it into your text at the cursor position:

Pronunciation dictionary

Define custom pronunciations (word = pronunciation):
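The `word = pronunciation` format could be parsed and applied roughly as follows. This is a sketch under assumptions: one entry per line, whole-word and case-insensitive matching, and hypothetical helper names `parse_lexicon` and `apply_lexicon`. How the service itself applies the dictionary is not documented here.

```python
import re


def parse_lexicon(spec: str) -> dict[str, str]:
    """Parse 'word = pronunciation' lines into a dict (assumed one entry per line)."""
    lexicon = {}
    for line in spec.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        word, pron = line.split("=", 1)
        word, pron = word.strip(), pron.strip()
        if word and pron:
            lexicon[word.lower()] = pron
    return lexicon


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Replace whole-word matches (case-insensitive) with their pronunciations."""
    if not lexicon:
        return text
    # Longest words first so longer entries win over shorter ones.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: lexicon[m.group(0).lower()], text)


lexicon = parse_lexicon("SQL = sequel\nnginx = engine x")
print(apply_lexicon("Run SQL behind nginx.", lexicon))
```

Substituting before synthesis keeps the spelling in the source text untouched while changing only what is spoken.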

Pitch: 0 (range: -12 to +12)

AI model

Voice

Language

Output format

Speed: 1.0x (range: 0.5x to 2.0x)
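The pitch and speed controls above have fixed ranges (-12 to +12 and 0.5x to 2.0x). A client preparing settings might clamp values into those ranges before submitting them. The sketch below is illustrative only: `VoiceSettings` and `make_settings` are hypothetical names, and the unit of the pitch slider is not stated on this page.

```python
from dataclasses import dataclass

PITCH_RANGE = (-12, 12)   # slider bounds shown above; unit not stated in the source
SPEED_RANGE = (0.5, 2.0)  # speed multiplier bounds shown above


@dataclass(frozen=True)
class VoiceSettings:
    pitch: int = 0      # default shown above
    speed: float = 1.0  # default shown above


def clamp(value, bounds):
    """Limit value to the inclusive (low, high) bounds."""
    low, high = bounds
    return max(low, min(high, value))


def make_settings(pitch: int = 0, speed: float = 1.0) -> VoiceSettings:
    """Clamp requested values into the slider ranges (hypothetical helper)."""
    return VoiceSettings(
        pitch=clamp(pitch, PITCH_RANGE),
        speed=clamp(speed, SPEED_RANGE),
    )


print(make_settings(pitch=20, speed=0.25))
```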

Free with Piper, VITS, and MeloTTS

About Spark TTS

Spark TTS by SparkAudio merges voice cloning with controllable delivery in a single prompt-driven system. Using just five seconds of reference audio, it clones a voice, then lets you steer emotion, speed, and speaking style while keeping that cloned identity intact. Under the hood it combines a BiCodec audio tokenizer, an LLM, and flow matching, and it supports English and Chinese. It is aimed at content creation where a single cloned voice needs to express a range of moods and pacing. Note the licensing split: the Spark TTS code is Apache 2.0, but the model weights are released under CC BY-NC-SA 4.0, which restricts commercial use.

Best for: Content creation with cloned voices and emotional control

At a glance

Developer: SparkAudio
License: CC BY-NC-SA 4.0 (model weights); Apache 2.0 (code)
Tier: standard
Speed: medium
Voice cloning: Yes
Languages: English, Chinese
Max characters: 1000
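Because the maximum is 1000 characters, longer scripts would need to be split before synthesis. The sketch below packs sentences greedily into chunks under the limit. It assumes the limit applies per request and that sentences end with English-style punctuation followed by whitespace (Chinese text would need different boundary rules); `split_text` is a hypothetical helper, not part of any published API.

```python
import re

MAX_CHARS = 1000  # limit listed above; assumed to apply per request


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Greedily pack sentences into chunks of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


parts = split_text("Hello world. " * 200)
print(len(parts), max(len(p) for p in parts))
```

Splitting at sentence boundaries rather than at a fixed offset avoids cutting words in half, which would otherwise be audible at the joins.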

Spark TTS voices

Chinese Default (Chinese): Standard, Neutral

Default (English): Standard, Neutral

Spark TTS FAQ

How does Spark TTS control emotion and style?

It uses a prompt-based control system layered on top of voice cloning, so you can adjust emotion, speed, and speaking style while preserving the identity of the cloned voice.

How much audio does Spark TTS need to clone a voice?

About five seconds of reference audio is enough to clone a voice in English or Chinese.

Can I use Spark TTS commercially?

Its model weights are licensed CC BY-NC-SA 4.0, which prohibits commercial use, even though the project code is Apache 2.0. Choose a permissively licensed model for commercial work.