CosyVoice 2 Voices

Streaming TTS from Alibaba's Tongyi Lab, with human-parity naturalness, very low latency, and zero-shot voice cloning.

SSML mode (Speech Synthesis Markup Language, for controlling how the text is spoken)

Wrap your text in SSML tags for precise control:

<speak><prosody rate="slow">Slow speech</prosody></speak>
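The wrapper above can also be produced programmatically. The following is a minimal sketch, assuming only the `<speak>` and `<prosody rate="...">` elements shown in the example; `wrap_ssml` is a hypothetical helper name, not part of any CosyVoice 2 API. It escapes the input so that characters such as `&` or `<` do not break the markup.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_ssml(text, rate="slow"):
    """Wrap plain text in <speak><prosody rate=...> tags (hypothetical helper)."""
    # escape() protects &, < and > in the spoken text;
    # quoteattr() quotes and escapes the attribute value.
    return f"<speak><prosody rate={quoteattr(rate)}>{escape(text)}</prosody></speak>"


if __name__ == "__main__":
    print(wrap_ssml("Slow speech"))
```

Which `rate` values the selected model accepts is not stated on this page, so treat anything other than `slow` as an assumption to verify.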

Emotion / style tags

Tags that the selected model understands. Click a tag to insert it into your text at the cursor position:

Pronunciation dictionary

Set custom pronunciations (word = pronunciation):
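To illustrate the `word = pronunciation` format, here is a minimal sketch that parses such entries and substitutes them into the text before synthesis. It assumes one entry per line and case-sensitive whole-word matching; neither detail is stated on this page, and both function names are hypothetical.

```python
import re


def parse_pronunciations(spec):
    """Parse 'word = pronunciation' lines into a dict, skipping malformed lines."""
    mapping = {}
    for line in spec.splitlines():
        if "=" not in line:
            continue
        word, _, pronunciation = line.partition("=")
        word, pronunciation = word.strip(), pronunciation.strip()
        if word and pronunciation:
            mapping[word] = pronunciation
    return mapping


def apply_pronunciations(text, mapping):
    """Replace each whole-word occurrence of a key with its pronunciation."""
    if not mapping:
        return text
    # Longest keys first so that longer entries win over their prefixes.
    words = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], text)


if __name__ == "__main__":
    entries = parse_pronunciations("SQL = sequel\nnginx = engine x")
    print(apply_pronunciations("SQL runs behind nginx.", entries))
```

Because the match relies on `\b` word boundaries, keys that begin or end with punctuation (for example `C++`) would need a different matching rule.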

Pitch: 0 (range -12 to +12)

AI model

Voice

Language

Output format

Speed: 1.0x (range 0.5x to 2.0x)
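The pitch and speed controls have fixed ranges (-12 to +12 with a default of 0, and 0.5x to 2.0x with a default of 1.0x). A minimal sketch of how a client might keep values inside those ranges before sending a request follows; the `VoiceSettings` class and its field names are hypothetical, and the page does not state the pitch unit.

```python
from dataclasses import dataclass

PITCH_RANGE = (-12, 12)   # slider limits shown on the page
SPEED_RANGE = (0.5, 2.0)  # slider limits shown on the page


def clamp(value, low, high):
    """Return value limited to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the pitch and speed controls."""
    pitch: int = 0
    speed: float = 1.0

    def __post_init__(self):
        # The dataclass is frozen, so bypass the normal attribute setter.
        object.__setattr__(self, "pitch", clamp(self.pitch, *PITCH_RANGE))
        object.__setattr__(self, "speed", clamp(self.speed, *SPEED_RANGE))


if __name__ == "__main__":
    print(VoiceSettings(pitch=20, speed=0.1))
```

Clamping silently corrects out-of-range input; raising a `ValueError` instead would be the stricter choice if callers should be told about bad values.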

Free to use with Piper, VITS, and MeloTTS

Generated audio will appear here. Select a model, enter your text, and click Generate.

About CosyVoice 2

CosyVoice 2, from Alibaba's Tongyi Lab, was designed to make high-quality speech viable in real time. It uses a finite scalar quantization approach combined with flow matching to support streaming synthesis at extremely low latency, while reaching human-comparable naturalness that outperforms many commercial systems in subjective tests. Beyond quality, it offers zero-shot voice cloning from about 3 seconds of audio, cross-lingual synthesis, and fine-grained emotion control. Covering 8 languages with a 1,000-character cap, it's a strong fit for voice assistants, streaming TTS, and other real-time applications.

Best for: Real-time applications, streaming TTS, voice assistants

At a glance

Developer: Alibaba (Tongyi Lab)
License: Apache 2.0
Tier: standard
Speed: medium
Voice cloning: Yes
Languages: English, Chinese, Japanese, Korean, French, German, Italian, Spanish
Max characters: 1000
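Text longer than the 1,000-character maximum has to be split before synthesis. The sketch below shows one way to do that, assuming chunks should break at sentence-ending punctuation where possible; `split_text` is a hypothetical helper, and the page does not describe how the service itself handles over-length input.

```python
import re

MAX_CHARS = 1000  # per-request limit listed above


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence boundaries."""
    # Break after Latin or CJK sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?。！？])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    parts = split_text("Hello world. " * 200)
    print(len(parts), [len(p) for p in parts])
```

Each chunk can then be synthesized separately and the resulting audio concatenated in order.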