AI Lip Sync Video Generator

Upload a face photo and an audio clip — get a talking-head video with realistic lip sync, head pose, and blinks. Powered by SadTalker (MIT). Commercial use OK.

We don't have TTS voices for your language yet. Help us add your own voice! Sell Your Voice

Upload Face + Audio

Costs 1,000 characters per second of generated video

Drag and drop your file here, or browse

JPG, PNG, or short MP4/WebM. Max 10MB. One clear, well-lit face works best.

Drag and drop your file here, or browse

MP3, WAV, M4A, or FLAC. Max 10MB. Free: up to 30 sec. Pro: up to 5 min.

Processing...

Rendering your video. This typically takes 30 seconds to 2 minutes.

Your Talking-Head Video

About SadTalker

SadTalker (CVPR 2023, Tencent ARC) is an open-source talking-head model that animates a single face image to speak any audio. Unlike Wav2Lip variants, SadTalker also animates head pose, blinks, and expression for a more natural result.

Code and weights are MIT-licensed end to end — no Llama, Gemma, or non-commercial backbone — so the videos you generate are safe for commercial use.

Tips for Best Results

  • Use a high-quality, well-lit portrait — eyes visible, mouth closed
  • Centered face, square or 4:5 aspect ratio works best
  • Clean speech audio (no music) yields tighter lip sync
  • Enable GFPGAN for hero shots — doubles render time but sharpens detail
  • Use the Still preset when you want a steady avatar shot

Lip Sync Video Plans

Start free, upgrade when you need more

Free
  • 30-second audio limit
  • 256 px output
  • "Still" preset only
  • No face enhancer
Most Popular
Free Account
  • 30-second audio limit
  • Both "full" and "still" presets
  • 256 / 512 px output
  • GFPGAN face enhancer
Sign Up
Pro
  • 5-minute audio limit
  • Priority GPU queue
  • API access (multipart upload)
  • Webhook completion callbacks
  • Commercial use (MIT license)
Upgrade

Frequently Asked Questions

How does the lip sync video generator work?

Upload a face photo and an audio clip, and the AI generates a video of that face speaking the audio with realistic lip movements, head pose, and blinks. Built on SadTalker (CVPR 2023), an MIT-licensed talking-head model that animates expression in addition to mouth shape.

What file formats are supported?

The face input can be a JPG or PNG image (up to 10 MB) or a short MP4/WebM driving video (we use the first frame). The driving audio can be MP3, WAV, M4A, or FLAC up to 10 MB. We resample audio to 16 kHz internally.
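If you want to preview the server-side resampling step locally, the sketch below builds a standard ffmpeg command for it. The `-ar 16000` flag matches the 16 kHz rate stated above; downmixing to mono with `-ac 1` is an assumption, since the page only specifies the sample rate.

```python
import subprocess

def resample_cmd(src: str, dst: str) -> list[str]:
    # -ar sets the output sample rate; -ac 1 downmixes to mono (an assumption);
    # -y overwrites the destination file if it already exists.
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]

def resample_to_16k(src: str, dst: str) -> None:
    # Requires ffmpeg on PATH; raises CalledProcessError on failure.
    subprocess.run(resample_cmd(src, dst), check=True)
```

Pre-resampling is optional; the server does it for you, but doing it locally lets you hear exactly the audio the model will receive.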

How long can the audio be?

Free accounts: up to 30 seconds per clip. Paying users: up to 5 minutes per request. Longer audio means longer render time and higher character cost.

How many characters does a video cost?

Lip sync video uses 1,000 characters per second of generated video. A 30-second clip = 30,000 characters. The cost is billed up front from your character balance and refunded automatically if generation fails.
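The billing math above can be sketched as a one-line helper. Rounding partial seconds up is an assumption on my part; the documented rate is simply 1,000 characters per second.

```python
import math

def lipsync_cost_chars(audio_seconds: float, rate: int = 1_000) -> int:
    # 1,000 characters per second of generated video, per the pricing above.
    # Rounding fractional seconds up via ceil is an assumption, not documented.
    return math.ceil(audio_seconds * rate)
```

For example, `lipsync_cost_chars(30)` returns 30,000, matching the 30-second example above.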

Can I use the generated videos commercially?

Yes — SadTalker code and weights are MIT licensed end to end (no Llama, Gemma, or non-commercial backbone). The videos you generate are yours to use commercially. You are responsible for having the rights to the source face image and audio you upload.

How long does rendering take?

About 30 seconds for a 5-second clip on our A100 server, scaling roughly linearly with audio length. Enabling the GFPGAN face enhancer roughly doubles render time but produces sharper, higher-quality output.
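Those figures imply a simple planning estimate: roughly 6 seconds of GPU time per second of audio, doubled when GFPGAN is on. The helper below is a rough sketch of that arithmetic, not a guarantee of actual queue or render times.

```python
def estimated_render_seconds(audio_seconds: float, gfpgan: bool = False) -> float:
    # ~30 s of render per 5 s of audio on an A100 (the figure above),
    # scaling linearly; GFPGAN roughly doubles it. A planning estimate only.
    base = audio_seconds * (30.0 / 5.0)
    return base * 2 if gfpgan else base
```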

What is the difference between the Full and Still presets?

The Full preset (default) animates head pose, blinks, and expression along with the lips, producing a more natural talking-head video. The Still preset locks the head in place and animates only the mouth — useful when you want a steady avatar shot.

What does the GFPGAN face enhancer do?

GFPGAN is a face restoration model that sharpens facial details after lip-sync rendering. It cleans up artifacts and makes 256-pixel output look closer to 512. It roughly doubles render time but is worth it for hero shots.

How do I get sharper output?

SadTalker renders at 256 px by default. Switch to 512 px output for sharper results (slower, higher VRAM) or enable the GFPGAN enhancer to upscale facial details. For best results, upload a high-quality, well-lit portrait photo.

Can I use a video as the face input?

Yes. Upload an MP4 or WebM as the face input and we will use the first frame as the driving identity. For full video re-dubbing (per-frame mouth replacement), see the upcoming Dubbing Studio video pipeline.

Is there an API?

Yes. POST a multipart request to /api/v1/lipsync/ with face and audio fields, then poll /api/v1/lipsync/result/?uuid= until status is "completed". The response contains a URL to the rendered MP4. API access requires a paid plan.
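That submit-then-poll flow can be sketched with the `requests` library. The endpoint paths, the `face`/`audio` field names, and the "completed" status come from the description above; the host, the Bearer auth scheme, and the `uuid`/`status`/`url` response fields are assumptions for illustration.

```python
import time

import requests  # third-party: pip install requests

API_BASE = "https://api.example.com"  # placeholder host; paths below are from the docs

def submit_lipsync(face_path: str, audio_path: str, api_key: str) -> str:
    # Multipart POST with `face` and `audio` fields, as documented above.
    with open(face_path, "rb") as face, open(audio_path, "rb") as audio:
        resp = requests.post(
            f"{API_BASE}/api/v1/lipsync/",
            headers={"Authorization": f"Bearer {api_key}"},  # auth scheme assumed
            files={"face": face, "audio": audio},
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()["uuid"]  # response field name assumed

def wait_for_video(uuid: str, api_key: str, interval: float = 5.0) -> str:
    # Poll the result endpoint until status is "completed", then return the MP4 URL.
    while True:
        resp = requests.get(
            f"{API_BASE}/api/v1/lipsync/result/",
            params={"uuid": uuid},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()
        if data["status"] == "completed":
            return data["url"]  # URL field name assumed
        time.sleep(interval)
```

In production you would add a retry limit and handle a "failed" status (which triggers the automatic character refund described above) rather than polling forever.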

What happens if my photo contains more than one face?

SadTalker uses face-alignment to detect and crop the most prominent face. For best results, upload a portrait with one person centered, eyes visible, and minimal occlusion. Group photos may produce unpredictable results.

What could we do better? Your feedback helps us fix issues.

Ready to get started?

Sign up for free and get 50 credits. No credit card required.