AI Lip Sync Video Generator

Upload a face photo and an audio clip — get a talking-head video with realistic lip sync, head pose, and blinks. Powered by SadTalker (MIT). Commercial use OK.

We bring the human voice to life

Upload Face + Audio

Costs 1,000 characters per second of generated video

Drag and drop your file here, or browse

JPG, PNG, or short MP4/WebM. Max 10MB. One clear, well-lit face works best.


Drag and drop your file here, or browse

MP3, WAV, M4A, or FLAC. Max 10MB. Free: up to 30 sec. Pro: up to 5 min.


Processing...

Rendering your video. This typically takes 30 seconds to 2 minutes.

Your Talking-Head Video

Download

About SadTalker

SadTalker (CVPR 2023, Tencent ARC) is an open-source talking-head model that animates a single face image to speak any audio. Unlike Wav2Lip variants, SadTalker also animates head pose, blinks, and expression for a more natural result.

Code and weights are MIT-licensed end to end — no Llama, Gemma, or non-commercial backbone — so the videos you generate are safe for commercial use.

Tips for Best Results

  • Use a high-quality, well-lit portrait — eyes visible, mouth closed
  • Centered face, square or 4:5 aspect ratio works best
  • Clean speech audio (no music) yields tighter lip sync
  • Enable GFPGAN for hero shots — doubles render time but sharpens detail
  • Use the Still preset when you want a steady avatar shot

Lip Sync Video Plans

Start free, upgrade when you need more

Free
  • 30-second audio limit
  • 256 px output
  • "Still" preset only
  • No face enhancer
Most popular
Free Account
  • 30-second audio limit
  • Both "full" and "still" presets
  • 256 / 512 px output
  • GFPGAN face enhancer
Sign up free
Pro
  • 5-minute audio limit
  • Priority GPU queue
  • API access (multipart upload)
  • Webhook completion callbacks
  • Commercial use (MIT license)
Upgrade

Frequently Asked Questions

Upload a face photo and an audio clip, and the AI generates a video of that face speaking the audio with realistic lip movements, head pose, and blinks. Built on SadTalker (CVPR 2023), an MIT-licensed talking-head model that animates expression in addition to mouth shape.

The face input can be a JPG or PNG image (up to 10 MB) or a short MP4/WebM driving video (we use the first frame). The driving audio can be MP3, WAV, M4A, or FLAC up to 10 MB. We resample audio to 16 kHz internally.

Free accounts: up to 30 seconds per clip. Paying users: up to 5 minutes per request. Longer audio means longer render time and higher character cost.

Lip sync video uses 1,000 characters per second of generated video. A 30-second clip = 30,000 characters. The cost is billed up front from your character balance and refunded automatically if generation fails.
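The arithmetic above is straightforward to compute up front; a minimal sketch (the function name is illustrative, and rounding up to whole seconds is an assumption, not documented behavior):

```python
import math

CHARS_PER_SECOND = 1000  # billing rate stated above: 1,000 characters per second of video

def lipsync_cost(audio_seconds: float) -> int:
    """Characters debited from your balance for a clip of the given length."""
    return math.ceil(audio_seconds) * CHARS_PER_SECOND

print(lipsync_cost(30))  # 30000 — a 30-second clip, as in the example above
```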

Yes — SadTalker code and weights are MIT licensed end to end (no Llama, Gemma, or non-commercial backbone). The videos you generate are yours to use commercially. You are responsible for having the rights to the source face image and audio you upload.

About 30 seconds for a 5-second clip on our A100 server, scaling roughly linearly with audio length. Enabling the GFPGAN face enhancer roughly doubles render time but produces sharper, higher-quality output.
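The figures above imply roughly 6x real time (30 seconds of render per 5 seconds of audio), scaling linearly, with GFPGAN doubling the total. A back-of-envelope estimator under those assumptions — the 6x factor is inferred from the quoted numbers, not a guarantee:

```python
# ≈6x real time, inferred from "about 30 seconds for a 5-second clip"
SECONDS_PER_AUDIO_SECOND = 30 / 5

def estimated_render_seconds(audio_seconds: float, gfpgan: bool = False) -> float:
    """Rough render-time estimate; GFPGAN roughly doubles it."""
    base = audio_seconds * SECONDS_PER_AUDIO_SECOND
    return base * 2 if gfpgan else base

print(estimated_render_seconds(5))        # 30.0
print(estimated_render_seconds(30, True)) # 360.0 — half-minute clip with the enhancer
```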

Full preset (default) animates head pose, blinks, and expression along with the lips, producing a more natural talking-head video. Still preset locks the head in place and animates only the mouth — useful when you want a steady avatar shot.

GFPGAN is a face restoration model that sharpens facial details after lip-sync rendering. It cleans up artifacts and makes 256-pixel output look closer to 512. It roughly doubles render time but is worth it for hero shots.

SadTalker renders at 256 px by default. Switch to 512 px size for sharper output (slower, higher VRAM) or enable the GFPGAN enhancer to upscale facial details. For best results, upload a high-quality, well-lit portrait photo.

Yes. Upload an MP4 or WebM as the face input and we will use the first frame as the driving identity. For full video re-dubbing (per-frame mouth replacement), see the upcoming Dubbing Studio video pipeline.

Yes. POST a multipart request to /api/v1/lipsync/ with face and audio fields, then poll /api/v1/lipsync/result/?uuid= until status is "completed". The response contains a URL to the rendered MP4. API access requires a paid plan.
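The submit-then-poll flow above can be sketched with `requests`. The endpoint paths and the `face`, `audio`, and `status` fields come from the description above; the host, auth header, submit-response `uuid` field, and `failed` status are assumptions to check against the API docs:

```python
import time
import requests

BASE = "https://example.com"  # assumed host — replace with the actual API host
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # assumed auth scheme

def result_url(uuid: str) -> str:
    # Poll endpoint from the FAQ: /api/v1/lipsync/result/?uuid=<uuid>
    return f"{BASE}/api/v1/lipsync/result/?uuid={uuid}"

def generate(face_path: str, audio_path: str, poll_every: float = 5.0) -> dict:
    """Submit a face + audio pair, then poll until the job completes."""
    with open(face_path, "rb") as face, open(audio_path, "rb") as audio:
        resp = requests.post(
            f"{BASE}/api/v1/lipsync/",
            headers=HEADERS,
            files={"face": face, "audio": audio},  # multipart field names from the FAQ
        )
    resp.raise_for_status()
    uuid = resp.json()["uuid"]  # assumption: submit response returns the job uuid
    while True:
        job = requests.get(result_url(uuid), headers=HEADERS).json()
        if job["status"] == "completed":
            return job  # contains a URL to the rendered MP4
        if job["status"] == "failed":  # assumed failure status
            raise RuntimeError("generation failed; characters are auto-refunded")
        time.sleep(poll_every)
```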

SadTalker uses face-alignment to detect and crop the most prominent face. For best results, upload a portrait with one person centered, eyes visible, and minimal occlusion. Group photos may produce unpredictable results.

Ready to get started?

No credit card required.