AI Lip Sync Video Generator

Upload a face photo and an audio clip — get a talking-head video with realistic lip sync, head pose, and blinks. Powered by SadTalker (MIT). Commercial use OK.

We bring the human voice to life

Upload Face + Audio

Costs 1,000 characters per second of generated video

Drag and drop your file here, or browse

JPG, PNG, or short MP4/WebM. Max 10MB. One clear, well-lit face works best.


Drag and drop your file here, or browse

MP3, WAV, M4A, or FLAC. Max 10MB. Free: up to 30 sec. Pro: up to 5 min.


Processing...

Rendering your video. This typically takes 30 seconds to 2 minutes.

Your Talking-Head Video

Download

About SadTalker

SadTalker (CVPR 2023, Tencent ARC) is an open-source talking-head model that animates a single face image to speak any audio. Unlike Wav2Lip variants, SadTalker also animates head pose, blinks, and expression for a more natural result.

Code and weights are MIT-licensed end to end — no Llama, Gemma, or non-commercial backbone — so the videos you generate are safe for commercial use.

Tips for Best Results

  • Use a high-quality, well-lit portrait — eyes visible, mouth closed
  • Centered face, square or 4:5 aspect ratio works best
  • Clean speech audio (no music) yields tighter lip sync
  • Enable GFPGAN for hero shots — doubles render time but sharpens detail
  • Use the Still preset when you want a steady avatar shot

Lip Sync Video Plans

Start free, upgrade when you need more

Free
  • 30-second audio limit
  • 256 px output
  • "Still" preset only
  • No face enhancer
Most popular
Free Account
  • 30-second audio limit
  • Both "full" and "still" presets
  • 256 / 512 px output
  • GFPGAN face enhancer
Sign up free
Pro
  • 5-minute audio limit
  • Priority GPU queue
  • API access (multipart upload)
  • Webhook completion callbacks
  • Commercial use (MIT license)
Upgrade

Frequently Asked Questions

Upload a face photo and an audio clip, and the AI generates a video of that face speaking the audio with realistic lip movements, head pose, and blinks. Built on SadTalker (CVPR 2023), an MIT-licensed talking-head model that animates expression in addition to mouth shape.

The face input can be a JPG or PNG image (up to 10 MB) or a short MP4/WebM driving video (we use the first frame). The driving audio can be MP3, WAV, M4A, or FLAC up to 10 MB. We resample audio to 16 kHz internally.

Free accounts: up to 30 seconds per clip. Paying users: up to 5 minutes per request. Longer audio means longer render time and higher character cost.

Lip sync video uses 1,000 characters per second of generated video. A 30-second clip = 30,000 characters. The cost is billed up front from your character balance and refunded automatically if generation fails.
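The arithmetic above is straightforward to compute up front; a minimal sketch (the function name is illustrative, and rounding up to whole seconds is an assumption, not documented behavior):

```python
import math

CHARS_PER_SECOND = 1000  # billing rate stated above: 1,000 characters per second of video

def lipsync_cost(audio_seconds: float) -> int:
    """Characters debited from your balance for a clip of the given length."""
    return math.ceil(audio_seconds) * CHARS_PER_SECOND

print(lipsync_cost(30))  # 30000 — a 30-second clip, as in the example above
```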

Yes — SadTalker code and weights are MIT licensed end to end (no Llama, Gemma, or non-commercial backbone). The videos you generate are yours to use commercially. You are responsible for having the rights to the source face image and audio you upload.

About 30 seconds for a 5-second clip on our A100 server, scaling roughly linearly with audio length. Enabling the GFPGAN face enhancer roughly doubles render time but produces sharper, higher-quality output.
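The figures above imply roughly 6x real time (30 seconds of render per 5 seconds of audio), scaling linearly, with GFPGAN doubling the total. A back-of-envelope estimator under those assumptions — the 6x factor is inferred from the quoted numbers, not a guarantee:

```python
# ≈6x real time, inferred from "about 30 seconds for a 5-second clip"
SECONDS_PER_AUDIO_SECOND = 30 / 5

def estimated_render_seconds(audio_seconds: float, gfpgan: bool = False) -> float:
    """Rough render-time estimate; GFPGAN roughly doubles it."""
    base = audio_seconds * SECONDS_PER_AUDIO_SECOND
    return base * 2 if gfpgan else base

print(estimated_render_seconds(5))        # 30.0
print(estimated_render_seconds(30, True)) # 360.0 — half-minute clip with the enhancer
```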

Full preset (default) animates head pose, blinks, and expression along with the lips, producing a more natural talking-head video. Still preset locks the head in place and animates only the mouth — useful when you want a steady avatar shot.

GFPGAN is a face restoration model that sharpens facial details after lip-sync rendering. It cleans up artifacts and makes 256-pixel output look closer to 512. It roughly doubles render time but is worth it for hero shots.

SadTalker renders at 256 px by default. Switch to 512 px size for sharper output (slower, higher VRAM) or enable the GFPGAN enhancer to upscale facial details. For best results, upload a high-quality, well-lit portrait photo.

Yes. Upload an MP4 or WebM as the face input and we will use the first frame as the driving identity. For full video re-dubbing (per-frame mouth replacement), see the upcoming Dubbing Studio video pipeline.

Yes. POST a multipart request to /api/v1/lipsync/ with face and audio fields, then poll /api/v1/lipsync/result/?uuid= until status is "completed". The response contains a URL to the rendered MP4. API access requires a paid plan.
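The submit-then-poll flow above can be sketched with `requests`. The endpoint paths and the `face`, `audio`, and `status` fields come from the description above; the host, auth header, submit-response `uuid` field, and `failed` status are assumptions to check against the API docs:

```python
import time
import requests

BASE = "https://example.com"  # assumed host — replace with the actual API host
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # assumed auth scheme

def result_url(uuid: str) -> str:
    # Poll endpoint from the FAQ: /api/v1/lipsync/result/?uuid=<uuid>
    return f"{BASE}/api/v1/lipsync/result/?uuid={uuid}"

def generate(face_path: str, audio_path: str, poll_every: float = 5.0) -> dict:
    """Submit a face + audio pair, then poll until the job completes."""
    with open(face_path, "rb") as face, open(audio_path, "rb") as audio:
        resp = requests.post(
            f"{BASE}/api/v1/lipsync/",
            headers=HEADERS,
            files={"face": face, "audio": audio},  # multipart field names from the FAQ
        )
    resp.raise_for_status()
    uuid = resp.json()["uuid"]  # assumption: submit response returns the job uuid
    while True:
        job = requests.get(result_url(uuid), headers=HEADERS).json()
        if job["status"] == "completed":
            return job  # contains a URL to the rendered MP4
        if job["status"] == "failed":  # assumed failure status
            raise RuntimeError("generation failed; characters are auto-refunded")
        time.sleep(poll_every)
```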

SadTalker uses face-alignment to detect and crop the most prominent face. For best results, upload a portrait with one person centered, eyes visible, and minimal occlusion. Group photos may produce unpredictable results.

Ready to get started?

No credit card required.