AI Lip Sync Video Maker

Upload a face image and an audio clip to get a talking-head video with realistic lip sync, head pose, and blinks. Powered by SadTalker (MIT). Commercial use OK.

Upload Face + Audio

1,000 tokens per second

JPG, PNG, or short MP4/WebM. Max 10MB. One clear, well-lit face works best.

MP3, WAV, M4A, or FLAC. Max 10MB. Free: up to 30 sec. Pro: up to 5 min.

Processing...

Rendering your video. This usually takes 30 seconds to two minutes.

Your talking-head video

About SadTalker

SadTalker (CVPR 2023, Tencent ARC) is an open-source talking-head model that animates a single face image to speak any audio. Unlike Wav2Lip, SadTalker also animates head pose, blinks, and expression, which gives a more natural-looking result.

The code and weights are MIT-licensed end to end (no Llama, Gemma, or non-commercial backbone), so the videos you generate are safe for commercial use.

Tips for best results

  • Use a high-quality, well-lit image: eyes visible, mouth closed
  • A centered face with a square or 4:5 aspect ratio works best
  • Clean speech (no music) gives the tightest lip sync
  • Enable GFPGAN for hero shots: it doubles render time but sharpens detail
  • Use the Still preset when you want a steady shot

Lip Sync Video Plans

Start free, upgrade when you need more

Guest
  • 30-second audio limit
  • 256 px output
  • "Still" preset only
  • No face enhancement
Most Popular
Free Account
  • 30-second audio limit
  • All "full" and "still" presets
  • 256 / 512 px output
  • GFPGAN face enhancer
Sign up
Pro
  • 5-minute audio limit
  • Priority GPU queue
  • API access (multipart upload)
  • Webhook completion callbacks
  • Commercial use (MIT license)
Upgrade

Frequently Asked Questions

Upload a face image and an audio clip, and the AI generates a video of the face speaking the audio with realistic lip movement, head pose, and blinking. It is built on SadTalker (CVPR 2023), an MIT-licensed talking-head model that produces realistic motion beyond the lips alone.

The source input can be a JPG or PNG image (up to 10 MB) or a short MP4/WebM driving video (we use the first frame). The driving audio can be MP3, WAV, M4A, or FLAC, up to 10 MB. We resample audio to 16 kHz internally.
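The format and size limits above can be checked on the client before uploading. The following is a minimal sketch, not the service's own validation: the function name `check_upload`, the `"face"` / `"audio"` labels, the acceptance of `.jpeg`, and the reading of 10 MB as 10 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
from pathlib import PurePath
from typing import List

# Extensions taken from the formats listed above; ".jpeg" is an assumed alias for JPG.
FACE_EXTS = {".jpg", ".jpeg", ".png", ".mp4", ".webm"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" is read as 10 MiB


def check_upload(filename: str, size_bytes: int, kind: str) -> List[str]:
    """Return a list of problems for one file; an empty list means it looks acceptable.

    kind is "face" or "audio" (hypothetical labels used only in this sketch).
    """
    allowed = {"face": FACE_EXTS, "audio": AUDIO_EXTS}[kind]
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in allowed:
        problems.append(f"unsupported {kind} format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"{kind} file exceeds 10 MB")
    return problems
```

Calling `check_upload("portrait.png", 2_000_000, "face")` returns an empty list, while an unsupported extension or an oversized file yields one message per problem.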

Free accounts: up to 30 seconds per request. Paid users: up to five minutes per request. Longer audio means longer render time and a higher token cost.

Lip sync video costs 1,000 tokens per second of generated video. A 30-second clip = 30,000 tokens. The cost is pre-charged to your token balance and automatically refunded if generation fails.
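The pricing rule above is simple arithmetic, sketched below for budgeting purposes. The function names are hypothetical, and rounding partial seconds up is an assumption; the page does not say how fractional seconds are billed.

```python
import math

TOKENS_PER_SECOND = 1_000  # rate stated above


def lipsync_cost(duration_seconds: float) -> int:
    """Estimated cost for a clip; partial seconds are rounded up (an assumption)."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return math.ceil(duration_seconds) * TOKENS_PER_SECOND


def affordable_seconds(balance: int) -> int:
    """Whole seconds of video that a given balance can cover at the stated rate."""
    return balance // TOKENS_PER_SECOND
```

Under these assumptions, `lipsync_cost(30)` gives 30,000, matching the 30-second example above.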

Yes. The SadTalker code and weights are MIT-licensed end to end (no Llama, Gemma, or non-commercial backbone). The videos you generate are yours to use commercially. You must have the rights to the face image and audio you submit.

About 30 seconds for a 5-second clip on our A100 server, scaling roughly linearly with audio length. Enabling the GFPGAN face enhancer roughly doubles render time but produces sharper, higher-quality output.
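As a rough planning aid, the figures above can be turned into a linear estimate. This is only a sketch under the stated numbers (about 30 seconds of rendering per 5 seconds of audio, doubled with GFPGAN); the function name is hypothetical, and real times will vary with queue load and resolution.

```python
BASELINE_RENDER_SECONDS = 30.0  # quoted render time for the baseline clip
BASELINE_AUDIO_SECONDS = 5.0    # length of the baseline clip
GFPGAN_FACTOR = 2.0             # "roughly doubles render time"


def estimate_render_seconds(audio_seconds: float, gfpgan: bool = False) -> float:
    """Linear estimate of render time from audio length, per the figures above."""
    per_audio_second = BASELINE_RENDER_SECONDS / BASELINE_AUDIO_SECONDS
    estimate = per_audio_second * audio_seconds
    return estimate * GFPGAN_FACTOR if gfpgan else estimate
```

For example, a 30-second clip comes out at roughly 180 seconds without the enhancer and about 360 seconds with it.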

The full preset (the default) animates head motion, blinks, and expression along with the lips, producing a natural-looking talking-head video. The still preset keeps the head in place and animates only the mouth, which is useful when you want a steady narrator shot.

GFPGAN is a face restoration model that sharpens facial detail after the lip-sync render. It cleans up artifacts and makes 256-pixel output look close to 512. It doubles render time but is worth it for hero shots.

SadTalker renders at 256 px by default. Switch the size to 512 px for sharper output (slower, more VRAM), or enable the GFPGAN enhancer to recover detail from the source. For best results, upload a high-resolution, well-lit photo.

Yes. Upload an MP4 or WebM as the face input and we will use the first frame as the driving identity. For full video re-dubbing (per-frame mouth replacement), see the upcoming Dubbing Studio video pipeline.

Yes. Send a multipart request to /api/v1/lipsync/ with face and audio fields, then poll /api/v1/lipsync/result/?uuid= until the status is "completed". The response contains the URL of the generated MP4. API access requires a paid plan.
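The polling half of that flow might look like the sketch below. Only the result path comes from the text above. The base URL, the JSON shape with a `status` key, the literal `"completed"` value, and the helper names are assumptions, and the HTTP call itself is passed in as a `fetch` callable because the page does not describe authentication.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlencode

RESULT_PATH = "/api/v1/lipsync/result/"  # path quoted in the answer above


def result_url(base_url: str, uuid: str) -> str:
    """Build the polling URL for one job (base_url is whatever host serves the API)."""
    return f"{base_url.rstrip('/')}{RESULT_PATH}?{urlencode({'uuid': uuid})}"


def poll_until_done(
    fetch: Callable[[str], Dict],
    url: str,
    done_status: str = "completed",  # assumed status value
    interval: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch(url) until the payload's "status" equals done_status.

    fetch must return the decoded JSON body as a dict; how it authenticates
    is left to the caller. Raises TimeoutError after max_attempts polls.
    """
    for _ in range(max_attempts):
        payload = fetch(url)
        if payload.get("status") == done_status:
            return payload
        sleep(interval)
    raise TimeoutError(f"job not finished after {max_attempts} polls")
```

Keeping `fetch` and `sleep` injectable means the loop can be exercised with a stub before it is pointed at a real endpoint, and a webhook (on the Pro plan) would avoid polling altogether.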

SadTalker uses facial landmark alignment to detect and crop the primary face. For best results, upload an image showing a single person facing forward, with visible eyes and minimal occlusion. Group photos can produce unexpected results.

Ready to get started?

Sign up free and get 15,000 tokens. No credit card required.