AI Lip Sync Video Generator

Upload a portrait image and an audio clip, and get a talking video with accurate lip sync, head pose, and expression. Powered by SadTalker (MIT). Commercial use OK.

Upload image

1,000 characters per second

JPG, PNG, or short MP4/WebM. Max 10MB. One clear, well-lit face works best.

MP3, WAV, M4A, or FLAC. Max 10MB. Free: up to 30 sec. Pro: up to 5 min.

Rendering your video. This takes between 30 seconds and 2 minutes.

Your talking-head video

About SadTalker

SadTalker (CVPR 2023, Tencent ARC) is an open-source talking-head model that animates a portrait so that it speaks any audio you supply. Unlike the various Wav2Lip variants, SadTalker also animates head pose, blinks, and expression, which gives more natural results.

The code and weights are MIT-licensed end to end, with no Llama, Gemma, or non-commercial backbone, so the videos you make are safe for commercial use.

Tips for best results

  • Use a high-resolution, well-lit portrait: eyes visible, mouth closed
  • A centered face in a square or 4:5 aspect ratio works best
  • A clean voice track (no music) gives tighter lip sync
  • Enable GFPGAN for hero shots: it doubles render time but sharpens detail
  • Use the Still preset if you want a steady avatar-style portrait

Lip Sync Video Plans

Start free, upgrade when you need more

Free
  • 30-second audio limit
  • 256 px output
  • "Still" preset only
  • No face enhancer
Most popular
Free Account
  • 30-second audio limit
  • "full" and "still" presets
  • 256 / 512 px output
  • GFPGAN face enhancer
Sign up
Pro
  • 5-minute audio limit
  • Priority GPU
  • API access (multipart upload)
  • Webhook completion callback
  • Commercial use (MIT license)
Upgrade

Frequently asked questions

Upload a portrait image and an audio clip, and the AI generates a video of the portrait speaking that audio with accurate lip movements, head pose, and eye blinks. It is built on SadTalker (CVPR 2023), an MIT-licensed talking-head model that animates more of the face than the mouth shape alone.

The image input can be a JPG or PNG image (up to 10 MB) or a short MP4/WebM driving video (we use the first frame). The driving audio can be MP3, WAV, M4A, or FLAC, up to 10 MB. We resample audio to 16 kHz internally.
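
The format and size limits above can be checked on the client before uploading. The following is a minimal sketch, not the service's own validation: the function name `check_upload` is hypothetical, accepting `.jpeg` alongside `.jpg` is an assumption, and so is reading "10 MB" as 10 MiB.

```python
from pathlib import Path

# Assumption: "10 MB" is treated as 10 MiB; the page does not say which is meant.
MAX_BYTES = 10 * 1024 * 1024
# Assumption: ".jpeg" is accepted as an alias of ".jpg".
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".mp4", ".webm"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac"}


def check_upload(filename: str, size_bytes: int, kind: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    if kind not in ("image", "audio"):
        raise ValueError("kind must be 'image' or 'audio'")
    allowed = IMAGE_EXTS if kind == "image" else AUDIO_EXTS
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in allowed:
        problems.append(f"unsupported {kind} type: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"{kind} exceeds 10 MB ({size_bytes} bytes)")
    return problems
```

For example, `check_upload("voice.ogg", 1000, "audio")` reports an unsupported type, while a 1 KB `face.PNG` passes because the extension check is case-insensitive.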

Free accounts: up to 30 seconds per clip. Paid users: up to 5 minutes per request. Longer audio means a longer render time and a higher character cost.

Lip sync video costs 1,000 characters per second of generated video. A 30-second clip = 30,000 characters. The cost is charged upfront from your character balance and refunded automatically if generation fails.
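
The pricing and duration limits can be worked through in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, and rounding a partial second up to a whole second is a guess, since the page does not say how fractions are billed.

```python
import math

CHARS_PER_SECOND = 1_000                      # rate stated on this page
PLAN_LIMIT_SECONDS = {"free": 30, "pro": 300}  # 30 sec and 5 min, from this page


def lipsync_cost(duration_seconds: float) -> int:
    """Characters charged for a clip (assumption: partial seconds round up)."""
    return math.ceil(duration_seconds) * CHARS_PER_SECOND


def affordable_seconds(balance: int) -> int:
    """Whole seconds of video that a character balance can pay for."""
    return balance // CHARS_PER_SECOND


def fits_plan(plan: str, duration_seconds: float) -> bool:
    """True if the audio length is within the plan's per-clip limit."""
    return duration_seconds <= PLAN_LIMIT_SECONDS[plan]
```

With these definitions, `lipsync_cost(30)` gives 30,000 characters, matching the example above, and a balance of 15,000 characters covers 15 seconds of video.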

Yes. SadTalker's code and weights are MIT-licensed end to end (no Llama, Gemma, or non-commercial backbone), so the videos you generate can be used commercially. You are responsible for holding the rights to the input portrait and audio.

About 30 seconds for a 5-second video on our A100 server, scaling linearly with the audio duration. Enabling the GFPGAN face enhancer makes the render take about 2x as long but produces sharper, higher-quality output.

The full preset (the default) animates head motion, blinks, and expression along with the lips, producing a more expressive talking-head video. The still preset locks the head in place and animates only the mouth region, which is useful when you want a steady avatar-style portrait.

GFPGAN is a face restoration model that sharpens the face after the lip-sync render. It cleans up artifacts and makes the 256-pixel output look closer to 512. It doubles the render time but is worth it for hero shots.

SadTalker renders at 256 px by default. Switch to the 512 px size for sharper output (slower, more VRAM) or enable the GFPGAN enhancer to bring out skin detail. For best results, submit a high-resolution, well-lit portrait.

Yes. Upload an MP4 or WebM as the image input and we will use the first frame as the driving identity. For full video re-dubbing (per-frame mouth replacement), see the upcoming Dubbing Studio video pipeline.

Yes. POST a multipart request to /api/v1/lipsync/ with the image and audio fields, then poll /api/v1/lipsync/result/?uuid= until the status shows that the job is finished. The response contains the URL of the generated MP4. API access requires a paid plan.
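
The submit-then-poll flow can be sketched with the Python standard library. Only the two endpoint paths and the `uuid` query parameter come from this page. Everything else is an assumption for illustration: the host `https://example.com`, the field names `image` and `audio`, the JSON keys `uuid`, `status`, and `url`, and the finished status value `"done"`. Authentication is left out because the page does not describe it.

```python
import json
import time
import uuid as uuidlib
from urllib import parse, request

BASE_URL = "https://example.com"  # hypothetical host, replace with the real one


def encode_multipart(files: dict[str, tuple[str, bytes]]) -> tuple[bytes, str]:
    """Encode {field: (filename, data)} as a multipart/form-data body."""
    boundary = uuidlib.uuid4().hex
    parts = []
    for field, (filename, data) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def http_json(req: request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with request.urlopen(req) as resp:
        return json.load(resp)


def submit(image: bytes, audio: bytes, send=http_json) -> dict:
    """POST the portrait and audio; field names are assumed, not documented."""
    body, content_type = encode_multipart(
        {"image": ("face.png", image), "audio": ("voice.mp3", audio)}
    )
    req = request.Request(
        BASE_URL + "/api/v1/lipsync/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    return send(req)


def poll(job_uuid: str, send=http_json, done="done", interval=2.0, attempts=90) -> dict:
    """Poll the result endpoint until the (assumed) finished status appears."""
    url = BASE_URL + "/api/v1/lipsync/result/?" + parse.urlencode({"uuid": job_uuid})
    for _ in range(attempts):
        result = send(request.Request(url))
        if result.get("status") == done:
            return result
        time.sleep(interval)
    raise TimeoutError(f"job {job_uuid} did not finish in time")
```

The `send` parameter exists so the transport can be swapped out, for instance with a stub in tests or with a wrapper that adds whatever credentials the paid plan issues. The Pro plan also lists a webhook completion callback, which would avoid polling altogether.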

SadTalker uses face detection to find and crop the primary face. For best results, submit a front-facing portrait of a single person with the eyes visible and minimal occlusion. Group photos may produce unexpected results.

Ready to get started?

Sign up for free and get 15,000 characters. No credit card required.