| | Wav2Lip | GFPGAN |
|---|---|---|
| Mentions | 34 | 93 |
| Stars | 9,257 | 34,637 |
| Growth | - | 0.8% |
| Activity | 4.8 | 2.7 |
| Latest commit | 11 days ago | about 1 month ago |
| Language | Python | Python |
| License | - | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Wav2Lip
- Show HN: Sync (YC W22) – an API for fast and affordable lip-sync at scale
Hey HN, we’re sync. (https://synclabs.so/). We’re building fast + lightweight audio-visual models to create, modify, and understand humans in video.
You can learn more about us and our company in this video: https://bit.ly/3TV27rd
Our first API lets you lip-sync a person in a video to audio in any language, zero-shot. You can check out some examples here (https://bit.ly/3IT3UXk)
Here’s a demo showing how it works and how to sync your first video / audio: https://bit.ly/4ablRwo
Our playground + API is live; you can play with our models here: https://app.synclabs.so/
Four years ago we open-sourced Wav2Lip (https://github.com/Rudrabha/Wav2Lip), the first model to lip-sync anyone to any audio w/o having to train for each speaker. Even now, it’s the most widely used lip-syncing model to date (almost 9k GitHub stars).
Human lip-sync enables interesting features for many products – you can use it to seamlessly translate videos from one language to another, create personalized ads / video messages to send to your customers, or clone yourself so you never have to record a piece of content again.
We’re excited about this area of research / the models we’re building because they can be impactful in many ways:
[1] we can dissolve language as a barrier
check out how we used it to dub the entire 2-hour Tucker Carlson interview with Putin speaking fluent English: https://vimeo.com/914605299
imagine millions gaining access to knowledge, entertainment, and connection — regardless of their native tongue.
realtime at the edge takes us further: live multilingual broadcasts + video calls, even walking around Tokyo w/ a Vision Pro 2 speaking English while everyone else speaks Japanese.
[2] we can move the human-computer interface beyond text-based-chat
keyboards / mice are lossy + low bandwidth. human communication is rich and goes beyond just the words we say. what if we could compute w/ a face-to-face interaction?
Many people get carried away w/ the fact LLMs can generate, but forget they can also read. The same is true for these audio/visual models — generation unlocks a portion of the use-cases, but understanding humans from video unlocks huge potential.
Embedding context around expressions + body language in inputs / outputs would help us interact w/ computers in a more human way.
[3] and more
powerful models small enough to run at the edge could unlock a lot:
eg.
- Ideas to recreate audio
If you're technically inclined, you can use https://github.com/Rudrabha/Wav2Lip to sync the lip movements to the new audio.
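For context, the Wav2Lip repository documents a single inference command once a pretrained checkpoint is downloaded. A minimal sketch of that workflow, with illustrative input/output paths (only the `inference.py` flags come from the repo's README), might look like:

```shell
# Clone the repo and install its dependencies
git clone https://github.com/Rudrabha/Wav2Lip
cd Wav2Lip
pip install -r requirements.txt

# Place a pretrained checkpoint (e.g. wav2lip_gan.pth, linked from the
# README) under checkpoints/, then lip-sync a face video to new audio:
python inference.py \
  --checkpoint_path checkpoints/wav2lip_gan.pth \
  --face input_video.mp4 \
  --audio new_audio.wav \
  --outfile results/synced.mp4
```

The `--face` input can be a video or a still image; the model re-renders the mouth region frame by frame to match the audio.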
- How to make deep fake lip sync using Wav2Lip
This is the GitHub link: https://github.com/Rudrabha/Wav2Lip
- Dark Brandon going hard
Video mapping onto audio: now you have audio with coherent back-and-forth dialogue. To get the looped video puppets, you find a relatively stable interview clip (in this channel, and many of Athene's other ones, the clips of the people just stay in one place). Then feed the audio + video clip into a lip-sync algorithm like this: https://bhaasha.iiit.ac.in/lipsync/
- Is it possible to sync a lip and facial expression animation with audio in real time?
- A little bedtime story by the AI nanny | Stable Diffusion + GPT = a match made in latent space
It's not animating really, just lip sync and face restoration, here I used: https://github.com/Rudrabha/Wav2Lip and https://github.com/TencentARC/GFPGAN respectively.
- Elevenlabs voice clone and janky avatarify with wav2lip added.
I just used the web-based wav2lip demo: https://bhaasha.iiit.ac.in/lipsync/ Haven’t used the plan in a while, but the colab gives much better results. This was just a quick, dusty example done entirely on the phone.
- retromash - The Tide is High / Thinking Out Loud (Blondie, Ed Sheeran)
- Who knows how to create long-form & cheap AI avatar content? The three main platforms (Synthesia, Movio, & D-ID) all charge over $20 a month for ~15 minutes of content, but this TikTok user streamed for 90 hours… how did he pull that off?
https://github.com/Rudrabha/Wav2Lip Demo: https://youtu.be/0fXaDCZNOJc
- Video editing with AI
GFPGAN
- Ask HN: What is the state of the art in AI photo enhancement?
- Open source software has gotten a lot better at having smooth swaps. Below is what I got.
Mainly https://insightface.ai/ as the base model. There was some post-processing done to further improve quality: https://github.com/TencentARC/GFPGAN
- AI generator for photos of past people?
Check out GFPGAN: https://github.com/TencentARC/GFPGAN
- A little bedtime story by the AI nanny | Stable Diffusion + GPT = a match made in latent space
It's not animating really, just lip sync and face restoration, here I used: https://github.com/Rudrabha/Wav2Lip and https://github.com/TencentARC/GFPGAN respectively.
- Tools For AI Animation and Filmmaking, Community Rules, etc. (**FAQ**)
Real-ESRGAN/GFPGAN: https://github.com/xinntao/Real-ESRGAN (Real-ESRGAN - upscale images, facial restoration with GFPGAN setting), https://github.com/TencentARC/GFPGAN (GFPGAN - facial restoration and upscale)
- What should I do
2. Install it using `pip install git+https://github.com/TencentARC/GFPGAN.git@8d2447a2d918f8eba5a4a01463fd48e45126a379`
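Once GFPGAN and its dependencies are installed, the repository's README documents a one-line inference script. A minimal sketch, run from a clone of the repo (the input/output paths here are illustrative; the flags and the `-v 1.3` model version come from the README), might look like:

```shell
# Install the dependencies GFPGAN builds on, then GFPGAN itself,
# pinned to the same commit quoted above
pip install basicsr facexlib
pip install git+https://github.com/TencentARC/GFPGAN.git@8d2447a2d918f8eba5a4a01463fd48e45126a379

# From a clone of https://github.com/TencentARC/GFPGAN, restore a
# folder of images: -i input dir, -o output dir, -v model version,
# -s final upsampling scale
python inference_gfpgan.py -i inputs/whole_imgs -o results -v 1.3 -s 2
```

The script downloads the pretrained weights for the requested version on first run if they are not already present.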
- Hey guys, need some help
Cloning https://github.com/TencentARC/GFPGAN.git (to revision 8d2447a2d918f8eba5a4a01463fd48e45126a379) to /private/var/folders/qw/_kkcnr2s59n3kplbwxsk_k2c0000gn/T/pip-req-build-bgantjtl
- Need help, keep getting [Errno 2] No such file or directory when running batch file.
All I have done is add Stable Diffusion Checkpoints and the GFPGAN checkpoints to C:\Program Files (x86)\stable-diffusion-webui-master\stable-diffusion-webui-master\models\Stable-diffusion.
- Free AI-Powered Photo Restoration Service to Restore Photos
- Sorry, I'm not a coder. How do you use GFPGAN with Anaconda Stable Diffusion?
You would find its commands here.
What are some alternatives?
stylegan2 - StyleGAN2 - Official TensorFlow Implementation
CodeFormer - [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Thin-Plate-Spline-Motion-Model - [CVPR 2022] Thin-Plate Spline Motion Model for Image Animation.
Real-ESRGAN - Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
first-order-model - This repository contains the source code for the paper First Order Motion Model for Image Animation
GPEN
chatgpt-raycast - ChatGPT raycast extension
DFDNet - Blind Face Restoration via Deep Multi-scale Component Dictionaries (ECCV 2020)
DeepFaceLive - Real-time face swap for PC streaming or video calls
stable-diffusion-webui - Stable Diffusion web UI [Moved to: https://github.com/sd-webui/stable-diffusion-webui]
Real-Time-Voice-Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time
stable-diffusion-webui - Stable Diffusion web UI