Wav2Lip Alternatives
Similar projects and alternatives to Wav2Lip
-
first-order-model
This repository contains the source code for the paper First Order Motion Model for Image Animation
Wav2Lip reviews and mentions
-
Show HN: Sync (YC W22) – an API for fast and affordable lip-sync at scale
Hey HN, we’re sync. (https://synclabs.so/). We’re building fast + lightweight audio-visual models to create, modify, and understand humans in video.
You can learn more about us and our company in this video: https://bit.ly/3TV27rd
Our first API lets you lip-sync a person in a video to audio in any language, zero-shot. You can check out some examples here (https://bit.ly/3IT3UXk)
Here’s a demo showing how it works and how to sync your first video / audio: https://bit.ly/4ablRwo
Our playground + API are live; you can play with our models here: https://app.synclabs.so/
Four years ago we open-sourced Wav2Lip (https://github.com/Rudrabha/Wav2Lip), the first model to lip-sync anyone to any audio w/o having to train for each speaker. It's still the most widely used lip-syncing model to date (almost 9k GitHub stars).
Human lip-sync enables interesting features for many products – you can use it to seamlessly translate videos from one language to another, create personalized ads / video messages to send to your customers, or clone yourself so you never have to record a piece of content again.
We’re excited about this area of research / the models we’re building because they can be impactful in many ways:
[1] we can dissolve language as a barrier
check out how we used it to dub the entire 2-hour Tucker Carlson interview with Putin speaking fluent English: https://vimeo.com/914605299
imagine millions gaining access to knowledge, entertainment, and connection — regardless of their native tongue.
realtime at the edge takes us further — live multilingual broadcasts + video calls, even walking around Tokyo w/ a Vision Pro 2 speaking English while everyone else speaks Japanese.
[2] we can move the human-computer interface beyond text-based-chat
keyboards / mice are lossy + low bandwidth. human communication is rich and goes beyond just the words we say. what if we could compute w/ a face-to-face interaction?
Many people get carried away w/ the fact LLMs can generate, but forget they can also read. The same is true for these audio/visual models — generation unlocks a portion of the use-cases, but understanding humans from video unlocks huge potential.
Embedding context around expressions + body language in inputs / outputs would help us interact w/ computers in a more human way.
[3] and more
powerful models small enough to run at the edge could unlock a lot more.
-
Ideas to recreate audio
If you're technically inclined, you can use https://github.com/Rudrabha/Wav2Lip to sync the lip movements to the new audio.
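For reference, the Wav2Lip repo is driven from the command line via its `inference.py` script. A minimal sketch of a typical invocation follows; the checkpoint, video, and audio paths are placeholders (you need to download a pretrained checkpoint such as `wav2lip_gan.pth` from the links in the repo's README first):

```shell
# Re-sync the speaker's lips in the video to the new audio track.
# All file paths below are placeholders for your own files.
python inference.py \
  --checkpoint_path checkpoints/wav2lip_gan.pth \
  --face input_video.mp4 \
  --audio new_audio.wav \
  --outfile results/synced.mp4
```

The script detects the face in each frame, regenerates the mouth region to match the audio, and writes the result as a new video.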
-
How to make deep fake lip sync using Wav2Lip
This is the Github link : https://github.com/Rudrabha/Wav2Lip
-
Dark Brandon going hard
Video mapping onto audio: now you have audio with coherent back-and-forth dialogue. To get the looped video puppets, you find a relatively stable interview clip (in this channel, and many of Athene's other ones, the people just stay in one place). Then feed the audio + video clip into a lip-sync algorithm like this: https://bhaasha.iiit.ac.in/lipsync/
- Is it possible to sync a lip and facial expression animation with audio in real time?
-
A little bedtime story by the AI nanny | Stable Diffusion + GPT = a match made in latent space
It's not really animating, just lip sync and face restoration. Here I used https://github.com/Rudrabha/Wav2Lip and https://github.com/TencentARC/GFPGAN respectively.
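A pipeline like that can be chained from the command line. A rough sketch, with all paths and flag values illustrative: since GFPGAN restores still images rather than video, the lip-synced output is split into frames, restored, and reassembled with ffmpeg (frame rate and output folder layout may differ on your setup):

```shell
# 1) Lip-sync with Wav2Lip (placeholder checkpoint/input paths)
python inference.py --checkpoint_path checkpoints/wav2lip_gan.pth \
  --face input.mp4 --audio speech.wav --outfile synced.mp4

# 2) GFPGAN works on still frames, so extract the video into images first
mkdir -p frames
ffmpeg -i synced.mp4 frames/%06d.png
python inference_gfpgan.py -i frames -o restored -v 1.3

# 3) Reassemble the restored frames and re-attach the audio
ffmpeg -framerate 25 -i restored/restored_imgs/%06d.png -i speech.wav \
  -c:v libx264 -pix_fmt yuv420p -shortest final.mp4
```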
-
Elevenlabs voice clone and janky avatarify with wav2lip added.
I just used the web-based Wav2Lip demo: https://bhaasha.iiit.ac.in/lipsync/ I haven't used it in a while; the Colab gives much better results, though. This was just a quick and dirty example done entirely on the phone.
- retromash - The Tide is High / Thinking Out Loud (Blondie, Ed Sheeran)
-
Who knows how to create long-form & cheap AI avatar content? The three main platforms (Synthesia, Movio, & D-ID) all charge over $20 a month for ~ 15 minutes of content, but this TikTok user streamed for 90 hours… how did he pull that off?
https://github.com/Rudrabha/Wav2Lip Demo: https://youtu.be/0fXaDCZNOJc
- Video editing with AI
-
Stats
The primary programming language of Wav2Lip is Python.