CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.
Yes, this is definitely possible. You could try computing an image distance between consecutive frames, or use some form of keyframe extraction.
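To make the frame-distance idea concrete, here is a minimal sketch, not a tested pipeline. It assumes the frames are already decoded into a NumPy array with pixel values in [0, 1]. The function name `select_keyframes` and the `threshold` value are hypothetical, and mean absolute pixel difference is just one possible distance.

```python
import numpy as np


def select_keyframes(frames, threshold=0.1):
    """Return indices of frames that differ enough from the last kept frame.

    frames: array of shape (n_frames, height, width[, channels]), values in [0, 1].
    threshold: hypothetical cutoff on the mean absolute pixel difference.
    """
    frames = np.asarray(frames, dtype=np.float32)
    if len(frames) == 0:
        return []

    keyframes = [0]  # always keep the first frame
    last = frames[0]
    for i in range(1, len(frames)):
        # Distance between the current frame and the last kept keyframe
        distance = float(np.mean(np.abs(frames[i] - last)))
        if distance > threshold:
            keyframes.append(i)
            last = frames[i]
    return keyframes


# Toy input: two black frames, two white frames, one black frame
black = np.zeros((4, 4), dtype=np.float32)
white = np.ones((4, 4), dtype=np.float32)
toy_frames = np.stack([black, black, white, white, black])
keyframe_indices = select_keyframes(toy_frames)
```

Only the kept frames would then need to go through the model, which cuts the number of features to compute for near-static video.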
Once you compute the features, the search is very efficient! I tried it on the 2M-photo Unsplash dataset and a search takes about 2-3 seconds: https://github.com/haltakov/natural-language-image-search
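For illustration, the search step over precomputed features can be sketched roughly like this. This is an assumption about the general approach, not code taken from the linked repository: it supposes the photo and query features are plain NumPy arrays (for example, saved outputs of CLIP's image and text encoders), and the helper names `normalize` and `search` are hypothetical.

```python
import numpy as np


def normalize(features):
    """L2-normalize feature vectors along the last axis."""
    features = np.asarray(features, dtype=np.float32)
    return features / np.linalg.norm(features, axis=-1, keepdims=True)


def search(photo_features, text_features, top_k=3):
    """Rank photos by cosine similarity to a single text query.

    photo_features: (n_photos, dim) array, L2-normalized.
    text_features: (dim,) array, L2-normalized.
    Returns a list of (photo_index, similarity), best match first.
    """
    # With normalized vectors, the dot product equals the cosine similarity
    similarities = photo_features @ text_features
    top_k = min(top_k, len(similarities))
    best = np.argsort(-similarities)[:top_k]
    return [(int(i), float(similarities[i])) for i in best]


# Toy 2-D vectors standing in for real precomputed features
photos = normalize([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = normalize([1.0, 0.1])
results = search(photos, query, top_k=2)
```

The whole search is one matrix-vector product plus a sort, which is why it stays fast once the expensive feature extraction has been done ahead of time.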
I plan to run my personal photos through it :)
Yes, I know that this is a bit slow. The problem is that you really need PyTorch 1.7.1, because 1.7.0 leads to some strange issues and broken results:
https://github.com/openai/CLIP/issues/13#issuecomment-771143...
Yes, I know. :D Your previous project with Unsplash made me try a similar approach [1] for banners of video games on Steam.
[1] https://github.com/woctezuma/steam-image-search