Jupyter Notebook multi-modal-learning Projects

open_clip

30 8,705 8.2 Jupyter Notebook

An open source implementation of CLIP.

Project mention: Binarize Clip for Multimodal Applications | news.ycombinator.com | 2024-05-23

The part of CLIP[1] that you need to know to understand this is that it embeds text and images into the same space, i.e., the word "dog" is close to images of dogs. Normally this space is a high-dimensional real space. Think 512 dimensions, or 512 floating point numbers per embedding. When you want to measure "closeness" between vectors in this space, cosine similarity[2] is a natural choice.
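As a rough illustration of "closeness" in a shared embedding space, here is a minimal sketch of cosine similarity in plain Python. The 4-dimensional vectors and their names (`text_dog`, `image_dog`, `image_car`) are made-up stand-ins, not real CLIP outputs; real embeddings would have something like 512 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumed values, for illustration only).
text_dog = [0.9, 0.1, -0.3, 0.2]
image_dog = [0.8, 0.2, -0.4, 0.1]
image_car = [-0.5, 0.7, 0.6, -0.1]

sim_dog = cosine_similarity(text_dog, image_dog)  # close to 1: similar direction
sim_car = cosine_similarity(text_dog, image_car)  # negative: dissimilar direction
```

With these toy values the "dog" text vector scores higher against the dog image than against the car image, which is the behaviour the shared space is meant to provide.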
Why would you want to quantize values? Well, instead of using a 32-bit float for each dimension, what if you could get away with 1 bit? You would use 32x less space (a saving of 31/32, or about 97%). Often you'll want to embed millions or billions of pieces of text or images, so this means large speed and cost savings, and if accuracy isn't impacted too much then it could be worth it.
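The storage arithmetic and the simplest form of 1-bit quantization can be sketched as follows. This assumes sign-based binarization (bit set when a component is positive), which is one common choice and not necessarily what the linked work does; `binarize` and `hamming_distance` are hypothetical helper names.

```python
def binarize(vec):
    """Pack the sign of each component into one int: bit i is 1 when vec[i] > 0."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming_distance(a_bits, b_bits):
    """Number of bit positions where the two packed codes differ."""
    return bin(a_bits ^ b_bits).count("1")

DIM = 512                              # embedding size mentioned in the text
float_bytes = DIM * 32 // 8            # 2048 bytes per float32 embedding
binary_bytes = DIM * 1 // 8            # 64 bytes per 1-bit embedding
ratio = float_bytes // binary_bytes    # 32-fold reduction

code_a = binarize([0.5, -0.1, 0.2])    # bits 0 and 2 set
code_b = binarize([0.3, 0.4, -0.7])    # bits 0 and 1 set
distance = hamming_distance(code_a, code_b)
```

Comparing packed codes needs only an XOR and a bit count, which is where much of the speed benefit would come from in addition to the smaller footprint.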
If you naively quantize the floats of an existing model down to single bits, it severely impacts accuracy. However, if you train a model from scratch that produces binary outputs, then it appears to perform better than the naive approach.
There is one twist. Deep learning models rely on gradient descent to train and binary output doesn't produce useful gradients. We use cosine similarity on floating point vectors and hamming distance on bit vectors. Is there a function that behaves like hamming distance but is nicely differentiable? We can then use this function during training and then vanilla hamming distance during inference. It seems like they've done that.
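One well-known candidate for such a function, shown here as a sketch rather than as the method the linked project uses, relies on the identity that for vectors with entries in {-1, +1} the Hamming distance equals (n - dot product) / 2. Replacing the hard sign with `tanh` gives a smooth version. The names `hard_hamming`, `soft_hamming`, and the `scale` parameter are assumptions made for this example.

```python
import math

def hard_hamming(x, y):
    """Hamming distance between the sign patterns of two real vectors."""
    return sum(1 for a, b in zip(x, y) if (a > 0) != (b > 0))

def soft_hamming(x, y, scale=1.0):
    """Smooth surrogate: (n - <tanh(scale*x), tanh(scale*y)>) / 2.

    As scale grows, tanh approaches the sign function and the value
    approaches the hard Hamming distance, while staying differentiable.
    """
    n = len(x)
    dot = sum(math.tanh(scale * a) * math.tanh(scale * b) for a, b in zip(x, y))
    return (n - dot) / 2.0

# Toy pre-binarization activations (assumed values).
x = [2.0, -1.5, 0.7, -0.2]
y = [1.0, 1.2, -0.9, -0.4]

hard = hard_hamming(x, y)                 # signs differ at positions 1 and 2
soft = soft_hamming(x, y, scale=1.0)      # smooth value usable in a training loss
sharp = soft_hamming(x, y, scale=50.0)    # nearly equal to the hard distance
```

In a real training setup the same expression would be written with an autodiff framework so gradients flow through `tanh`; at inference time the hard bit codes and plain Hamming distance would be used instead.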
I'd suggest playing around with OpenCLIP[3]. My background is in data science but all my CLIP knowledge comes from doing a side project over the course of a couple weekends.
1. https://huggingface.co/docs/transformers/model_doc/clip
2. https://en.wikipedia.org/wiki/Cosine_similarity
3. https://github.com/mlfoundations/open_clip


NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Jupyter Notebook multi-modal-learning related posts

Binarize Clip for Multimodal Applications
1 project | news.ycombinator.com | 23 May 2024

Database of 16,000 Artists Used to Train Midjourney AI Goes Viral
1 project | news.ycombinator.com | 7 Jan 2024

Is Nicholas Renotte a good guide for a person who knows nothing about ML?
1 project | /r/learnmachinelearning | 27 Jun 2023

Generate Image from Vector Embedding
1 project | /r/StableDiffusion | 6 Jun 2023

Index

	Project	Stars
1	open_clip	8,705
