- CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
- Duplicate-Image-Finder (difPy): Python package for finding duplicate or similar images within folders
All of my research suggests we are going to need some kind of table (an index), because we are dealing with tens of thousands of images, if not more, so taking one image and manually opening every other image on each iteration is not viable. So far it looks like we are checking for near-equality rather than doing a direct pairwise comparison.

The closest solution I have found is at https://stackoverflow.com/questions/71514124/find-near-duplicate-and-faked-images, where the two proposed answers use either the OpenAI Contrastive Language-Image Pre-Training (CLIP) model or the cv2 Gaussian blur method. The problem is that we are not sure how to scale either approach to the numbers mentioned above, i.e. thousands of images. Speed is not a huge factor (the job could run for a while), but opening tens of thousands of images for comparison on every iteration does not seem like the best approach, or am I incorrect in thinking this?
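To make the index idea concrete, here is a minimal, stdlib-only sketch of the usual way this is scaled: compute a perceptual hash (an "average hash") once per image, and use a dict keyed by that hash so each image costs one lookup instead of one comparison against every other file. The `average_hash` and `find_duplicates` names are hypothetical, and the function works on a plain grayscale pixel grid for illustration; in practice you would load files with Pillow and use a library such as imagehash or difPy, but the indexing structure is the same.

```python
def average_hash(pixels, size=8):
    """Downscale a grayscale pixel grid (list of rows of 0-255 ints) to a
    size x size grid by block averaging, then emit one bit per cell:
    1 if the cell is brighter than the overall mean."""
    h, w = len(pixels), len(pixels[0])
    cells = []
    for by in range(size):
        for bx in range(size):
            y0, y1 = by * h // size, (by + 1) * h // size
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for c in cells:
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits


def find_duplicates(images):
    """Group image names by identical hash: one pass over the collection,
    one dict lookup per image, instead of O(n^2) pairwise opens."""
    index = {}
    for name, pixels in images.items():
        index.setdefault(average_hash(pixels), []).append(name)
    return [names for names in index.values() if len(names) > 1]
```

For near-duplicates rather than exact ones, the same structure still applies: you would compare hashes by Hamming distance (or CLIP embeddings by cosine similarity in a nearest-neighbor index) instead of requiring exact key equality, but either way every image is hashed or embedded exactly once.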
- imagededup
- difPy