Interesting read. Especially the lookup method based on partitioning.
I tried to implement a similar reverse image search based on dHash, as explained here: https://github.com/Rayraegah/dhash . However, I also ran into lookup performance problems. Exact matches are not a problem, but matching within a Hamming-distance threshold is. Because my project was in Python, I tried to eke out more performance by writing a BK-tree backend module in C++: https://github.com/mxmlnkn/cppbktree . It was 2 to 10x faster than an existing similar module, but it was still too slow for lookups in a database of millions of images. Since lookup time tended to depend on the Hamming-distance threshold, my next step would have been to optimize the hash, e.g., make it shorter so that only a small Hamming distance needs to be searched. However, the multi-indexing method mentioned in the article looks much more promising and is already tested.
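The partition-based (multi-index) lookup rests on the pigeonhole principle: split a 64-bit hash into m disjoint chunks; if two hashes differ in r < m bits, at most r chunks can differ, so at least one chunk matches exactly. Exact lookups per chunk therefore find every candidate, which is then verified with the full Hamming distance. This is a minimal sketch of that general idea, not the article's implementation; it assumes 64-bit integer hashes such as dHash output and r < m, and the class and method names are hypothetical.

```python
from collections import defaultdict


class MultiIndexHash:
    """Hypothetical multi-index table for integer hashes (sketch).

    The hash is split into `num_chunks` disjoint bit ranges. Two hashes
    within Hamming distance r < num_chunks share at least one identical
    chunk, so exact per-chunk lookups find all candidates.
    """

    def __init__(self, bits=64, num_chunks=4):
        assert bits % num_chunks == 0
        self.chunk_bits = bits // num_chunks
        self.num_chunks = num_chunks
        self.mask = (1 << self.chunk_bits) - 1
        # One dictionary per chunk position: chunk value -> set of full hashes.
        self.tables = [defaultdict(set) for _ in range(num_chunks)]

    def _chunks(self, h):
        return [(h >> (i * self.chunk_bits)) & self.mask for i in range(self.num_chunks)]

    def add(self, h):
        for table, chunk in zip(self.tables, self._chunks(h)):
            table[chunk].add(h)

    def query(self, h, radius):
        # Exact-chunk probing is only guaranteed complete when radius < num_chunks.
        if radius >= self.num_chunks:
            raise ValueError("radius must be smaller than num_chunks")
        candidates = set()
        for table, chunk in zip(self.tables, self._chunks(h)):
            candidates |= table.get(chunk, set())
        # Verify each candidate with the full Hamming distance.
        return sorted(c for c in candidates if bin(c ^ h).count("1") <= radius)


index = MultiIndexHash(bits=64, num_chunks=4)
for h in (0x0F0F0F0F0F0F0F0F, 0x0F0F0F0F0F0F0F0E, 0xFFFFFFFFFFFFFFFF):
    index.add(h)
print([hex(h) for h in index.query(0x0F0F0F0F0F0F0F0F, radius=2)])
# → ['0xf0f0f0f0f0f0f0e', '0xf0f0f0f0f0f0f0f']
```

Larger thresholds need either more chunks (shorter chunks, more candidates per lookup) or probing every chunk value within ⌊r/m⌋ bits of the query chunk; this sketch only covers the exact-chunk case.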
I have found ML image-classification models to be an excellent way of extracting a distinctive descriptor. The descriptor can then be compressed into a compact signature for matching and storage.
I did it here: https://github.com/starkdg/phashml
It is available as a Python module that uses a TensorFlow model.
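There are several ways to turn a float-valued descriptor from a classification network into a compact binary signature. One well-known option, shown here, is random-hyperplane (sign) hashing: project the feature vector onto k random directions and keep only the signs, so that the Hamming distance between signatures tracks the angle between the original vectors. This is a swapped-in illustration, not necessarily what phashml does; the 1280-dimensional "embeddings" are random stand-ins rather than real TensorFlow model outputs, and the helper names are hypothetical.

```python
import numpy as np


def make_projection(dim, num_bits, seed=0):
    """Hypothetical helper: one random Gaussian hyperplane per output bit."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_bits, dim))


def binarize(features, projection):
    """Map a float feature vector to a packed binary signature (sign of each projection)."""
    bits = (projection @ features) > 0
    return np.packbits(bits)  # num_bits / 8 bytes


def hamming(a, b):
    """Number of differing bits between two packed signatures."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


dim, num_bits = 1280, 128
proj = make_projection(dim, num_bits)
rng = np.random.default_rng(1)
x = rng.standard_normal(dim)                 # stand-in for an image embedding
x_near = x + 0.1 * rng.standard_normal(dim)  # stand-in for a slightly altered copy
x_far = rng.standard_normal(dim)             # stand-in for an unrelated image
sig, sig_near, sig_far = (binarize(v, proj) for v in (x, x_near, x_far))
print(sig.nbytes, hamming(sig, sig_near), hamming(sig, sig_far))
```

A 128-bit signature occupies 16 bytes regardless of the embedding size, and near-duplicates should land only a few bits apart while unrelated vectors sit near 64 bits apart. The fixed `seed` matters: every signature in a database must be produced with the same projection matrix.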
There are limits to how short you can make a perceptual hash: the more you compress it, the more information you lose.
ML image-classification models can be used to extract a good descriptor that can be further reduced to a compact signature.
https://github.com/starkdg/pyphashml
For indexing, I've had some success with distance-based indexing. Here's a comparison of some structures I used:
https://github.com/starkdg/pyphashml
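A BK-tree, the structure behind the cppbktree module mentioned earlier in the thread, is one example of a distance-based index: each node keys its children by their distance to the node, and the triangle inequality lets a query with threshold r skip any child whose edge distance lies outside [d - r, d + r], where d is the query's distance to the node. This is a minimal pure-Python sketch using Hamming distance on integer hashes; it is not claimed to be one of the structures in the linked comparison, and the class name is hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two integer hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    """Hypothetical minimal BK-tree over integer hashes with Hamming distance."""

    def __init__(self):
        self.root = None  # node = (hash, {edge_distance: child_node})

    def add(self, h: int) -> None:
        if self.root is None:
            self.root = (h, {})
            return
        node = self.root
        while True:
            value, children = node
            d = hamming(h, value)
            if d == 0:
                return  # already stored
            if d not in children:
                children[d] = (h, {})
                return
            node = children[d]

    def query(self, h: int, radius: int) -> list[int]:
        """Return all stored hashes within `radius` bits of `h`, sorted."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = hamming(h, value)
            if d <= radius:
                results.append(value)
            # Triangle inequality: only subtrees whose edge distance lies in
            # [d - radius, d + radius] can contain matches.
            for edge, child in children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for h in (0b0000, 0b0001, 0b0011, 0b1111, 0b1000):
    tree.add(h)
print(tree.query(0b0000, 1))  # → [0, 1, 8]
```

The pruning window grows with the radius, so larger thresholds visit a larger fraction of the tree. That is consistent with the earlier observation that lookup time depends heavily on the Hamming-distance threshold.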
Feel free to contact me if you want to discuss this further.