[P] Image search with localization and open-vocabulary reranking.

Our great sponsors

WorkOS - The modern identity platform for B2B SaaS

InfluxDB - Power Real-Time Data Analytics at Scale

SaaSHub - Software Alternatives and Reviews

Our great sponsors

marqo

114 4,111 9.3 Python

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

Markdown: https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchLocalization/article.md

detectron2

49 28,671 7.5 Python

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

I wanted to have a few choices getting localization into image search (index and search time). I immediately thought of using a region proposal network (rpn) from mask-rcnn to create patches that can also be indexed and searched (and add the localisation). I figured it might be somewhat agnostic to classes. I did not want to use mmdetection or detectron2 due to their dependencies and just getting the rpn was not worth it. I was encouraged by the PyTorch native implementations of detection/segmentation models but ended up finding yolox the best.

WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
mmdetection

23 27,742 8.7 Python

OpenMMLab Detection Toolbox and Benchmark

I wanted to have a few choices getting localization into image search (index and search time). I immediately thought of using a region proposal network (rpn) from mask-rcnn to create patches that can also be indexed and searched (and add the localisation). I figured it might be somewhat agnostic to classes. I did not want to use mmdetection or detectron2 due to their dependencies and just getting the rpn was not worth it. I was encouraged by the PyTorch native implementations of detection/segmentation models but ended up finding yolox the best.

YOLOX

12 9,012 1.5 Python

YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/

I wanted to have a few choices getting localization into image search (index and search time). I immediately thought of using a region proposal network (rpn) from mask-rcnn to create patches that can also be indexed and searched (and add the localisation). I figured it might be somewhat agnostic to classes. I did not want to use mmdetection or detectron2 due to their dependencies and just getting the rpn was not worth it. I was encouraged by the PyTorch native implementations of detection/segmentation models but ended up finding yolox the best.

dino

7 5,836 1.0 Python

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

I also implemented one based on the self attention maps from the DINO trained ViT’s. This worked pretty well when the attention maps were combined with some traditional computer vision to get bounding boxes. It seemed an ok compromise between domain specialization and location specificity. I did not try any saliency or gradient based methods as i was not sure on generalization and speed respectively. I know LAVIS has an implementation of grad cam and it seems to work well in the plug'n'play vqa.

LAVIS

18 8,696 7.0 Jupyter Notebook

LAVIS - A One-stop Library for Language-Vision Intelligence

I also implemented one based on the self attention maps from the DINO trained ViT’s. This worked pretty well when the attention maps were combined with some traditional computer vision to get bounding boxes. It seemed an ok compromise between domain specialization and location specificity. I did not try any saliency or gradient based methods as i was not sure on generalization and speed respectively. I know LAVIS has an implementation of grad cam and it seems to work well in the plug'n'play vqa.

Detic

11 1,763 1.9 Python

Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".

For localisation at search time I ended up using OWL-ViT. This worked really well. I did not try Detic or CLIPseg but would be interested to hear if anyone else has tried these?

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
clipseg

7 1,008 3.8 Python

This repository contains the code of the CVPR 2022 paper "Image Segmentation Using Text and Image Prompts".

For localisation at search time I ended up using OWL-ViT. This worked really well. I did not try Detic or CLIPseg but would be interested to hear if anyone else has tried these?

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project