[P] Image search with localization and open-vocabulary reranking.

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

CodeRabbit: AI Code Reviews for Developers
Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.
coderabbit.ai
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  1. marqo

    Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

    Markdown: https://github.com/marqo-ai/marqo/blob/mainline/examples/ImageSearchLocalization/article.md

  2. CodeRabbit

    CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.

    CodeRabbit logo
  3. detectron2

    Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

    I wanted to have a few choices getting localization into image search (index and search time). I immediately thought of using a region proposal network (rpn) from mask-rcnn to create patches that can also be indexed and searched (and add the localisation). I figured it might be somewhat agnostic to classes. I did not want to use mmdetection or detectron2 due to their dependencies and just getting the rpn was not worth it. I was encouraged by the PyTorch native implementations of detection/segmentation models but ended up finding yolox the best.

  4. mmdetection

    OpenMMLab Detection Toolbox and Benchmark

    I wanted to have a few choices getting localization into image search (index and search time). I immediately thought of using a region proposal network (rpn) from mask-rcnn to create patches that can also be indexed and searched (and add the localisation). I figured it might be somewhat agnostic to classes. I did not want to use mmdetection or detectron2 due to their dependencies and just getting the rpn was not worth it. I was encouraged by the PyTorch native implementations of detection/segmentation models but ended up finding yolox the best.

  5. YOLOX

    YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/

    I wanted to have a few choices getting localization into image search (index and search time). I immediately thought of using a region proposal network (rpn) from mask-rcnn to create patches that can also be indexed and searched (and add the localisation). I figured it might be somewhat agnostic to classes. I did not want to use mmdetection or detectron2 due to their dependencies and just getting the rpn was not worth it. I was encouraged by the PyTorch native implementations of detection/segmentation models but ended up finding yolox the best.

  6. dino

    PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

    I also implemented one based on the self attention maps from the DINO trained ViT’s. This worked pretty well when the attention maps were combined with some traditional computer vision to get bounding boxes. It seemed an ok compromise between domain specialization and location specificity. I did not try any saliency or gradient based methods as i was not sure on generalization and speed respectively. I know LAVIS has an implementation of grad cam and it seems to work well in the plug'n'play vqa.

  7. LAVIS

    LAVIS - A One-stop Library for Language-Vision Intelligence

    I also implemented one based on the self attention maps from the DINO trained ViT’s. This worked pretty well when the attention maps were combined with some traditional computer vision to get bounding boxes. It seemed an ok compromise between domain specialization and location specificity. I did not try any saliency or gradient based methods as i was not sure on generalization and speed respectively. I know LAVIS has an implementation of grad cam and it seems to work well in the plug'n'play vqa.

  8. Detic

    Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".

    For localisation at search time I ended up using OWL-ViT. This worked really well. I did not try Detic or CLIPseg but would be interested to hear if anyone else has tried these?

  9. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
  10. clipseg

    This repository contains the code of the CVPR 2022 paper "Image Segmentation Using Text and Image Prompts".

    For localisation at search time I ended up using OWL-ViT. This worked really well. I did not try Detic or CLIPseg but would be interested to hear if anyone else has tried these?

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

Did you know that Python is
the 2nd most popular programming language
based on number of references?