Ask HN: Any good self-hosted image recognition software?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • awesome-teachable-machine

    Useful resources for creating projects with Teachable Machine models + curated list of already built Awesome Apps!

    https://teachablemachine.withgoogle.com/

    Use that site to capture images from your web camera to find examples of each class of object and see if this tool can work for you.

  • roboflow-python

    The official Roboflow Python package. Manage your datasets, models, and deployments. Roboflow has everything you need to build a computer vision application.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

  • roboflow-api-snippets

    repo for versioning snippets that show how to use Roboflow APIs

  • blackjack-basic-strategy

    A computer vision powered Blackjack basic strategy app powered by Roboflow.

  • Milvus

    A cloud-native vector database, storage for next generation AI applications

    Usually this is done in three steps. The first step is using a neural network to create a bounding box around the object, then generating vector embeddings of the object, and then using similarity search on vector embeddings.

    The first step is accomplished by training a detection model to generate the bounding box around your object, this can usually be done by finetuning an already trained detection model. For this step the data you would need is all the images of the object you have with a bounding box created around it, the version of the object doesnt matter here.

    The second step involves using a generalized image classification model thats been pretrained on generalized data (VGG, etc.) and a vector search engine/vector database. You would start by using the image classification model to generate vector embeddings (https://frankzliu.com/blog/understanding-neural-network-embe...) of all the different versions of the object. The more ground truth images you have, the better, but it doesn't require the same amount as training a classifier model. Once you have your versions of the object as embeddings, you would store them in a vector database (for example Milvus: https://github.com/milvus-io/milvus).

    Now whenever you want to detect the object in an image you can run the image through the detection model to find the object in the image, then run the sliced out image of the object through the vector embedding model. With this vector embedding you can then perform a search in the vector database, and the closest results will most likely be the version of the object.

    Hopefully this helps with the general rundown of how it would look like. Here is an example using Milvus and Towhee https://github.com/towhee-io/examples/tree/3a2207d67b10a246f....

    Disclaimer: I am a part of those two open source projects.

  • examples

    Analyze the unstructured data with Towhee, such as reverse image search, reverse video search, audio classification, question and answer systems, molecular search, etc. (by towhee-io)

    Usually this is done in three steps. The first step is using a neural network to create a bounding box around the object, then generating vector embeddings of the object, and then using similarity search on vector embeddings.

    The first step is accomplished by training a detection model to generate the bounding box around your object, this can usually be done by finetuning an already trained detection model. For this step the data you would need is all the images of the object you have with a bounding box created around it, the version of the object doesnt matter here.

    The second step involves using a generalized image classification model thats been pretrained on generalized data (VGG, etc.) and a vector search engine/vector database. You would start by using the image classification model to generate vector embeddings (https://frankzliu.com/blog/understanding-neural-network-embe...) of all the different versions of the object. The more ground truth images you have, the better, but it doesn't require the same amount as training a classifier model. Once you have your versions of the object as embeddings, you would store them in a vector database (for example Milvus: https://github.com/milvus-io/milvus).

    Now whenever you want to detect the object in an image you can run the image through the detection model to find the object in the image, then run the sliced out image of the object through the vector embedding model. With this vector embedding you can then perform a search in the vector database, and the closest results will most likely be the version of the object.

    Hopefully this helps with the general rundown of how it would look like. Here is an example using Milvus and Towhee https://github.com/towhee-io/examples/tree/3a2207d67b10a246f....

    Disclaimer: I am a part of those two open source projects.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts