Python vector-search

Open-source Python projects categorized as vector-search

Top 21 Python vector-search Projects

  • deeplake

    Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

  • Project mention: FLaNK AI Weekly 25 March 2025 | dev.to | 2024-03-25
  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  • Project mention: Build knowledge graphs with LLM-driven entity extraction | dev.to | 2024-02-21

    txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

  • GPTCache

    Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

  • Project mention: Ask HN: What are the drawbacks of caching LLM responses? | news.ycombinator.com | 2024-03-15

    Just found this: https://github.com/zilliztech/GPTCache which seems to address this idea/issue.

  • Resume-Matcher

    Resume Matcher is an open source, free tool to improve your resume. It works by using language models to compare and rank resumes with job descriptions.

  • Project mention: Hacktoberfest 2023: The Complete Guide | dev.to | 2023-09-22

    GitHub: https://github.com/srbhr/Resume-Matcher
    Website: https://www.resumematcher.fyi/
    Discord: Resume Matcher's Discord
    Tech Stack: Python, NextJS, FastAPI, TypeScript

  • superduperdb

    🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.

  • Project mention: FLaNK Stack Weekly 12 February 2024 | dev.to | 2024-02-12
  • marqo

    Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

  • Project mention: Are we at peak vector database? | news.ycombinator.com | 2024-01-25

    We (Marqo) are doing a lot on 1 and 2. There is a huge amount to be done on the ML side of vector search and we are investing heavily in it. I think it has not quite sunk in that vector search systems are ML systems and everything that comes with that. I would love to chat about 1 and 2 so feel free to email me (email is in my profile). What we have done so far is here -> https://github.com/marqo-ai/marqo

  • gerev

    🧠 AI-powered enterprise search engine 🔎

  • Project mention: A FOSS chat bot trained on docs/ansible? | /r/selfhosted | 2023-06-05
  • core

    Production ready AI assistant framework (by cheshire-cat-ai)

  • Project mention: Ask HN: How do I train a custom LLM/ChatGPT on my own documents in Dec 2023? | news.ycombinator.com | 2023-12-24

    I haven't personally tried this for anything serious yet, but to get the thread started:

    Cheshire Cat [0] looks promising. It's a framework for building AI assistants by providing it with documents that it stores as "memories" that can be retrieved later. I'm not sure how well it works yet, but it has an active community on Discord and seems to be developing rapidly.

    [0] https://github.com/cheshire-cat-ai/core

  • uform

    Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

  • Project mention: CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data | news.ycombinator.com | 2024-04-25

    question: any good on-device size image embedding models?

    tried https://github.com/unum-cloud/uform which i do like, especially they also support languages other than English. Any recommendations on other alternatives?

  • fastembed

    Fast, Accurate, Lightweight Python library to make State of the Art Embedding

  • Project mention: FastLLM by Qdrant – lightweight LLM tailored For RAG | news.ycombinator.com | 2024-04-01
  • qdrant-client

    Python client for Qdrant vector search engine

  • Project mention: Show HN: Chromem-go – Embeddable vector database for Go | news.ycombinator.com | 2024-04-05

    Qdrant lib project https://github.com/tyrchen/qdrant-lib, Qdrant SDK has also support for local mode, which means embeddable https://github.com/qdrant/qdrant-client

  • vectordb

    A Python vector database you just need - no more, no less. (by jina-ai)

  • Project mention: A Python Vector Database | news.ycombinator.com | 2023-08-13
  • cherche

    Neural Search

  • Project mention: [P] Semantic search | /r/MachineLearning | 2023-05-08

    If you are interested, you can check out the documentation here: https://github.com/raphaelsty/cherche

  • gpl

    Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577 (by UKPLab)

  • Project mention: Best pathway for Domain Adaptation with Sentence Transformers? | /r/LanguageTechnology | 2023-04-26

    3) Domain-adapted my bi-encoder using GPL (https://github.com/UKPLab/gpl) and my original corpus from step 1.

  • vector-db-benchmark

    Framework for benchmarking vector search engines

  • Project mention: RAG is Dead. Long Live RAG! | dev.to | 2024-02-28

    Qdrant’s benchmark results are strongly in favor of accuracy and efficiency. We recommend that you consider them before deciding that an LLM is enough. Take a look at our open-source benchmark reports and try out the tests yourself.

  • code-indexer-loop

    Code Indexer Loop is a Python library for indexing and retrieving source code files through an integrated vector database that's continuously and efficiently updated.

  • Project mention: Python library for indexing and retrieving source code files through an integrated vector database (not mine) | /r/LocalLLaMA | 2023-09-13
  • relevanceai

    Home of the AI workforce - Multi-agent system, AI agents & tools

  • unisim

    UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.

  • Project mention: Google UniSim for efficient similarity computation | news.ycombinator.com | 2023-11-30
  • weaviate-txtai

    An integration of the weaviate vector search engine with txtai

  • Project mention: External database integration | dev.to | 2023-09-07

    As mentioned previously, all of the main components of txtai can be replaced with custom components. For example, there are external integrations for storing dense vectors in Weaviate and Qdrant to name a few.

  • MedSearch

    Vector Search Application for Image Similarity Search, specifically designed for medical X-rays, leveraging ResNet50, Chest-XRay dataset and Milvus vector database

  • Project mention: Show HN: MedSearch: vector similarity search app for medical image retrieval | news.ycombinator.com | 2024-03-24
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source vector-search projects in Python? This list will help you:

Rank  Project              Stars
1     deeplake             7,690
2     txtai                6,953
3     GPTCache             6,406
4     Resume-Matcher       4,503
5     superduperdb         4,327
6     marqo                4,111
7     gerev                2,601
8     core                 1,927
9     uform                865
10    fastembed            759
11    qdrant-client        608
12    vectordb             462
13    cherche              311
14    gpl                  308
15    vector-db-benchmark  224
16    bert-solr-search     160
17    code-indexer-loop    159
18    relevanceai          97
19    unisim               63
20    weaviate-txtai       7
21    MedSearch            3
