Top 21 Python vector-search Projects
-
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
-
txtai
All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
-
Resume-Matcher
Resume Matcher is an open source, free tool to improve your resume. It works by using language models to compare and rank resumes with job descriptions.
-
superduperdb
SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
-
uform
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and video, up to 5x faster than OpenAI CLIP and LLaVA
-
gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577 (by UKPLab)
-
code-indexer-loop
Code Indexer Loop is a Python library for indexing and retrieving source code files through an integrated vector database that's continuously and efficiently updated.
-
unisim
UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.
-
MedSearch
Vector Search Application for Image Similarity Search, specifically designed for medical X-rays, leveraging ResNet50, Chest-XRay dataset and Milvus vector database
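Most of the projects listed here build on the same primitive: represent items as vectors and rank them by similarity to a query vector. The following is a minimal sketch of that idea, not the API of any listed project. The `embed` function is a hypothetical bag-of-words stand-in for a real embedding model, and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Hypothetical corpus, for illustration only.
docs = [
    "chest x-ray image similarity search",
    "resume ranking against job descriptions",
    "source code indexing and retrieval",
]
results = search("rank resume for a job", docs, k=1)
```

A real system would swap `embed` for a learned model and replace the linear scan with an approximate nearest-neighbor index, which is the part vector databases provide.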
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
Project mention: Ask HN: What are the drawbacks of caching LLM responses? | news.ycombinator.com | 2024-03-15
Just found this: https://github.com/zilliztech/GPTCache which seems to address this idea/issue.
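To make the caching idea in this thread concrete, here is a minimal sketch of a similarity-based response cache. It is not GPTCache's API: the `SimilarityCache` class, its methods, and the 0.9 threshold are hypothetical, and `difflib.SequenceMatcher` stands in for the embedding comparison a real semantic cache would use.

```python
from difflib import SequenceMatcher
from typing import Optional


class SimilarityCache:
    """Hypothetical cache that returns a stored response for near-identical prompts."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # assumed cutoff; tune per application
        self.entries: list[tuple[str, str]] = []

    @staticmethod
    def _normalize(prompt: str) -> str:
        # Lowercase and collapse whitespace so trivial variations match.
        return " ".join(prompt.lower().split())

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((self._normalize(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        key = self._normalize(prompt)
        best_score, best_response = 0.0, None
        for cached_prompt, response in self.entries:
            score = SequenceMatcher(None, key, cached_prompt).ratio()
            if score > best_score:
                best_score, best_response = score, response
        # Only reuse a response when the closest prompt is similar enough.
        return best_response if best_score >= self.threshold else None


cache = SimilarityCache()
cache.put("What is vector search?", "cached answer")
hit = cache.get("what is   vector search?")
miss = cache.get("How do I bake bread?")
```

The threshold is where the main drawback shows up: set it too low and the cache returns an answer to a different question, set it too high and it rarely hits.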
GitHub: https://github.com/srbhr/Resume-Matcher Website: https://www.resumematcher.fyi/ Discord: Resume Matcher's Discord Tech Stack: Python, NextJS, FastAPI, TypeScript
We (Marqo) are doing a lot on 1 and 2. There is a huge amount to be done on the ML side of vector search and we are investing heavily in it. I think it has not quite sunk in that vector search systems are ML systems and everything that comes with that. I would love to chat about 1 and 2 so feel free to email me (email is in my profile). What we have done so far is here -> https://github.com/marqo-ai/marqo
Project mention: Ask HN: How do I train a custom LLM/ChatGPT on my own documents in Dec 2023? | news.ycombinator.com | 2023-12-24
I haven't personally tried this for anything serious yet, but to get the thread started:
Cheshire Cat [0] looks promising. It's a framework for building AI assistants by providing it with documents that it stores as "memories" that can be retrieved later. I'm not sure how well it works yet, but it has an active community on Discord and seems to be developing rapidly.
[0] https://github.com/cheshire-cat-ai/core
Project mention: CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data | news.ycombinator.com | 2024-04-25
question: any good on-device size image embedding models?
tried https://github.com/unum-cloud/uform which i do like, especially they also support languages other than English. Any recommendations on other alternatives?
Project mention: FastLLM by Qdrant - lightweight LLM tailored For RAG | news.ycombinator.com | 2024-04-01
Project mention: Show HN: Chromem-go - Embeddable vector database for Go | news.ycombinator.com | 2024-04-05
There is the Qdrant lib project, https://github.com/tyrchen/qdrant-lib. The Qdrant SDK also supports local mode, which makes it embeddable: https://github.com/qdrant/qdrant-client
If you are interested, you can check out the documentation here: https://github.com/raphaelsty/cherche
Project mention: Best pathway for Domain Adaptation with Sentence Transformers? | /r/LanguageTechnology | 2023-04-26
3) Domain-adapted my bi-encoder using GPL (https://github.com/UKPLab/gpl) and my original corpus from step 1.
Qdrant's benchmark results are strongly in favor of accuracy and efficiency. We recommend that you consider them before deciding that an LLM is enough. Take a look at our open-source benchmark reports and try out the tests yourself.
Project mention: Python library for indexing and retrieving source code files through an integrated vector database (not mine) | /r/LocalLLaMA | 2023-09-13
Project mention: Google UniSim for efficient similarity computation | news.ycombinator.com | 2023-11-30
As mentioned previously, all of the main components of txtai can be replaced with custom components. For example, there are external integrations for storing dense vectors in Weaviate and Qdrant to name a few.
Project mention: Show HN: MedSearch: vector similarity search app for medical image retrieval | news.ycombinator.com | 2024-03-24
Python vector-search related posts
- RAG is Dead. Long Live RAG!
- Are we at peak vector database?
- Qdrant, the Vector Search Database, raised $28M in a Series A round
- Ask HN: Is there any good semantic search GUI for images or documents?
- Vector Databases: A Technical Primer [pdf]
- 90x Faster Than Pgvector - Lantern's HNSW Index Creation Time
- Python library for indexing and retrieving source code files through an integrated vector database (not mine)
Index
What are some of the best open-source vector-search projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | deeplake | 7,690 |
2 | txtai | 6,953 |
3 | GPTCache | 6,406 |
4 | Resume-Matcher | 4,503 |
5 | superduperdb | 4,327 |
6 | marqo | 4,111 |
7 | gerev | 2,601 |
8 | core | 1,927 |
9 | uform | 865 |
10 | fastembed | 759 |
11 | qdrant-client | 608 |
12 | vectordb | 462 |
13 | cherche | 311 |
14 | gpl | 308 |
15 | vector-db-benchmark | 224 |
16 | bert-solr-search | 160 |
17 | code-indexer-loop | 159 |
18 | relevanceai | 97 |
19 | unisim | 63 |
20 | weaviate-txtai | 7 |
21 | MedSearch | 3 |