Top 23 Python vector-database Projects
-
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
-
txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
-
superduperdb
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
-
NeumAI
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
-
langchain-chatbot
AI Chatbot for analyzing/extracting information from data in conversational format.
-
ChatData
ChatData 🔍 📖 brings RAG to real applications with FREE✨ knowledge bases. Now enjoy your chat with 6 million wikipedia pages and 2 million arxiv papers.
-
DocumentGPT
DocumentGPT is a web application that allows you to chat over your research document using OpenAI's chat API and perform semantic search using vector databases. This tool provides a seamless interface for interacting with your research document, exploring search results, and engaging in a conversation with an AI chatbot.
-
NeoGPT
Your Local AI Assistant: Seamlessly Chat, Execute Commands, and Interpret Code with Local Models for Ultimate Privacy.
-
markdown-file-query
Semantic QA with a markdown database: Query any markdown file using vector embedding, Pinecone vector database and GPT (langchain). A weaker version of privateGPT
-
YassQueenDB
Graph database library that allows you to store, analyze, and search through your data in a graph format. By using the Universal Sentence Encoder, it provides an efficient and semantic approach to handle text data. 📚🧠🚀
-
vektor
a mini vector database implementation that intends to be educational and interpretable (by notallm)
Project mention: LlamaIndex: A data framework for your LLM applications | news.ycombinator.com | 2024-04-07
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
In this blog post, I’ll be comparing 3 distinct AI-first code search tools I recently came across: Cody (developed by late-stage startup, Sourcegraph), SeaGOAT (an open-source project that was trending on HN last week), and Bloop (an early-stage YC startup). I’ll be evaluating them along the dimensions of user-friendliness as well as their accuracy.
Pinecone: A scalable vector database service that facilitates efficient similarity search in high-dimensional spaces. Ideal for building real-time applications in AI, such as personalized recommendation engines and content-based retrieval systems.
Project mention: Show HN: Neum AI – Open-source large-scale RAG framework | news.ycombinator.com | 2023-11-21
Interesting to see that the semantic chunking in the tools library is a wrapper around GPT-4. Asks GPT for the python code and executes it: https://github.com/NeumTry/NeumAI/blob/main/neumai-tools/neu...
Project mention: Show HN: LLMFlows – LangChain alternative for explicit and transparent apps | news.ycombinator.com | 2023-07-29
Project mention: Show HN: Chromem-go – Embeddable vector database for Go | news.ycombinator.com | 2024-04-05
There is the Qdrant lib project (https://github.com/tyrchen/qdrant-lib), and the Qdrant SDK also supports a local mode, which makes it embeddable: https://github.com/qdrant/qdrant-client
Project mention: Legalyze – AI for Lawyers to Query Case Files | news.ycombinator.com | 2023-05-21
We have built Legalyze.ai, a tool for lawyers to query thousands of files at once. We are using Langchain in coordination with GPT-4 and Pinecone to query massive sets of data at once.
Lawyers can also generate procedural documents like motions and requests using their case as context.
Contact [email protected] for a trial and check out our open source project - https://github.com/Haste171/langchain-chatbot
Qdrant’s benchmark results are strongly in favor of accuracy and efficiency. We recommend that you consider them before deciding that an LLM is enough. Take a look at our open-source benchmark reports and try out the tests yourself.
Project mention: Show HN: ChatData – an open-source ChatGPT-like chatbot | news.ycombinator.com | 2023-11-28
Hey there, wonderful Hacker News community! We're excited to share something special with you - ChatData. This isn't just another chat-with-documents app; it's a game-changer that melds MyScale and LangChain, empowering you to query millions of files effortlessly.
ChatData redefines the conversation between you and knowledge. Explore the MyScale free knowledge base or delve into your uploaded documents for tailored insights and answers.
Retriever Type: Fueled by the Retrieval Augmented Generation (RAG) framework, ChatData introduces the Self-querying retriever and VectorSQL. Build intricate queries effortlessly using LangChain, covering everything from timestamps to arrays of strings.
Session Management: Elevate your chat experience with intuitive session management. Customize your session ID, tweak prompts, and guide ChatData through your queries with ease. It's like having a personal conversation with your knowledge!
Build Your Own Knowledge Base: Beyond MyScale's external knowledge base, ChatData invites you to upload your files using the Unstructured API. Your privacy matters - only processed texts are stored. It's your knowledge, your way!
Whether you're a researcher, a student, or just someone hungry for knowledge, ChatData simplifies your journey through vast data. Unleash the true potential of information retrieval and explore a world of knowledge with a friendly touch.
We genuinely can't wait to hear your thoughts and feedback. Let's embark on this exciting journey of knowledge discovery together with ChatData (https://github.com/myscale/ChatData)!
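The post above mentions queries that combine semantic search with conditions on fields such as timestamps and arrays of strings. As a rough illustration only (this is not ChatData's, MyScale's, or LangChain's code), the sketch below filters records on metadata first and then ranks the survivors by vector similarity. The records, field names, and the `filtered_search` helper are hypothetical; in a self-querying retriever the filter would be derived from the user's question by an LLM, whereas here it is written by hand.

```python
from datetime import date

# Hypothetical records: (id, embedding, metadata).
# Real embeddings would come from an embedding model, not be typed by hand.
DOCS = [
    ("p1", [0.9, 0.1], {"published": date(2023, 5, 1), "tags": ["rag", "llm"]}),
    ("p2", [0.8, 0.2], {"published": date(2021, 3, 9), "tags": ["rag"]}),
    ("p3", [0.1, 0.9], {"published": date(2023, 8, 2), "tags": ["vision"]}),
]


def dot(a, b):
    """Dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def filtered_search(query_vec, published_after, required_tag, k=5):
    """Keep records that pass the metadata filter, then rank them by similarity."""
    candidates = [
        (doc_id, vec)
        for doc_id, vec, meta in DOCS
        if meta["published"] > published_after and required_tag in meta["tags"]
    ]
    ranked = sorted(candidates, key=lambda c: dot(query_vec, c[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Only p1 is both tagged "rag" and published after 2022-01-01.
hits = filtered_search([1.0, 0.0], date(2022, 1, 1), "rag")
print(hits)
```

A production system would normally push the filter into the database query rather than scan records in application code; the linear scan here only keeps the example self-contained.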
Was really excited to get everything working! Check it out at: https://github.com/aju22/DocumentGPT
Project mention: Created a smol vector database in my free time. Looking to provide a LangChain integration soon! | /r/LangChain | 2023-05-06
It supports all the basic features like creating an index, inserting vectors and searching through them. Here's the GitHub link if anyone's interested in going over it: https://github.com/0xDebabrata/citrus
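To make the three basic operations named above concrete (creating an index, inserting vectors, searching), here is a minimal sketch in plain Python. It is not the citrus API or any listed project's implementation: the `TinyVectorIndex` class and its method names are hypothetical, and it uses brute-force cosine similarity where real vector databases typically use approximate nearest-neighbour structures.

```python
import math


class TinyVectorIndex:
    """Toy in-memory index: brute-force cosine similarity, no persistence."""

    def __init__(self, dim):
        self.dim = dim
        self._ids = []
        self._vectors = []  # stored unit-normalised

    def _normalise(self, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("zero vectors have no direction to compare")
        return [x / norm for x in vector]

    def insert(self, item_id, vector):
        """Store a vector under an id."""
        self._ids.append(item_id)
        self._vectors.append(self._normalise(vector))

    def search(self, query, k=3):
        """Return up to k (id, cosine similarity) pairs, best match first."""
        unit = self._normalise(query)
        scored = [
            (item_id, sum(a * b for a, b in zip(unit, vec)))
            for item_id, vec in zip(self._ids, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = TinyVectorIndex(dim=3)
index.insert("a", [1.0, 0.0, 0.0])
index.insert("b", [0.0, 1.0, 0.0])
index.insert("c", [0.9, 0.1, 0.0])

results = index.search([1.0, 0.0, 0.0], k=2)
print([item_id for item_id, _ in results])
```

Normalising vectors at insert time means each search reduces to a dot product per stored vector, which keeps the scan simple at the cost of linear time in the number of vectors.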
One of the most interesting projects I came across this month was NeoGPT. It's a GPT-based application that is being built to converse with documents and videos. While still in its infancy, the project has outlined a cool roadmap and has a very active base of contributors continuously expanding its functionality. The project appeals to my desire to learn how to work with AI and neural networks. It is also at a stage of development that is within reach of my comprehension. The icing on the cake is that it's Python-based, which is my sharpest tool at the moment. I see it as a decent project to stay tapped into and grow my skills as the application develops.
Project mention: [D] ChatGPT4 doesn’t cut it for my work. Need a more accurate tool. | /r/MachineLearning | 2023-12-06
We have a research-focused framework for these kinds of tasks here: https://github.com/biocypher/biochatter. Requests and contributions welcome.
Project mention: [P] I built a Chatbot to talk with any Github Repo. 🪄 | /r/MachineLearning | 2023-04-29
Project mention: Weekly Thread: What questions do you have about vector databases? | /r/vectordatabase | 2023-07-12
Python vector-database related posts
- RAG is Dead. Long Live RAG!
- 7 Vector Databases Every Developer Should Know!
- Qdrant, the Vector Search Database, raised $28M in a Series A round
- Using Vector Embeddings to Overengineer 404 pages
- Pinecone: Build Knowledgeable AI
- Vector Databases: A Technical Primer [pdf]
- FLaNK Stack Weekly 11 Dec 2023
-
A note from our sponsor - InfluxDB
www.influxdata.com | 27 Apr 2024
Index
What are some of the best open-source vector-database projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | llama_index | 30,910 |
2 | deeplake | 7,708 |
3 | txtai | 6,953 |
4 | superduperdb | 4,346 |
5 | SeaGOAT | 911 |
6 | autollm | 908 |
7 | canopy | 873 |
8 | NeumAI | 774 |
9 | llmflows | 615 |
10 | qdrant-client | 608 |
11 | vectordb | 462 |
12 | langchain-chatbot | 371 |
13 | vector-db-benchmark | 224 |
14 | ChatData | 133 |
15 | DocumentGPT | 99 |
16 | relevanceai | 97 |
17 | citrus | 92 |
18 | NeoGPT | 55 |
19 | biochatter | 40 |
20 | markdown-file-query | 25 |
21 | YassQueenDB | 14 |
22 | vektor | 12 |
23 | QDrant-NLP | 11 |