Top 23 Python RAG Projects
-
ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
⭐️ RAG Flow on GitHub
-
awesome-llm-apps
Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and open-source models.
-
Resource: LlamaIndex Documentation
-
quivr
Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want.
-
chatgpt-on-wechat
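A chatbot built on large language models that can be connected to WeChat Official Accounts, WeCom (Enterprise WeChat) apps, Feishu, DingTalk, and other channels. Supported models include ChatGPT, Claude, DeepSeek, ERNIE Bot, iFlytek Spark, Tongyi Qianwen, Gemini, GLM-4, Kimi, and LinkAI. It handles text, voice, and images, can access the operating system and the internet, and supports building customized enterprise customer-service bots on top of your own knowledge base.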
-
Project mention: Show HN: How to make your MCP clients more context-aware | news.ycombinator.com | 2025-05-13
-
MindsDB
AI's query engine - Platform for building AI that can answer questions over large scale federated data. - The only MCP Server you'll ever need
Customer support is the backbone of any successful business. In today's digital landscape, leveraging artificial intelligence (AI) to automate and enhance support experiences can set your product apart. In this article, we'll explore how to build a customer support application powered by MindsDB, an open-source AI platform that makes it easy to integrate machine learning into your apps.
-
khoj
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
GitHub: https://github.com/khoj-ai/khoj; Stars: 12.4k; Forks: 627; Issues: 64; Pull requests: 3; Contributors: 35; License: AGPL-3.0; Official website: https://khoj.dev/; Documentation: https://docs.khoj.dev/
-
-
Project mention: Kotaemon-papers: an open-source web app to chat with your academic papers | news.ycombinator.com | 2025-01-05
Hi HN,
Our team at https://github.com/Cinnamon/kotaemon/ has been working on a public demo to showcase the new advanced citation features in our RAG (retrieval-augmented generation) application.
We’re excited to share a web app that lets users explore top daily machine learning (ML) papers on Arxiv (via the HuggingFace API) and upload their own Arxiv papers to get LLM-assisted summaries, mind maps, and answers to questions based on the content.
Some notable features:
- Instant Summaries & Mind Maps: Generate concise summaries and visual mind maps for any Arxiv paper.
- Transparent Citations: Verify AI-generated answers with clear, evidence-backed citations. Citations are highlighted directly in the in-browser PDF viewer.
- Flexible Citation Options: Choose between highlights and inline citations. Plus, click on any sentence in the AI-generated response to see its supporting source from the original paper.
- Multi-Paper Analysis: Compare, contrast, and compose summaries from multiple papers simultaneously.
- Complex Question Solving: Use Chain-of-Thought (CoT) reasoning mode to break down and solve complex questions step-by-step.
- Customizable & Private Hosting: Easily self-host or customize your private app via HuggingFace Spaces. You can securely connect your LLM and upload your own document collections.
We’d love to hear your thoughts, feedback, and recommendations as we continue improving this tool.
Check out the demo here and happy hacking!
-
haystack
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Project mention: Building a Prompt-Based Crypto Trading Platform with RAG and Reddit Sentiment Analysis using Haystack | dev.to | 2025-04-28
Haystack forms the backbone of our RAG system. It provides pipelines for processing documents, embedding text, and retrieving relevant information.
-
-
Essentially this solution is to let the AI *formulate the search* expression and not do the search itself (similar to the concept of generating a SQL statement instead of executing it https://github.com/vanna-ai/vanna).
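The split described in that comment, where the model only formulates a query and the application decides whether to run it, can be sketched as follows. This is a minimal illustration, not vanna's API: `formulate_query` is a hypothetical stand-in for an LLM call, `is_read_only` is a deliberately coarse guard, and the `projects` table is made up for the example.

```python
import sqlite3


def formulate_query(question: str) -> str:
    """Hypothetical stand-in for an LLM call: returns SQL text and runs nothing."""
    # A real system would prompt a model here; this stub is hard-coded.
    return (
        "SELECT name, stars FROM projects "
        "WHERE stars > 20000 ORDER BY stars DESC"
    )


def is_read_only(sql: str) -> bool:
    """Coarse guard (an assumption, not a full SQL validator): one SELECT only."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped


def run_query(conn: sqlite3.Connection, sql: str) -> list:
    """Execution is a separate step that the application controls."""
    if not is_read_only(sql):
        raise ValueError("refusing to run a non-SELECT statement")
    return conn.execute(sql).fetchall()


# Illustrative in-memory data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT, stars INTEGER)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?)",
    [("ragflow", 59172), ("vanna", 18531), ("haystack", 21477)],
)

sql = formulate_query("Which projects have more than 20k stars?")
rows = run_query(conn, sql)
print(sql)
print(rows)
```

Keeping generation and execution apart means the generated text can be logged, reviewed, or rejected before anything touches the data.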
-
Project mention: Making Sense of Congressional Data with LightRAG, Amazon Bedrock, and Ollama | dev.to | 2025-03-22
LightRAG enhances RAG systems by integrating graph structures into text indexing and retrieval processes. In simple terms, it better connects related pieces of information, giving more accurate and quick answers. By combining graph relationships with vector-based retrieval, LightRAG pulls in context from both low-level details and high-level insights. An incremental update algorithm ensures your data stays fresh, making it a great choice when data is continuously evolving.
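To make the idea of combining graph relationships with vector-based retrieval concrete, here is a toy sketch. It is not LightRAG's API or algorithm: `ToyGraphIndex`, the bag-of-words `embed` function, and the sample documents are all assumptions chosen to keep the example self-contained. New documents are added without rebuilding the index, which loosely mirrors the incremental-update idea.

```python
import math
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (a real system would use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyGraphIndex:
    """Hypothetical index: vector similarity plus explicit graph edges."""

    def __init__(self):
        self.vectors = {}
        self.edges = defaultdict(set)

    def add(self, doc_id, text, related=()):
        # Incremental insert: only the new document and its edges are touched.
        self.vectors[doc_id] = embed(text)
        for other in related:
            self.edges[doc_id].add(other)
            self.edges[other].add(doc_id)

    def retrieve(self, query, k=1):
        q = embed(query)
        ranked = sorted(
            self.vectors, key=lambda d: cosine(q, self.vectors[d]), reverse=True
        )
        hits = ranked[:k]
        # Expand the vector hits with their graph neighbours.
        neighbours = sorted({n for h in hits for n in self.edges[h]} - set(hits))
        return hits + neighbours


index = ToyGraphIndex()
index.add("bill", "senate bill on energy policy")
index.add("sponsor", "senator profile and committee membership", related=["bill"])
before = index.retrieve("energy bill")

index.add("vote", "roll call vote record", related=["bill"])
after = index.retrieve("energy bill")
print(before, after)
```

The query matches only the `bill` document by text, but the graph edges pull in the related sponsor and vote records that share no words with the query.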
-
DB-GPT
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
-
Project mention: Show HN: Open-source Deep Research across workplace applications | news.ycombinator.com | 2025-03-03
-
Project mention: LangGraph + Graphiti + Long Term Memory = Powerful Agentic Memory | dev.to | 2025-06-02
-
txtai
💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
-
memvid
Video-based AI memory library. Store millions of text chunks in MP4 files with lightning-fast semantic search. No database needed.
Project mention: Ragged – Leveraging Video Container Formats for Efficient Vector DB Distribution | news.ycombinator.com | 2025-06-28
- An open-source implementation to facilitate reproduction and adoption.
I was inspired by the innovative work of Memvid (https://github.com/Olow304/memvid), which demonstrated the potential of using video formats for data storage. My project builds on this concept with a focus on CDNs and semantic search.
I believe Ragged offers a promising solution for deploying semantic search capabilities in edge computing and serverless environments, leveraging the mature video distribution ecosystem. Also sharing indexed knowledge bases in the form of offline MP4 can unlock a new class of applications.
I'm eager to hear your thoughts, feedback, and any potential use cases you envision for this approach. You can find the full paper and implementation details [here](https://github.com/nikitph/ragged).
Thank you for your time fellows
Nikit
-
Project mention: Show HN: Reliability layer to prevent LLM hallucinations | news.ycombinator.com | 2025-02-24
-
Project mention: Show HN: Trieve CLI – Terminal-Based LLM Agent Loop with Search Tool for PDFs | news.ycombinator.com | 2025-06-18
https://github.com/Future-House/paper-qa?tab=readme-ov-file#... :
> PaperQA2 is engineered to be the best agentic RAG model for working with scientific papers.
> [ Semantic Scholar, CrossRef, ]
paperqa-zotero: https://github.com/lejacobroy/paperqa-zotero
The Oracle of Zotero is a fork of paperqa-zotero fork FAISS and langchain:
-
R2R
SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API.
Project mention: Show HN: Toller – A Python library for robust async calls | news.ycombinator.com | 2025-05-13
I built this after a painful incident with one of my R2R (https://github.com/SciPhi-AI/R2R) clients where Azure OpenAI went down unexpectedly. While we were technically propagating errors correctly, we lacked clean, accessible error patterns that would allow the client to implement proper mitigation strategies. They were fully dependent on our infrastructure to handle the outage, with no way to gracefully degrade or implement custom fallbacks.
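The kind of client-side mitigation the post describes (retry, then degrade gracefully) can be sketched with plain `asyncio`. This does not use Toller's API: `UpstreamUnavailable`, `call_with_fallback`, and the two provider coroutines are hypothetical names introduced only for illustration.

```python
import asyncio


class UpstreamUnavailable(Exception):
    """Hypothetical error a service could raise when a provider is down."""


async def call_with_fallback(primary, fallback, retries=2, delay=0.0):
    """Try the primary call a few times with backoff, then use the fallback."""
    for attempt in range(retries):
        try:
            return await primary()
        except UpstreamUnavailable:
            # Exponential backoff between attempts (zero by default here).
            await asyncio.sleep(delay * (2 ** attempt))
    return await fallback()


async def flaky_provider():
    # Simulates an outage on every call.
    raise UpstreamUnavailable("provider outage")


async def cached_answer():
    # Degraded but usable response.
    return "cached response"


result = asyncio.run(call_with_fallback(flaky_provider, cached_answer))
print(result)
```

A distinct, catchable exception type is what lets the caller choose its own fallback instead of depending on the upstream service to handle the outage.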
-
Python RAG related posts
-
Looking for Senior Frontend Engineer (React/Vue)
-
Ask HN: What useful AI tools do you use every day?
-
Show HN: Trieve CLI – Terminal-Based LLM Agent Loop with Search Tool for PDFs
-
Collection of LLM Apps
-
A practical guide to building agents [pdf]
-
LangGraph + Graphiti + Long Term Memory = Powerful Agentic Memory
-
LlamaIndex File Chat Workflow with A2A Protocol
-
Index
What are some of the best open-source RAG projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | ragflow | 59,172 |
2 | awesome-llm-apps | 48,646 |
3 | llama_index | 42,912 |
4 | quivr | 38,095 |
5 | chatgpt-on-wechat | 38,037 |
6 | mem0 | 36,188 |
7 | MindsDB | 33,334 |
8 | khoj | 30,523 |
9 | graphrag | 26,364 |
10 | kotaemon | 22,767 |
11 | haystack | 21,477 |
12 | Scrapegraph-ai | 20,224 |
13 | vanna | 18,531 |
14 | LightRAG | 18,241 |
15 | DB-GPT | 16,910 |
16 | onyx | 13,127 |
17 | graphiti | 12,240 |
18 | txtai | 11,166 |
19 | memvid | 8,162 |
20 | Upsonic | 7,568 |
21 | paper-qa | 7,543 |
22 | R2R | 7,036 |
23 | rags | 6,474 |