Python rag

Open-source Python projects categorized as rag

Top 23 Python rag Projects

  1. ragflow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

    Project mention: 7 AI Open Source Libraries To Build RAG, Agents & AI Search | dev.to | 2024-11-14

    ⭐️ RAG Flow on GitHub

  3. awesome-llm-apps

    Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.

    Project mention: Collection of LLM Apps | news.ycombinator.com | 2025-06-12
  4. llama_index

    LlamaIndex is the leading framework for building LLM-powered agents over your data.

    Project mention: Complete Large Language Model (LLM) Learning Roadmap | dev.to | 2025-04-11

    Resource: LlamaIndex Documentation

  5. quivr

    Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration into existing products, with customisation! Any LLM: GPT-4, Groq, Llama. Any vector store: PGVector, Faiss. Any files. Any way you want.

    Project mention: Ask HN: Local RAG with private knowledge base | news.ycombinator.com | 2024-10-29
  6. chatgpt-on-wechat

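    A chatbot built on large language models that can be connected to WeChat Official Accounts, WeCom (enterprise WeChat) apps, Feishu, DingTalk and more. Supported models include ChatGPT, Claude, DeepSeek, ERNIE Bot (文心一言), iFlytek Spark (讯飞星火), Tongyi Qianwen (通义千问), Gemini, GLM-4, Kimi and LinkAI. It can process text, voice and images, access the operating system and the internet, and supports building customized enterprise customer-service bots on top of your own knowledge base.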

  7. mem0

    Memory for AI Agents; Announcing OpenMemory MCP - local and secure memory management.

    Project mention: Show HN: How to make your MCP clients more context-aware | news.ycombinator.com | 2025-05-13
  8. MindsDB

    AI's query engine: a platform for building AI that can answer questions over large-scale federated data. The only MCP server you'll ever need.

    Project mention: Building an AI-Powered Customer Support App Using MindsDB | dev.to | 2025-06-30

    Customer support is the backbone of any successful business. In today's digital landscape, leveraging artificial intelligence (AI) to automate and enhance support experiences can set your product apart. In this article, we'll explore how to build a customer support application powered by MindsDB, an open-source AI platform that makes it easy to integrate machine learning into your apps.

  10. khoj

    Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

    Project mention: Top 13 Self-Hosted Projects with the Most GitHub Stars | dev.to | 2024-09-10

    GitHub: https://github.com/khoj-ai/khoj
    GitHub stars: 12.4k
    GitHub forks: 627
    GitHub issues: 64
    GitHub pull requests: 3
    GitHub contributors: 35
    Open source license: AGPL-3.0
    Official website: https://khoj.dev/
    Documentation: https://docs.khoj.dev/

  11. graphrag

    A modular graph-based Retrieval-Augmented Generation (RAG) system

    Project mention: A practical guide to building agents [pdf] | news.ycombinator.com | 2025-06-04
  12. kotaemon

    An open-source RAG-based tool for chatting with your documents.

    Project mention: Kotaemon-papers: an open-source web app to chat with your academic papers | news.ycombinator.com | 2025-01-05

    Hi HN,

    Our team at https://github.com/Cinnamon/kotaemon/ has been working on a public demo to showcase the new advanced citation features in our RAG (retrieval-augmented generation) application.

    We’re excited to share a web app that lets users explore top daily machine learning (ML) papers on arXiv (via the HuggingFace API) and upload their own arXiv papers to get LLM-assisted summaries, mind maps, and answers to questions based on the content.

    Some notable features:

    - Instant Summaries & Mind Maps: Generate concise summaries and visual mind maps for any arXiv paper.

    - Transparent Citations: Verify AI-generated answers with clear, evidence-backed citations. Citations are highlighted directly in the in-browser PDF viewer.

    - Flexible Citation Options: Choose between highlights and inline citations. Plus, click on any sentence in the AI-generated response to see its supporting source from the original paper.

    - Multi-Paper Analysis: Compare, contrast, and compose summaries from multiple papers simultaneously.

    - Complex Question Solving: Use Chain-of-Thought (CoT) reasoning mode to break down and solve complex questions step-by-step.

    - Customizable & Private Hosting: Easily self-host or customize your private app via HuggingFace Spaces. You can securely connect your LLM and upload your own document collections.

    We’d love to hear your thoughts, feedback, and recommendations as we continue improving this tool.

    Check out the demo here and happy hacking!
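    The post does not say how Kotaemon links an answer sentence to "its supporting source from the original paper". As a rough illustration of the idea only, here is a minimal sketch that assumes a naive lexical approach: each answer sentence is paired with the source sentence that has the highest word overlap (Jaccard similarity), and is left uncited below a threshold. The function names and the 0.2 threshold are hypothetical; Kotaemon's actual citation pipeline is not described here and likely differs.

```python
import re
from typing import Optional


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence: str) -> set[str]:
    # Lower-cased alphanumeric tokens, as a set.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cite(answer: str, source: str, min_score: float = 0.2) -> list[tuple[str, Optional[str]]]:
    """Pair each answer sentence with its best-matching source sentence, or None."""
    source_sentences = split_sentences(source)
    citations = []
    for sentence in split_sentences(answer):
        scored = [(jaccard(words(sentence), words(s)), s) for s in source_sentences]
        score, best = max(scored, key=lambda pair: pair[0], default=(0.0, None))
        # Below the threshold the sentence is treated as unsupported.
        citations.append((sentence, best if score >= min_score else None))
    return citations


source = "Transformers use self-attention. The model was trained on 8 GPUs. Results improve with more data."
answer = "The model was trained on 8 GPUs. It uses a lot of memory."
for sentence, support in cite(answer, source):
    print(f"{sentence!r} -> {support!r}")
```

    Flagging unsupported sentences (the `None` case) is the part that makes citations useful for verification; a lexical matcher like this misses paraphrases, which is why real systems tend to use embeddings or the LLM itself.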

  13. haystack

    AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

    Project mention: Building a Prompt-Based Crypto Trading Platform with RAG and Reddit Sentiment Analysis using Haystack | dev.to | 2025-04-28

    Haystack forms the backbone of our RAG system. It provides pipelines for processing documents, embedding text, and retrieving relevant information.
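    The three stages named in that excerpt (processing documents, embedding text, retrieving relevant information) can be sketched without any framework. This is not Haystack's API: it is a plain-Python sketch that assumes a toy term-frequency "embedding" in place of a neural embedding model, and it stops at building the prompt that would be sent to an LLM.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    """Processing: split a document into fixed-size windows of words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Embedding (toy): a term-frequency vector instead of a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Retrieval: rank chunks by cosine similarity to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Augmentation: the prompt an LLM would receive (the LLM call itself is omitted)."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


docs = [
    "Haystack connects components such as models, vector databases and file converters into pipelines.",
    "Sentiment scores can be computed per post and averaged per day.",
    "Time series databases store measurements indexed by timestamp.",
]
chunks = [c for d in docs for c in split_into_chunks(d)]
question = "How does Haystack connect components?"
print(build_prompt(question, retrieve(question, chunks, top_k=1)))
```

    Swapping `embed` for a real embedding model and the sorted list for a vector database changes the quality and scale, not the shape of the pipeline.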

  14. Scrapegraph-ai

    Python scraper based on AI

    Project mention: ScrapeGraphAI Release Week | news.ycombinator.com | 2025-07-07
  15. vanna

    🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.

    Project mention: Supercharging Obsidian Search with AI and Ollama | dev.to | 2024-11-26

    Essentially this solution is to let the AI *formulate the search* expression and not do the search itself (similar to the concept of generating a SQL statement instead of executing it https://github.com/vanna-ai/vanna).
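    A minimal sketch of that separation, assuming SQLite and a hypothetical `formulate_query` function standing in for the LLM: the model side only produces a query string plus parameters, and the application decides whether and how to run it. This is not Vanna's API; Vanna generates SQL with an LLM using RAG, which the rule-based stub below does not attempt.

```python
import sqlite3


def formulate_query(question: str) -> tuple[str, tuple]:
    """Hypothetical stand-in for the LLM: it only *writes* a parameterized query.
    Here the last word of the question simply becomes the search term."""
    term = question.rstrip("?").split()[-1]
    return "SELECT title FROM notes WHERE body LIKE ? ORDER BY title", (f"%{term}%",)


def run_query(conn: sqlite3.Connection, sql: str, params: tuple) -> list[tuple]:
    """The application, not the model, executes the query after a basic check.
    (A prefix check is illustrative only, not a security boundary.)"""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [
        ("Ollama setup", "Running local models with Ollama"),
        ("Obsidian tips", "Search syntax for Obsidian vaults"),
        ("Groceries", "milk, eggs"),
    ],
)
sql, params = formulate_query("Which notes mention obsidian?")
print(sql, params)
print(run_query(conn, sql, params))
```

    Keeping generation and execution apart means the generated query can be shown, logged or rejected before anything touches the data.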

  16. LightRAG

    LightRAG: Simple and Fast Retrieval-Augmented Generation

    Project mention: Making Sense of Congressional Data with LightRAG, Amazon Bedrock, and Ollama | dev.to | 2025-03-22

    LightRAG enhances RAG systems by integrating graph structures into text indexing and retrieval processes. In simple terms, it better connects related pieces of information, giving faster and more accurate answers. By combining graph relationships with vector-based retrieval, LightRAG pulls in context from both low-level details and high-level insights. An incremental update algorithm ensures your data stays fresh, making it a great choice when data is continuously evolving.
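    To make "combining graph relationships with vector-based retrieval" concrete, here is a toy sketch, not LightRAG's algorithm: chunks are ranked by cosine similarity over term-frequency vectors, then the result is expanded with chunks that share entities one hop away in a relation graph. Entities and relations are supplied by hand (an assumption; a real system would extract them from the text), and `add` only writes the new chunk's entries, which mirrors the idea of incremental updates. The class and method names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tf(text: str) -> Counter:
    # Toy term-frequency vector standing in for a real embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class HybridIndex:
    """Toy index: one vector per chunk plus an entity relation graph."""

    def __init__(self) -> None:
        self.vectors: dict[str, Counter] = {}
        self.entity_chunks: dict[str, set[str]] = defaultdict(set)
        self.neighbors: dict[str, set[str]] = defaultdict(set)

    def add(self, chunk_id: str, text: str, entities: list[str],
            relations: tuple[tuple[str, str], ...] = ()) -> None:
        """Incremental update: only the new chunk's entries are written; nothing is rebuilt."""
        self.vectors[chunk_id] = tf(text)
        for entity in entities:
            self.entity_chunks[entity].add(chunk_id)
        for a, b in relations:
            self.neighbors[a].add(b)
            self.neighbors[b].add(a)

    def query(self, question: str, top_k: int = 1) -> list[str]:
        q = tf(question)
        # Vector step: the top_k chunks most similar to the question.
        seeds = sorted(self.vectors, key=lambda c: cosine(q, self.vectors[c]), reverse=True)[:top_k]
        # Graph step: entities in the seed chunks plus their one-hop neighbours.
        entities = {e for e, ids in self.entity_chunks.items() if ids & set(seeds)}
        expanded = entities | {n for e in entities for n in self.neighbors.get(e, set())}
        related = {c for e in expanded for c in self.entity_chunks.get(e, set())}
        return seeds + sorted(related - set(seeds))


index = HybridIndex()
index.add("c1", "The Senate passed Bill A after a committee vote.",
          ["Senate", "Bill A"], relations=(("Bill A", "Agency B"),))
index.add("c2", "Agency B will administer the funds authorized by the new law.", ["Agency B"])
index.add("c3", "The cafeteria menu changes every Friday.", ["cafeteria"])
print(index.query("When did the Senate pass Bill A?"))
```

    Here `c2` shares no useful words with the question, so pure vector search would not surface it; the graph edge from "Bill A" to "Agency B" is what pulls it in.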

  17. DB-GPT

    AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents

  18. onyx

    Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

    Project mention: Show HN: Open-source Deep Research across workplace applications | news.ycombinator.com | 2025-03-03
  19. graphiti

    Build Real-Time Knowledge Graphs for AI Agents

    Project mention: LangGraph + Graphiti + Long Term Memory = Powerful Agentic Memory | dev.to | 2025-06-02
  20. txtai

    💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows

    Project mention: Chunking your data for RAG | dev.to | 2025-02-11
  21. memvid

    Video-based AI memory library. Store millions of text chunks in MP4 files with lightning-fast semantic search. No database needed.

    Project mention: Ragged – Leveraging Video Container Formats for Efficient Vector DB Distribution | news.ycombinator.com | 2025-06-28

    - An open-source implementation to facilitate reproduction and adoption.

    I was inspired by the innovative work of Memvid (https://github.com/Olow304/memvid), which demonstrated the potential of using video formats for data storage. My project builds on this concept with a focus on CDNs and semantic search.

    I believe Ragged offers a promising solution for deploying semantic search capabilities in edge computing and serverless environments, leveraging the mature video distribution ecosystem. Also sharing indexed knowledge bases in the form of offline MP4 can unlock a new class of applications.

    I'm eager to hear your thoughts, feedback, and any potential use cases you envision for this approach. You can find the full paper and implementation details [here](https://github.com/nikitph/ragged).

    Thank you for your time, fellows.

    Nikit

  22. Upsonic

    The most reliable AI agent framework that supports MCP.

    Project mention: Show HN: Reliability layer to prevent LLM hallucinations | news.ycombinator.com | 2025-02-24
  23. paper-qa

    High-accuracy RAG for answering questions from scientific documents with citations

    Project mention: Show HN: Trieve CLI – Terminal-Based LLM Agent Loop with Search Tool for PDFs | news.ycombinator.com | 2025-06-18

    https://github.com/Future-House/paper-qa?tab=readme-ov-file#... :

    > PaperQA2 is engineered to be the best agentic RAG model for working with scientific papers.

    > [ Semantic Scholar, CrossRef, ]

    paperqa-zotero: https://github.com/lejacobroy/paperqa-zotero

    The Oracle of Zotero is a fork of paperqa-zotero fork FAISS and langchain:

  24. R2R

    SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API.

    Project mention: Show HN: Toller – A Python library for robust async calls | news.ycombinator.com | 2025-05-13

    I built this after a painful incident with one of my R2R (https://github.com/SciPhi-AI/R2R) clients where Azure OpenAI went down unexpectedly. While we were technically propagating errors correctly, we lacked clean, accessible error patterns that would allow the client to implement proper mitigation strategies. They were fully dependent on our infrastructure to handle the outage, with no way to gracefully degrade or implement custom fallbacks.
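    The graceful-degradation pattern described here can be sketched with `asyncio`: each provider call gets a timeout and a few retries with exponential backoff, and on repeated failure the caller falls back to the next provider instead of waiting out the outage. This is not Toller's API; `call_primary`, `call_backup` and `ProviderError` are hypothetical stand-ins, with the primary hard-wired to fail so the fallback path runs.

```python
import asyncio
from typing import Awaitable, Callable


class ProviderError(Exception):
    """Raised when an upstream model provider fails (outage, 5xx, ...)."""


async def call_primary(prompt: str) -> str:
    # Hypothetical primary provider, hard-wired here to simulate an outage.
    raise ProviderError("primary provider unavailable")


async def call_backup(prompt: str) -> str:
    # Hypothetical secondary provider.
    await asyncio.sleep(0)
    return f"backup answer to: {prompt}"


async def call_with_fallback(
    prompt: str,
    providers: list[Callable[[str], Awaitable[str]]],
    retries: int = 2,
    timeout: float = 5.0,
    base_delay: float = 0.1,
) -> str:
    """Try each provider in order; retry with exponential backoff, then fall back."""
    errors = []
    for provider in providers:
        for attempt in range(retries):
            try:
                return await asyncio.wait_for(provider(prompt), timeout=timeout)
            except (ProviderError, asyncio.TimeoutError) as exc:
                errors.append(f"{provider.__name__} attempt {attempt + 1}: {exc!r}")
                if attempt + 1 < retries:
                    await asyncio.sleep(base_delay * 2 ** attempt)
    # Every provider failed: surface one error that carries the full history.
    raise ProviderError("all providers failed: " + "; ".join(errors))


print(asyncio.run(call_with_fallback("hello", [call_primary, call_backup], base_delay=0.01)))
```

    Passing the provider list in from the caller is the design point: the client, not the library, decides what "degrade gracefully" means for its own application.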

  25. rags

    Build ChatGPT over your data, all with natural language

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python rag related posts

  • Looking for Senior Frontend Engineer (React/Vue)

    1 project | dev.to | 8 Jul 2025
  • Ask HN: What useful AI tools do you use every day?

    6 projects | news.ycombinator.com | 25 Jun 2025
  • Show HN: Trieve CLI – Terminal-Based LLM Agent Loop with Search Tool for PDFs

    5 projects | news.ycombinator.com | 18 Jun 2025
  • Collection of LLM Apps

    1 project | news.ycombinator.com | 12 Jun 2025
  • A practical guide to building agents [pdf]

    4 projects | news.ycombinator.com | 4 Jun 2025
  • LangGraph + Graphiti + Long Term Memory = Powerful Agentic Memory

    1 project | dev.to | 2 Jun 2025
  • LlamaIndex File Chat Workflow with A2A Protocol

    4 projects | dev.to | 2 Jun 2025

Index

What are some of the best open-source rag projects in Python? This list will help you:

Rank  Project              GitHub stars
1     ragflow              59,172
2     awesome-llm-apps     48,646
3     llama_index          42,912
4     quivr                38,095
5     chatgpt-on-wechat    38,037
6     mem0                 36,188
7     MindsDB              33,334
8     khoj                 30,523
9     graphrag             26,364
10    kotaemon             22,767
11    haystack             21,477
12    Scrapegraph-ai       20,224
13    vanna                18,531
14    LightRAG             18,241
15    DB-GPT               16,910
16    onyx                 13,127
17    graphiti             12,240
18    txtai                11,166
19    memvid               8,162
20    Upsonic              7,568
21    paper-qa             7,543
22    R2R                  7,036
23    rags                 6,474
