Python Retrieval

Open-source Python projects categorized as Retrieval

Top 18 Python Retrieval Projects

  • R2R

    Containerized, state of the art Retrieval-Augmented Generation (RAG) system with a RESTful API

    Project mention: Ask HN: Local RAG with private knowledge base | news.ycombinator.com | 2024-10-29

    You might find Rag to Riches' (R2R) built-in use of Unstructured for doc parsing, hybrid search, knowledge graphs, and HyDE queries improves the quality of your retrievals. https://github.com/SciPhi-AI/R2R
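
    As a rough illustration of what "hybrid search" can mean, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). This is a generic technique, not R2R's confirmed implementation; the function name, the document IDs, and the constant k=60 are all assumptions chosen for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    k=60 is a commonly used default, assumed here for illustration.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from a keyword search and a vector search.
keyword_hits = ["doc_b", "doc_a", "doc_d"]
vector_hits = ["doc_a", "doc_c", "doc_b"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

    Documents that rank well in both lists rise to the top, which is the usual motivation for combining lexical and embedding-based retrieval.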

  • beir

    A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

    Project mention: Any* Embedding Model Can Become a Late Interaction Model - If You Give It a Chance! | dev.to | 2024-08-29

    The source code for these experiments is open-source and utilizes beir-qdrant, an integration of Qdrant with the BeIR library. While this package is not officially maintained by the Qdrant team, it may prove useful for those interested in experimenting with various Qdrant configurations to see how they impact retrieval quality. All experiments were conducted using Qdrant in exact search mode, ensuring the results are not influenced by approximate search.
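
    To make the distinction concrete: exact search scores the query against every stored vector, so the ranking is not affected by index approximations. The sketch below is a minimal brute-force version in plain Python; it does not use Qdrant or beir, and the corpus, vectors, and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_search(query, corpus, top_k=3):
    """Score the query against every vector and return the top_k matches."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy two-dimensional embeddings, made up for the example.
corpus = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [1.0, 1.0],
}

results = exact_search([1.0, 0.1], corpus, top_k=2)
print(results)
```

    An approximate index trades some of this accuracy for speed on large collections, which is why benchmark runs often use the exact mode as a reference.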

  • fastembed

    Fast, Accurate, Lightweight Python library to make State of the Art Embedding

    Project mention: FastLLM by Qdrant – lightweight LLM tailored For RAG | news.ycombinator.com | 2024-04-01
  • raptor

    The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

    Project mention: Show HN: A phone number to text with questions about current events | news.ycombinator.com | 2024-05-10

    Hi HN! For my senior thesis in CS, I built an SMS-based application to make journalism more accessible. It works like this:

    1) You text the topics you're interested in to my phone number. Every day, you'll receive a text with 5 headlines from The Associated Press (https://apnews.com/) related to those topics.

    2) If you have questions about any of the current events the headlines describe, you just text them back. A response is generated from the contents of the articles using the RAPTOR retrieval framework (https://github.com/parthsarthi03/raptor) and texted right back to you.

    The repo can be found here: https://github.com/tdh15/pressText

    I'd really appreciate any and all feedback. Whatever you got, I'd love to hear it :)

  • RETRO-pytorch

    Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

  • NeumAI

    Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

  • searchGPT

    Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.

  • memorizing-transformers-pytorch

    Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

    Project mention: HMT: Hierarchical Memory Transformer for Long Context Language Processing | news.ycombinator.com | 2024-05-17

    Code: https://github.com/OswaldHe/HMT-pytorch

    This looks really interesting. I've added the paper to my reading list and look forward to playing with the code. I'm curious to see what kinds of improvements we can get by augmenting Transformers and other generative language/sequence models with this and other mechanisms implementing hierarchical memory.[a]

    We sure live in interesting times!

    ---

    [a] In the past, I experimented a little with transformers that had access to external memory using https://github.com/lucidrains/memorizing-transformers-pytorc... and also using routed queries with https://github.com/glassroom/heinsen_routing . Both approaches seemed to work, but I never attempted to build any kind of hierarchy with those approaches.

  • xmc.dspy

    In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples.

    Project mention: Betting on DSPy for Systems of LLMs | news.ycombinator.com | 2024-08-10

    The abstractions could be cleaner. I think some of the convolution is due to the evolution that it has undergone and core contributors have not come around to being fully “out with the old”.

    I think there might be practical benefits to it. The XMC example illustrates it for me:

    https://github.com/KarelDO/xmc.dspy

  • cherche

    Neural Search

  • DALM

    Domain Adapted Language Modeling Toolkit - E2E RAG

  • ACT

    Atmospheric data Community Toolkit - A python based toolkit for exploring and analyzing time series atmospheric datasets (by ARM-DOE)

  • icl-ceil

    [ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”.

  • retomaton

    PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022)

  • denser-chat

    Chat with PDF files with source highlights

    Project mention: PDF chat with source highlights | dev.to | 2024-11-07

    GitHub Repo: https://github.com/denser-org/denser-chat

  • ragswift

    🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform

    Project mention: Show HN: Ragswift – Scalable embeddings platform powered by distributed compute | news.ycombinator.com | 2024-01-22
  • SHREC2023-ANIMAR

    Source codes of team TikTorch (1st place solution) for track 2 and 3 of the SHREC2023 Challenge

  • FloridaPropertyData

    A Python-based tool for retrieving and processing property data for specific counties in Florida using Parcel ID numbers. Simplifies data retrieval and offers customization options for real estate agents, investors, and government officials.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Retrieval related posts

  • PDF chat with source highlights

    2 projects | dev.to | 7 Nov 2024
  • Show HN: R2R V2 – A open source RAG engine with prod features

    2 projects | news.ycombinator.com | 26 Jun 2024
  • Show HN: A phone number to text with questions about current events

    2 projects | news.ycombinator.com | 10 May 2024
  • RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation

    1 project | news.ycombinator.com | 30 Apr 2024
  • [D] Any pre trained retrieval based language models available?

    3 projects | /r/MachineLearning | 22 Oct 2022

Index

What are some of the best open-source Retrieval projects in Python? This list will help you:

Rank Project Stars
1 R2R 3,794
2 beir 1,643
3 fastembed 1,570
4 raptor 991
5 RETRO-pytorch 849
6 NeumAI 841
7 searchGPT 661
8 memorizing-transformers-pytorch 623
9 xmc.dspy 387
10 cherche 325
11 DALM 312
12 ACT 151
13 icl-ceil 90
14 retomaton 71
15 denser-chat 69
16 ragswift 36
17 SHREC2023-ANIMAR 7
18 FloridaPropertyData 3

Did you know that Python is the 2nd most popular programming language based on number of mentions?