SaaSHub helps you find the best software and product alternatives Learn more →
Top 18 Python Retrieval Projects
-
R2R
You might find Rag to Riches' (R2R) built-in use of Unstructured for doc parsing, hybrid search, knowledge graphs, and HyDE queries improves the quality of your retrievals. https://github.com/SciPhi-AI/R2R
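Hybrid search merges a keyword (full-text) ranking with a vector-similarity ranking. The entry does not say how R2R fuses the two. One widely used method is Reciprocal Rank Fusion (RRF), sketched below in plain Python. The document IDs and the constant `k=60` are illustrative assumptions, not R2R's configuration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list is ordered best-first. A document's fused score is the sum of
    1 / (k + rank) over every list it appears in, so items ranked highly by
    several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc3", "doc1", "doc7"]  # hypothetical full-text results
vector_hits = ["doc1", "doc4", "doc3"]   # hypothetical vector-search results
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['doc1', 'doc3', 'doc4', 'doc7']
```

Because RRF uses only ranks, it avoids having to calibrate BM25 scores against cosine similarities, which live on different scales.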
-
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
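BEIR's headline metric is nDCG@10. Below is a minimal single-query sketch. It assumes graded relevance judgments (qrels) keyed by document ID and uses graded relevance as a linear gain with a log2 rank discount (other gain conventions exist). This is not BEIR's own evaluation code, and the IDs are hypothetical.

```python
import math


def ndcg_at_k(ranked_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_ids: doc IDs returned by the retriever, best first.
    qrels: dict mapping doc ID -> graded relevance (missing IDs count as 0).
    """
    def dcg(gains):
        # Gain at rank r is discounted by log2(r + 1).
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True)[:k])  # best possible ordering
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


qrels = {"d1": 2, "d2": 1}  # hypothetical judgments for one query
print(round(ndcg_at_k(["d2", "d9", "d1"], qrels), 3))  # 0.76
```

A benchmark score is the mean of this value over all queries in a dataset.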
Project mention: Any* Embedding Model Can Become a Late Interaction Model - If You Give It a Chance! | dev.to | 2024-08-29
The source code for these experiments is open-source and utilizes beir-qdrant, an integration of Qdrant with the BeIR library. While this package is not officially maintained by the Qdrant team, it may prove useful for those interested in experimenting with various Qdrant configurations to see how they impact retrieval quality. All experiments were conducted using Qdrant in exact search mode, ensuring the results are not influenced by approximate search.
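Late-interaction models keep one embedding per token and score a query-document pair with MaxSim. Each query token is matched to its most similar document token, and those maxima are summed. The NumPy sketch below assumes the per-token embeddings are already computed (the toy arrays are random). `exact_search` scores every document by brute force, in the spirit of the exact search mode mentioned above. This is not the beir-qdrant code.

```python
import numpy as np


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction (MaxSim) score.

    query_vecs: (n_query_tokens, dim); doc_vecs: (n_doc_tokens, dim).
    Rows are L2-normalized so dot products are cosine similarities.
    """
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T                      # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())  # best doc token per query token, summed


def exact_search(query_vecs, corpus):
    """Brute-force ranking of every document (no approximate index)."""
    scores = [maxsim_score(query_vecs, doc) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: -scores[i])


rng = np.random.default_rng(0)
query = rng.normal(size=(3, 8))                  # 3 toy query-token embeddings
corpus = [rng.normal(size=(5, 8)), query.copy()]  # toy documents
print(exact_search(query, corpus))  # [1, 0]: the copy of the query ranks first
```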
-
Project mention: FastLLM by Qdrant – lightweight LLM tailored For RAG | news.ycombinator.com | 2024-04-01
-
raptor
The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Project mention: Show HN: A phone number to text with questions about current events | news.ycombinator.com | 2024-05-10
Hi HN! For my senior thesis in CS, I built an SMS-based application to make journalism more accessible. It works like this:
1) You text the topics you're interested in to my phone number. Every day, you'll receive a text with 5 headlines from The Associated Press (https://apnews.com/) related to those topics.
2) If you have questions about any of the current events the headlines describe, you just text them back. A response is generated from the contents of the articles using the RAPTOR retrieval framework (https://github.com/parthsarthi03/raptor) and texted right back to you.
The repo can be found here: https://github.com/tdh15/pressText
I'd really appreciate any and all feedback. Whatever you got, I'd love to hear it :)
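RAPTOR builds a tree over a document. Leaf chunks are grouped, each group is summarized, and the process repeats on the summaries until one root remains. Retrieval can then search all nodes at once. The sketch below is a deliberately simplified, RAPTOR-like version under stated assumptions. It groups consecutive nodes in fixed-size batches instead of clustering embeddings. It takes the summarizer as a caller-supplied function (an LLM call in practice, plain concatenation here). It ranks nodes by word overlap as a stand-in for embedding similarity.

```python
def build_tree(chunks, summarize, group_size=3):
    """Build a RAPTOR-like summary tree bottom-up; returns layers, leaves first.

    Simplification: consecutive nodes are grouped in fixed-size batches,
    where RAPTOR itself clusters nodes by embedding similarity.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2 for the tree to shrink")
    layers = [list(chunks)]
    while len(layers[-1]) > 1:
        current = layers[-1]
        parents = [summarize(current[i:i + group_size])
                   for i in range(0, len(current), group_size)]
        layers.append(parents)
    return layers


def collapsed_tree_search(layers, query, top_k=2):
    """Rank nodes from every layer together ("collapsed tree" retrieval).

    Toy relevance: number of words shared with the query, standing in for
    embedding similarity. sorted() is stable, so ties keep leaf-first order.
    """
    query_words = set(query.lower().split())
    nodes = [node for layer in layers for node in layer]
    return sorted(nodes, key=lambda n: -len(query_words & set(n.lower().split())))[:top_k]


chunks = [
    "the senate passed the budget bill",
    "storms hit the gulf coast",
    "the budget includes disaster relief",
    "markets rallied on friday",
    "officials assessed storm damage",
]
# Plain concatenation stands in for an LLM summary (hypothetical summarizer).
layers = build_tree(chunks, lambda texts: " ".join(texts))
print([len(layer) for layer in layers])  # [5, 2, 1]
print(collapsed_tree_search(layers, "budget relief"))
```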
-
NeumAI
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
-
searchGPT
Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.
-
memorizing-transformers-pytorch
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in PyTorch
Project mention: HMT: Hierarchical Memory Transformer for Long Context Language Processing | news.ycombinator.com | 2024-05-17
Code: https://github.com/OswaldHe/HMT-pytorch
This looks really interesting. I've added the paper to my reading list and look forward to playing with the code. I'm curious to see what kinds of improvements we can get by augmenting Transformers and other generative language/sequence models with this and other mechanisms implementing hierarchical memory.[a]
We sure live in interesting times!
---
[a] In the past, I experimented a little with transformers that had access to external memory using https://github.com/lucidrains/memorizing-transformers-pytorc... and also using routed queries with https://github.com/glassroom/heinsen_routing . Both approaches seemed to work, but I never attempted to build any kind of hierarchy with those approaches.
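Memorizing Transformers, as described in the entry above, let an attention layer retrieve the top-k most similar (key, value) pairs from an external memory and combine that result with ordinary local attention through a gate. The NumPy sketch below is simplified to a single head and a single query vector. It assumes exact top-k search in place of the approximate nearest-neighbor index and a fixed scalar `gate` in place of a learned parameter. All arrays are toy data.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


def knn_memory_attention(query, mem_keys, mem_values, local_keys, local_values,
                         gate, top_k=4):
    """One query vector attending to local context plus retrieved memories."""
    scale = 1.0 / np.sqrt(query.shape[0])
    # Exact top-k search over the memory (the paper uses approximate kNN).
    mem_scores = mem_keys @ query * scale
    top = np.argsort(-mem_scores)[:top_k]
    mem_out = softmax(mem_scores[top]) @ mem_values[top]
    # Ordinary scaled dot-product attention over the local context.
    local_out = softmax(local_keys @ query * scale) @ local_values
    # Sigmoid gate mixes the two results (learned per head in the paper).
    g = 1.0 / (1.0 + np.exp(-gate))
    return g * mem_out + (1.0 - g) * local_out


rng = np.random.default_rng(0)
dim = 8
out = knn_memory_attention(
    query=rng.normal(size=dim),
    mem_keys=rng.normal(size=(100, dim)),    # toy memory of 100 stored keys
    mem_values=rng.normal(size=(100, dim)),
    local_keys=rng.normal(size=(16, dim)),   # toy local context of 16 tokens
    local_values=rng.normal(size=(16, dim)),
    gate=0.0,                                # sigmoid(0) = 0.5: equal mix
)
print(out.shape)  # (8,)
```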
-
xmc.dspy
In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples.
The abstractions could be cleaner. I think some of the convolutedness is due to the evolution the project has undergone, and core contributors have not yet fully committed to “out with the old”.
I think there might be practical benefits to it. The XMC example illustrates it for me:
https://github.com/KarelDO/xmc.dspy
-
ACT
Atmospheric data Community Toolkit - A python based toolkit for exploring and analyzing time series atmospheric datasets (by ARM-DOE)
-
retomaton
PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022)
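RetoMaton builds on kNN-LM. kNN-LM mixes a language model's next-token distribution with a distribution formed from the nearest neighbors in a datastore of (context embedding, next token) pairs. The sketch below shows only that kNN-LM interpolation step. It does not show RetoMaton's automaton, which is what lets it skip some of these searches. `lam`, `k`, and `temperature` are illustrative values, and the datastore is random toy data.

```python
import numpy as np


def knn_lm_probs(lm_probs, query, keys, next_tokens, vocab_size,
                 k=4, temperature=1.0, lam=0.25):
    """Interpolate LM probabilities with a kNN distribution (kNN-LM style)."""
    dists = ((keys - query) ** 2).sum(axis=1)  # squared L2 distance to each key
    nn = np.argsort(dists)[:k]
    # Softmax over negative distances; shifting by the minimum avoids underflow.
    weights = np.exp(-(dists[nn] - dists[nn].min()) / temperature)
    weights /= weights.sum()
    knn = np.zeros(vocab_size)
    np.add.at(knn, next_tokens[nn], weights)  # neighbors sharing a token add up
    return lam * knn + (1.0 - lam) * lm_probs


rng = np.random.default_rng(0)
vocab_size, dim = 5, 4
keys = rng.normal(size=(50, dim))                    # toy datastore context embeddings
next_tokens = rng.integers(0, vocab_size, size=50)   # token that followed each context
lm_probs = np.full(vocab_size, 1 / vocab_size)       # stand-in for the base LM output
p = knn_lm_probs(lm_probs, rng.normal(size=dim), keys, next_tokens, vocab_size)
print(p.round(3), p.sum())
```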
-
denser-chat
GitHub Repo: https://github.com/denser-org/denser-chat
-
ragswift
🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform
Project mention: Show HN: Ragswift – Scalable embeddings platform powered by distributed compute | news.ycombinator.com | 2024-01-22
-
SHREC2023-ANIMAR
Source code of team TikTorch (1st-place solution) for tracks 2 and 3 of the SHREC2023 Challenge
-
FloridaPropertyData
A Python-based tool for retrieving and processing property data for specific counties in Florida using Parcel ID numbers. Simplifies data retrieval and offers customization options for real estate agents, investors, and government officials.
Python Retrieval discussion
Python Retrieval related posts
-
PDF chat with source highlights
-
Show HN: R2R V2 – A open source RAG engine with prod features
-
Show HN: A phone number to text with questions about current events
-
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
-
[D] Any pre trained retrieval based language models available?
-
A note from our sponsor - SaaSHub
www.saashub.com | 7 Dec 2024
Index
What are some of the best open-source Retrieval projects in Python? This list will help you:
Rank | Project | Stars
---|---|---
1 | R2R | 3,794
2 | beir | 1,643
3 | fastembed | 1,570
4 | raptor | 991
5 | RETRO-pytorch | 849
6 | NeumAI | 841
7 | searchGPT | 661
8 | memorizing-transformers-pytorch | 623
9 | xmc.dspy | 387
10 | cherche | 325
11 | DALM | 312
12 | ACT | 151
13 | icl-ceil | 90
14 | retomaton | 71
15 | denser-chat | 69
16 | ragswift | 36
17 | SHREC2023-ANIMAR | 7
18 | FloridaPropertyData | 3