Python Retrieval

Open-source Python projects categorized as Retrieval

Top 16 Python Retrieval Projects

  • mteb

    MTEB: Massive Text Embedding Benchmark

  • Project mention: AI for AWS Documentation | news.ycombinator.com | 2023-07-06

    RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:

    - Chunking can interfere with context boundaries

    - Content vectors can differ vastly from question vectors. To address this, use hypothetical embeddings (generate artificial questions for each chunk and store their embeddings)

    - Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical question embeddings, metadata)

    - RAG will fail miserably with requests like "summarize the whole document"

    - To my knowledge, OpenAI embeddings don't perform well. Use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb

    [1] https://github.com/underlines/awesome-marketing-datascience/...
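
The comment's suggestion to store several embeddings per chunk (the chunk text plus hypothetical questions, with metadata alongside) can be sketched as follows. This is a minimal illustration under stated assumptions, not code from mteb or from the commenter: `embed` is a toy bag-of-words stand-in for a real embedding model, and `ChunkRecord`, `cosine`, `search`, and the sample data are hypothetical names and values invented for the example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class ChunkRecord:
    """One chunk stored with hypothetical questions and metadata (hypothetical type)."""
    text: str
    questions: list[str]
    metadata: dict = field(default_factory=dict)

    def vectors(self) -> list[Counter]:
        # One vector for the chunk itself, plus one per hypothetical question.
        return [embed(self.text)] + [embed(q) for q in self.questions]


def search(records: list[ChunkRecord], query: str, top_k: int = 1):
    """Score each chunk by its best-matching vector and return the top_k pairs."""
    q = embed(query)
    scored = [(max(cosine(q, v) for v in r.vectors()), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Invented sample data for illustration only.
records = [
    ChunkRecord(
        text="The cache is flushed every night at 02:00.",
        questions=["When is the cache cleared?"],
        metadata={"source": "ops-notes"},
    ),
    ChunkRecord(
        text="Backups are written to the archive volume on Sundays.",
        questions=["Where do backups go?"],
        metadata={"source": "backup-notes"},
    ),
]

query = "When does the cache get cleared?"
score, best = search(records, query)[0]
print(best.metadata["source"], round(score, 2))
```

In this sketch the query matches the stored hypothetical question more closely than the chunk text itself, which is the effect the comment describes. A real system would replace `embed` with an actual embedding model and keep the vectors in a vector store.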

  • beir

    A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

  • Project mention: On building a semantic search engine | news.ycombinator.com | 2024-01-06

    The BEIR project might be what you're looking for: https://github.com/beir-cellar/beir/wiki/Leaderboard

  • R2R

    The framework for fast development and deployment of RAG systems. (by SciPhi-AI)

  • Project mention: Show HN: R2R – Open-source framework for production-grade RAG | news.ycombinator.com | 2024-02-26

  • RETRO-pytorch

    Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

  • fastembed

    Fast, Accurate, Lightweight Python library to make State of the Art Embedding

  • Project mention: FastLLM by Qdrant – lightweight LLM tailored For RAG | news.ycombinator.com | 2024-04-01

  • NeumAI

    Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

  • Project mention: Show HN: Neum AI – Open-source large-scale RAG framework | news.ycombinator.com | 2023-11-21

    Interesting to see that the semantic chunking in the tools library is a wrapper around GPT-4. It asks GPT for the Python code and executes it: https://github.com/NeumTry/NeumAI/blob/main/neumai-tools/neu...

  • memorizing-transformers-pytorch

    Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

  • Project mention: What can LLMs never do? | news.ycombinator.com | 2024-04-27

    At one point I experimented a little with transformers that had access to external memory searchable via KNN lookups https://github.com/lucidrains/memorizing-transformers-pytorc... or via routed queries with https://github.com/glassroom/heinsen_routing . Both approaches seemed to work for me, but I had to put that work on hold for reasons outside my control.

  • searchGPT

    Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.

  • raptor

    The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

  • Project mention: RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation | news.ycombinator.com | 2024-04-30

    Worth a comparison with RAPTOR, another tiered RAG system.

    https://arxiv.org/abs/2401.18059

  • cherche

    Neural Search

  • Project mention: [P] Semantic search | /r/MachineLearning | 2023-05-08

    If you are interested, you can check out the documentation here: https://github.com/raphaelsty/cherche

  • ACT

    Atmospheric data Community Toolkit - A python based toolkit for exploring and analyzing time series atmospheric datasets (by ARM-DOE)

  • icl-ceil

    [ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”.

  • retomaton

    PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022)

  • ragswift

    🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform

  • Project mention: Show HN: Ragswift – Scalable embeddings platform powered by distributed compute | news.ycombinator.com | 2024-01-22

  • SHREC2023-ANIMAR

    Source codes of team TikTorch (1st place solution) for track 2 and 3 of the SHREC2023 Challenge

  • FloridaPropertyData

    A Python-based tool for retrieving and processing property data for specific counties in Florida using Parcel ID numbers. Simplifies data retrieval and offers customization options for real estate agents, investors, and government officials.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Retrieval related posts

  • RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation

    1 project | news.ycombinator.com | 30 Apr 2024
  • [D] Any pre trained retrieval based language models available?

    3 projects | /r/MachineLearning | 22 Oct 2022

Index

What are some of the best open-source Retrieval projects in Python? This list will help you:

 #  Project                          Stars
 1  mteb                             1,395
 2  beir                             1,388
 3  R2R                              1,202
 4  RETRO-pytorch                      827
 5  fastembed                          796
 6  NeumAI                             779
 7  memorizing-transformers-pytorch    611
 8  searchGPT                          570
 9  raptor                             450
10  cherche                            313
11  ACT                                126
12  icl-ceil                            81
13  retomaton                           64
14  ragswift                            33
15  SHREC2023-ANIMAR                     6
16  FloridaPropertyData                  1
