| | mteb | cherche |
|---|---|---|
| Mentions | 2 | 12 |
| Stars | 1,473 | 313 |
| Growth | 19.3% | - |
| Activity | 9.8 | 4.4 |
| Latest commit | 3 days ago | about 1 month ago |
| Language | Python | Python |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mteb
-
AI for AWS Documentation
RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:
- Chunking can interfere with context boundaries
- Content vectors can differ vastly from question vectors; to bridge the gap you can use hypothetical embeddings (generate artificial questions for each chunk and store their embeddings)
- Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical question embeddings, metadata)
- RAG will fail miserably on requests like "summarize the whole document"
- To my knowledge, OpenAI embeddings aren't performing well; use an embedding model that is optimized for question answering or information retrieval and supports multiple languages. Also look into Instructor embeddings: https://github.com/embeddings-benchmark/mteb
1 https://github.com/underlines/awesome-marketing-datascience/...
- Text Embedding Benchmark (MTEB) Leaderboard
cherche
-
[P] Semantic search
If you are interested, you can check out the documentation here: https://github.com/raphaelsty/cherche
- Minimalist semantic search with Cherche 2.0
-
[D] is it time to investigate retrieval language models?
Here is a tool I made to create a retriever-reader pipeline in a minute: Cherche. I would also recommend Haystack on GitHub!
- [P] Cherche - allows you to create a neural search pipeline using retrievers and pre-trained language models as rankers.
- GitHub - raphaelsty/cherche: Neural search
-
[P] Library for end-to-end neural search pipelines
GitHub link · Documentation · Hacker News link
-
Hacker News top posts: Jan 10, 2022
Neural Search for medium sized corpora (3 comments)
-
Neural search library in Python for medium-sized corpora
https://github.com/raphaelsty/cherche
Cherche (search in French) allows you to create a neural search pipeline using retrievers and pre-trained language models as rankers. Cherche is meant to be used with small to medium sized corpora. Cherche's main strength is its ability to build diverse and end-to-end pipelines.
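The retriever-then-ranker pipeline described above can be sketched from scratch. The class names (`KeywordRetriever`, `OverlapRanker`) and the word-overlap scoring are illustrative assumptions for this sketch, not Cherche's actual API; in Cherche the second stage would be a pre-trained language model ranker.

```python
# A from-scratch sketch of the two-stage idea: a cheap retriever narrows
# the corpus, then a ranker reorders the candidates.

documents = [
    {"id": 0, "text": "Cherche is a neural search library for Python."},
    {"id": 1, "text": "Bordeaux is a city in the southwest of France."},
    {"id": 2, "text": "Neural rankers reorder retrieved documents."},
]

class KeywordRetriever:
    """First stage: return every document sharing at least one query word."""
    def __init__(self, documents, k=10):
        self.documents, self.k = documents, k
    def __call__(self, query):
        words = set(query.lower().split())
        hits = [d for d in self.documents
                if words & set(d["text"].lower().split())]
        return hits[: self.k]

class OverlapRanker:
    """Second stage: reorder candidates by a relevance score.  A real
    ranker would score with a pre-trained language model instead of
    word overlap."""
    def __init__(self, k=3):
        self.k = k
    def __call__(self, query, candidates):
        words = set(query.lower().split())
        scored = sorted(
            candidates,
            key=lambda d: len(words & set(d["text"].lower().split())),
            reverse=True,
        )
        return scored[: self.k]

retriever = KeywordRetriever(documents)
ranker = OverlapRanker(k=2)

def search(query):
    return ranker(query, retriever(query))

print([d["id"] for d in search("neural search library")])
```

Keeping the stages as separate callables is what makes such pipelines "end-to-end and diverse": either stage can be swapped (TF-IDF, BM25, or dense retrieval; cross-encoder or bi-encoder ranking) without touching the other.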
- Neural Search for medium sized corpora
What are some alternatives?
BriefGPT - Locally hosted tool that connects documents to LLMs for summarization and querying, with a simple GUI.
NetShears - iOS Network monitor/interceptor framework
awesome-ml - Curated list of useful LLM / Analytics / Datascience resources
primeqa - The prime repository for state-of-the-art Multilingual Question Answering research and development.
anything-llm - The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.
flashtext - Extract Keywords from sentence or Replace keywords in sentences.
SHREC2023-ANIMAR - Source codes of team TikTorch (1st place solution) for track 2 and 3 of the SHREC2023 Challenge
gpl - Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
beir - A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
mindflow - 🧠AI-powered CLI git wrapper, boilerplate code generator, chat history manager, and code search engine to streamline your dev workflow 🌊
jovian-genai-hackathon
oneline - Read a text file, one line at a time