Top 6 sbert Open-Source Projects
-
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
-
AnnA_Anki_neuronal_Appendix
Using machine learning on your anki collection to enhance the scheduling via semantic clustering and semantic similarity
-
SBERT-for-Question-Answering-on-COVID-19-Dataset
Sentence Bert for Question-Answering on COVID-19 Open Research Dataset (CORD-19)
RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:
- Chunking can interfere with context boundaries
- Content vectors can differ vastly from question vectors; to address this, use hypothetical embeddings (artificial questions are generated for each chunk and stored)
- Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical question embeddings, metadata)
- RAG will fail miserably with requests like "summarize the whole document"
- To my knowledge, OpenAI embeddings don't perform well; use an embedding model that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb
[1] https://github.com/underlines/awesome-marketing-datascience/...
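The point above about storing several embeddings per chunk can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_embed` is a bag-of-words stand-in for a real sentence-embedding model, and `ChunkRecord` and `search` are hypothetical names, not part of any library mentioned on this page. Each chunk is scored by its best-matching vector, so a query can hit either the chunk text or one of its stored hypothetical questions.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ChunkRecord:
    """One text chunk stored with hypothetical questions and metadata."""
    text: str
    questions: list
    metadata: dict = field(default_factory=dict)
    vectors: list = field(default_factory=list)

    def __post_init__(self):
        # One vector for the chunk itself, plus one per hypothetical question.
        self.vectors = [toy_embed(self.text)] + [toy_embed(q) for q in self.questions]


def search(query: str, records: list, top_k: int = 1) -> list:
    """Rank chunks by the best similarity over all of their stored vectors."""
    q = toy_embed(query)
    scored = [(max(cosine(q, v) for v in r.vectors), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


records = [
    ChunkRecord(
        text="Sentence-BERT produces fixed-size sentence vectors that can be compared with cosine similarity.",
        questions=["How do I compare two sentences?"],
        metadata={"source": "doc-a"},
    ),
    ChunkRecord(
        text="Spaced repetition schedules flashcard reviews at increasing intervals.",
        questions=["When should I review a flashcard again?"],
        metadata={"source": "doc-b"},
    ),
]

query = "When do I review my flashcard?"
best_score, best = search(query, records)[0]
print(best.metadata["source"], round(best_score, 3))
```

In this toy setup the query overlaps far more with the stored question than with the chunk text itself, which is the gap between content vectors and question vectors that the comment describes. With a real embedding model the same structure would apply, but the scores would differ.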
The BEIR project might be what you're looking for: https://github.com/beir-cellar/beir/wiki/Leaderboard
Index
What are some of the best open-source sbert projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | mteb | 1,448 |
2 | beir | 1,407 |
3 | bert-solr-search | 161 |
4 | targetedSummarization | 87 |
5 | AnnA_Anki_neuronal_Appendix | 57 |
6 | SBERT-for-Question-Answering-on-COVID-19-Dataset | 3 |