AnglE
mteb
| | AnglE | mteb |
|---|---|---|
| Mentions | 12 | 2 |
| Stars | 355 | 1,421 |
| Growth | - | 16.4% |
| Activity | 9.2 | 9.8 |
| Last commit | about 1 month ago | 1 day ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
AnglE
- FLaNK Stack Weekly 22 January 2024
- Show HN: Sentence Embedding for Vector Search
- UAE: New Sentence Embeddings for RAG | SOTA on MTEB Leaderboard
- [P]UAE: New Sentence Embeddings for RAG | SOTA on MTEB Leaderboard
- [P] UAE: New Sentence Embeddings for RAG | SOTA on MTEB Leaderboard
- Show HN: SOTA Sentence Embeddings on MTEB Leaderboard
mteb
- AI for AWS Documentation
RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:
- Chunking can interfere with context boundaries
- Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (generate artificial questions for each chunk and store their embeddings)
- Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical question embeddings, metadata)
- RAG will fail miserably with requests like "summarize the whole document"
- To my knowledge, OpenAI embeddings don't perform well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb
[1] https://github.com/underlines/awesome-marketing-datascience/...
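
The "several embeddings per chunk" idea from the comment above can be sketched roughly as follows. This is a minimal illustration, not the commenter's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, `ChunkRecord` and `search` are hypothetical names, and the hypothetical questions are written by hand here, whereas in practice they would presumably be generated by an LLM.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ChunkRecord:
    """One text chunk stored with hypothetical questions and metadata."""
    text: str
    questions: list[str]
    metadata: dict = field(default_factory=dict)

    def vectors(self) -> list[Counter]:
        # One vector for the chunk itself plus one per hypothetical question.
        return [embed(self.text)] + [embed(q) for q in self.questions]


def search(query: str, records: list[ChunkRecord]) -> ChunkRecord:
    """Return the record whose best-matching vector is closest to the query."""
    q = embed(query)
    return max(records, key=lambda r: max(cosine(q, v) for v in r.vectors()))


# Hypothetical example data, invented for illustration only.
records = [
    ChunkRecord(
        text="Buckets are created with the CreateBucket operation and a unique name.",
        questions=["How do I make a new S3 bucket?"],
        metadata={"source": "s3-guide"},
    ),
    ChunkRecord(
        text="Lambda functions time out after the configured limit is reached.",
        questions=["Why did my Lambda function stop running?"],
        metadata={"source": "lambda-guide"},
    ),
]

query = "How can I make a new bucket in S3?"
best = search(query, records)
print(best.metadata["source"])  # prints "s3-guide"
```

In this toy setup the query shares almost no words with the chunk text itself, but it overlaps heavily with the stored hypothetical question, which is the gap between content vectors and question vectors that the comment describes.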
- Text Embedding Benchmark (MTEB) Leaderboard
What are some alternatives?
Finetune_LLMs - Repo for fine-tuning Causal LLMs
BriefGPT - Locally hosted tool that connects documents to LLMs for summarization and querying, with a simple GUI.
code-llama-for-vscode - Use Code Llama with Visual Studio Code and the Continue extension. A local LLM alternative to GitHub Copilot.
awesome-ml - Curated list of useful LLM / Analytics / Datascience resources
api-for-open-llm - OpenAI-style API for open large language models, so you can use open LLMs just like ChatGPT. Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. A unified backend interface for open-source large models.
anything-llm - The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.
instructor-embedding - [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
SHREC2023-ANIMAR - Source codes of team TikTorch (1st place solution) for track 2 and 3 of the SHREC2023 Challenge
llmware - Providing enterprise-grade LLM-based development framework, tools, and fine-tuned models.
beir - A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
jovian-genai-hackathon
marqo - Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai