ColBERT
setfit
| | ColBERT | setfit |
|---|---|---|
| Mentions | 4 | 13 |
| Stars | 2,524 | 2,014 |
| Growth | 7.0% | 5.5% |
| Activity | 8.4 | 9.2 |
| Last commit | about 1 month ago | 20 days ago |
| Language | Python | Jupyter Notebook |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ColBERT
- Why Vector Compression Matters
I’ll conclude by explaining how vector compression relates to ColBERT, a higher-level technique that Astra DB customers are starting to use successfully.
- How ColBERT Helps Developers Overcome the Limits of Retrieval-Augmented Generation
ColBERT is a new way of scoring passage relevance using a BERT language model that substantially solves the problems with DPR. This diagram from the first ColBERT paper shows why it’s so exciting:
- FLaNK Stack 05 Feb 2024
- New free tool that uses fine-tuned BERT model to surface answers from research papers
ColBERT and successors for retrieval.
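
The passage-relevance scoring mentioned in the second item above can be illustrated with a minimal sketch of the late-interaction ("MaxSim") score described in the ColBERT paper: each query token embedding is matched to its most similar passage token embedding, and those maxima are summed. The vectors below are hand-written toy values standing in for BERT token embeddings, and the function names are hypothetical; this is not the ColBERT library's API.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products act as cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best cosine match among passage tokens."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


# Toy 2-dimensional "token embeddings" (assumed values, not real BERT output).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # contains a match for each query token
doc_b = [[1.0, 1.0]]                          # only a partial match for each

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Because every query token keeps its own best match, a passage that covers each query term closely (doc_a) outranks one that is only loosely related to all of them (doc_b).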
setfit
- FLaNK Stack 05 Feb 2024
- Smarter Summaries with Finetuning GPT-3.5 and Chain of Density
- [Discussion] Convince me that this training set contamination is fine (or not)
It did, sorry for the hasty edits! I removed that part b/c I realized that there isn't a compelling-enough reason for me to believe that text similarity is clearly inappropriate. In fact, you can train the Pr(condition | chat) classifier I suggested above using similarity training! Use SetFit for that. In the end you'll get a classifier and a similarity model.
- Ask HN: What's the best framework for text classification (few-shot learning)?
[3] https://github.com/huggingface/setfit
- Is it worth using LLMs like GPT-3 for text classification?
There's also kinda related approaches like SetFit which calculate embeddings from pretrained transformer models then fit a classifier on top of the embeddings. I've yet to try it but it supposedly works well with very few labelled examples.
- LLMs for Text Classification (7B parameters)
- GPT-3 vs GPT-Neo / GPT-J for startup classification
- Ideas on how to improve classification and scoring using Mean Pooled Sentence Embeddings
You could have a look at setfit.
- SetFit (Sentence Transformer Fine-tuning) - Fewshot Learning without prompts [D]
Found relevant code at https://github.com/huggingface/setfit + all code implementations here
- Most Popular AI Research Sept 2022 - Ranked Based On Total GitHub Stars
Efficient Few-Shot Learning Without Prompts https://github.com/huggingface/setfit https://arxiv.org/abs/2209.11055v1
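
The "embed, then fit a classifier on top" idea from the GPT-3 text-classification comment above can be sketched roughly as follows. This is a toy illustration under loud assumptions: `embed` is a hypothetical keyword-count stand-in for a pretrained sentence encoder, and a nearest-centroid rule stands in for the classification head. SetFit itself also fine-tunes the sentence transformer contrastively before fitting the head, which this sketch omits, and none of the names below come from the setfit library.

```python
from collections import defaultdict

VOCAB = ["refund", "broken", "love", "great"]  # hypothetical toy vocabulary


def embed(text):
    """Stand-in for a pretrained encoder: keyword counts over VOCAB."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def fit_centroids(texts, labels):
    """Average the embeddings of each class to get one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for text, label in zip(texts, labels):
        vec = embed(text)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, text):
    """Return the label whose centroid is closest to the text's embedding."""
    vec = embed(text)

    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Four labelled examples, in the spirit of few-shot classification.
train_texts = ["i want a refund", "it arrived broken", "i love it", "great product"]
train_labels = ["negative", "negative", "positive", "positive"]
centroids = fit_centroids(train_texts, train_labels)

print(predict(centroids, "great service and i love it"))
print(predict(centroids, "broken on arrival refund please"))
```

With a real sentence encoder in place of `embed`, the same two-stage shape applies: the encoder supplies the features, and only a small classifier has to be learned from the handful of labelled examples.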
What are some alternatives?
qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
iris - Transformers are Sample-Efficient World Models. ICLR 2023, notable top 5%.
similarity - TensorFlow Similarity is a python package focused on making similarity learning quick and easy.
whisper - Robust Speech Recognition via Large-Scale Weak Supervision
elasticsearch-learning-to-rank - Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch
VToonify - [SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer
Milvus - A cloud-native vector database, storage for next generation AI applications
motion-diffusion-model - The official PyTorch implementation of the paper "Human Motion Diffusion Model"
haystack - LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
git-re-basin - Code release for "Git Re-Basin: Merging Models modulo Permutation Symmetries"
awesome-semantic-search - A curated list of awesome resources related to Semantic Search🔎 and Semantic Similarity tasks.
storydalle