ColBERT vs similarity
 | ColBERT | similarity |
---|---|---|
Mentions | 4 | 7 |
Stars | 2,524 | 998 |
Growth | 7.0% | 0.4% |
Activity | 8.4 | 5.9 |
Last commit | about 1 month ago | 15 days ago |
Language | Python | Python |
License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ColBERT
- Why Vector Compression Matters
  I’ll conclude by explaining how vector compression relates to ColBERT, a higher-level technique that Astra DB customers are starting to use successfully.
- How ColBERT Helps Developers Overcome the Limits of Retrieval-Augmented Generation
  ColBERT is a new way of scoring passage relevance using a BERT language model that substantially solves the problems with DPR. This diagram from the first ColBERT paper shows why it’s so exciting:
- FLaNK Stack 05 Feb 2024
- New free tool that uses fine-tuned BERT model to surface answers from research papers
  ColBERT and successors for retrieval.
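
For readers unfamiliar with how ColBERT scores passage relevance, here is a minimal sketch of the late-interaction ("MaxSim") scoring described in the ColBERT paper: each query token embedding is matched with its most similar document token embedding, and those maxima are summed. It is not the ColBERT library's API. The function names and the toy 2-dimensional embeddings are assumptions for illustration only; in practice the token embeddings come from a BERT encoder.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for every query token, take the best-matching
    document token (max cosine similarity), then sum over query tokens."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Hypothetical toy token embeddings (made up for this sketch).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [-1.0, 0.0]]             # covers only the first

docs = {"doc_a": doc_a, "doc_b": doc_b}
ranked = sorted(docs.items(),
                key=lambda item: maxsim_score(query, item[1]),
                reverse=True)
print([name for name, _ in ranked])
```

Because every document token keeps its own vector, a passage is rewarded for covering each query token separately, which is the main contrast with single-vector approaches such as DPR.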
similarity
- New free tool that uses fine-tuned BERT model to surface answers from research papers
  TensorFlow Ranking and TensorFlow Similarity (maybe relevant/irrelevant contrastive learning?) look like they could be useful.
- Non-Machine Learning Image Matching with a Vector DB
  There is the metric learning problem of learning a hash for similarity: https://github.com/tensorflow/similarity
  That said, I don't see many good models optimized for it available for download on TF Hub or Hugging Face. As a possible alternative, you can always programmatically modify your images (if you truly mean identical to humans): change white balance, crop, rotate, select adjacent frames from videos, etc. Then optimize a network that is small enough for you to be satisfied, and see if that works.
- Face Detection for 520 People
  Metric learning has great implementations in the TensorFlow Similarity library: https://github.com/tensorflow/similarity. The documentation is quite bad, but the Jupyter notebooks are great.
- [P] TensorFlow Similarity 0.16 is out
  Just a quick note that TensorFlow Similarity 0.16 is out. Besides adding the XBM loss, this release mostly focuses on refactoring and optimizing the core components to ensure everything works smoothly and accurately. Details are in the changelog as usual, and a simple pip install -U tensorflow_similarity should just work.
- Self-supervised learning added to TensorFlow Similarity
- [P] TensorFlow Similarity now self-supervised training
  Very happy to announce that, as part of the 0.15 release, TensorFlow Similarity now supports self-supervised learning using SOTA algorithms. To help you get started, we included in the release a detailed getting-started notebook that you can run in Colab. This notebook shows you how to use SimSiam self-supervised pre-training to almost double the accuracy compared to a model trained from scratch on CIFAR-10.
- TensorFlow Introduces ‘TensorFlow Similarity’, An Easy And Fast Python Package To Train Similarity Models Using TensorFlow
  GitHub: https://github.com/tensorflow/similarity
What are some alternatives?
qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
pytorch-metric-learning - The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
elasticsearch-learning-to-rank - Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch
pgANN - Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.
Milvus - A cloud-native vector database, storage for next generation AI applications
quaterion - Blazing fast framework for fine-tuning similarity learning models
haystack - LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
ContraD - Code for the paper "Training GANs with Stronger Augmentations via Contrastive Discriminator" (ICLR 2021)
awesome-semantic-search - A curated list of awesome resources related to Semantic Search🔎 and Semantic Similarity tasks.
Real-Time-Voice-Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time
history_rag
sparse_dot_topn - Python package to accelerate the sparse matrix multiplication and top-n similarity selection