Metric | similarity | ColBERT |
---|---|---|
Mentions | 7 | 5 |
Stars | 1,010 | 2,954 |
Growth | 0.3% | 4.5% |
Activity | 5.9 | 7.9 |
Last commit | 5 months ago | about 1 month ago |
Language | Python | Python |
License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
similarity
-
New free tool that uses fine-tuned BERT model to surface answers from research papers
TensorFlow Ranking and TensorFlow Similarity (maybe relevant/irrelevant contrastive learning?) look like they could be useful.
-
Non-Machine Learning Image Matching with a Vector DB
There is the metric learning approach of learning a hash for similarity: https://github.com/tensorflow/similarity
That said, I don't see many good models optimized for it available for download on TF Hub or Hugging Face. As a possible alternative, you can always modify your images programmatically (if you truly mean identical to humans), for example by changing the white balance, cropping, rotating, or selecting adjacent frames from videos, then train a network that is small enough for you to be satisfied and see if that works.
-
Face Detection for 520 People
Metric learning has great implementations in the TensorFlow Similarity library (https://github.com/tensorflow/similarity). The documentation is quite bad, but the Jupyter notebooks are great.
-
[P] TensorFlow Similarity 0.16 is out
Just a quick note that TensorFlow Similarity 0.16 is out. Besides adding the XMB loss, this release mostly focuses on refactoring and optimizing the core components to ensure everything works smoothly and accurately. Details are in the changelog as usual, and a simple pip install -U tensorflow_similarity should just work.
- Self-supervised learning added to TensorFlow Similarity
-
[P] TensorFlow Similarity now supports self-supervised training
Very happy to announce that as part of the 0.15 release, TensorFlow Similarity now supports self-supervised learning using SOTA algorithms. To help you get started, we included in the release a detailed getting-started notebook that you can run in Colab. This notebook shows you how to use SimSiam self-supervised pre-training to almost double the accuracy compared to a model trained from scratch on CIFAR-10.
-
TensorFlow Introduces ‘TensorFlow Similarity’, An Easy And Fast Python Package To Train Similarity Models Using TensorFlow
Github: https://github.com/tensorflow/similarity
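Several of the posts above refer to metric learning, where a model is trained so that similar items end up close together in embedding space. As a rough illustration only, here is a minimal pure-Python sketch of a triplet margin loss on cosine distance. It does not use TensorFlow Similarity's API; the function names, the margin value, and the toy embeddings are all assumptions made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss on cosine distance.

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (a hypothetical default chosen here).
    """
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D "embeddings" (made-up values, not model output).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]   # nearly the same direction as the anchor
negative = [0.0, 1.0]   # orthogonal to the anchor

loss_good = triplet_loss(anchor, positive, negative)  # well-separated triplet
loss_bad = triplet_loss(anchor, negative, positive)   # roles swapped on purpose
```

With these toy values the well-separated triplet incurs no loss, while the swapped triplet is penalized; a training loop would adjust the embedding model to push the second case toward the first.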
ColBERT
-
ColBERT Live! Makes Your Vector Database Smarter
But for production usage, the only option until now has been the Stanford ColBERT library and the RAGatouille wrapper. These are high-performance libraries, but they only support use cases that can fit in a two-stage pipeline of (1) ingest all your data and then (2) search it. Updating indexed data is not supported, and integrating with other data your application cares about (such as ACLs) or even other parts of the indexed data (creation date, author, etc.) is firmly in roll-your-own territory.
-
Why Vector Compression Matters
I’ll conclude by explaining how vector compression relates to ColBERT, a higher-level technique that Astra DB customers are starting to use successfully.
-
How ColBERT Helps Developers Overcome the Limits of Retrieval-Augmented Generation
ColBERT is a new way of scoring passage relevance using a BERT language model that substantially solves the problems with DPR. This diagram from the first ColBERT paper shows why it’s so exciting:
- FLaNK Stack 05 Feb 2024
-
New free tool that uses fine-tuned BERT model to surface answers from research papers
ColBERT and successors for retrieval.
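The posts above describe ColBERT as a way of scoring passage relevance with a BERT model. As commonly described, ColBERT keeps one embedding per token and scores a passage by "late interaction": for each query token, take the maximum similarity against any passage token, then sum those maxima. The sketch below illustrates that scoring rule in pure Python under stated assumptions: the function names are hypothetical, the 2-D vectors are made up rather than produced by BERT, and none of this is the Stanford ColBERT library's API.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: sum over query tokens of the best-matching
    document token similarity."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    total = 0.0
    for qv in q:
        # Best match for this query token among all document tokens.
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


# Toy per-token embeddings (made-up values for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.1]]              # covers only the first one

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
ranked = sorted([("doc_a", score_a), ("doc_b", score_b)],
                key=lambda pair: pair[1], reverse=True)
```

Because every query token looks for its own best match, a passage that covers all query terms (doc_a here) outranks one that matches only some of them, which is the behavior a single pooled vector per passage cannot express directly.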
What are some alternatives?
pytorch-metric-learning - The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
pgANN - Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.
elasticsearch-learning-to-rank - Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch
ContraD - Code for the paper "Training GANs with Stronger Augmentations via Contrastive Discriminator" (ICLR 2021)
history_rag
quaterion - Blazing fast framework for fine-tuning similarity learning models
Milvus - A cloud-native vector database, storage for next generation AI applications
finetuner - Task-oriented embedding tuning for BERT, CLIP, etc.
awesome-semantic-search - A curated list of awesome resources related to Semantic Search🔎 and Semantic Similarity tasks.
Real-Time-Voice-Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time
MoE-LLaVA - Mixture-of-Experts for Large Vision-Language Models