vectordb
DBoW2
| | vectordb | DBoW2 |
|---|---|---|
| Mentions | 6 | 2 |
| Stars | 552 | 824 |
| Growth | 5.1% | - |
| Activity | 7.6 | 0.0 |
| Last commit | 1 day ago | over 2 years ago |
| Language | Python | C++ |
| License | MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
vectordb
- VectorDB: Vector Database Built by Kagi Search
We needed a low-latency, on-premise solution that we can run on edge nodes (so it has to be lightweight), with sane defaults that anyone on the team can spin up in a second.
This is the result, and we constantly benchmark the performance of different embeddings to ensure the best defaults [1].
[1] https://github.com/kagisearch/vectordb#embeddings-performanc...
- Embeddings: What they are and why they matter
If you are looking for a lightweight, low-latency, fully local, end-to-end solution (chunking, embedding, storage and vector search), try vectordb [1].
I just spent a day updating it with the latest benchmarks for text embedding models.
[1] https://github.com/kagisearch/vectordb
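To make the four stages named in the comment above (chunking, embedding, storage and vector search) concrete, here is a minimal, dependency-free sketch. It is not the vectordb API: `ToyStore`, `chunk`, `embed` and the hashed word-count "embedding" are hypothetical stand-ins chosen for illustration, and a real setup would use a proper text embedding model.

```python
import math
import zlib
from collections import Counter

DIM = 64  # size of the toy embedding; an arbitrary choice for this sketch


def chunk(text, size=8):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: hashed word counts, L2-normalised.

    A stand-in for a real text embedding model, used only to keep the
    sketch free of third-party dependencies.
    """
    vec = [0.0] * DIM
    for word, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class ToyStore:
    """Hypothetical in-memory store; not the vectordb API."""

    def __init__(self):
        self.items = []  # list of (chunk_text, vector) pairs

    def save(self, text):
        """Chunk the text, embed each chunk, and keep both in memory."""
        for piece in chunk(text):
            self.items.append((piece, embed(piece)))

    def search(self, query, top_n=1):
        """Return the top_n stored chunks most similar to the query."""
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), piece)
                  for piece, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [piece for _, piece in scored[:top_n]]
```

Calling `save` on a few documents and then `search` with a query string returns the stored chunks ranked by similarity. The brute-force scan is fine for a sketch but would be replaced by an index for larger collections.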
DBoW2
- Embeddings: What they are and why they matter
Not quite the same application, but in computer vision and visual SLAM algorithms (which construct a map of your surroundings using a camera), embeddings have become the de facto approach to place recognition, and it is very similar to this article. It is called "bag-of-words place recognition", and it has become the standard, used by virtually every open-source library nowadays.
The core idea is that each image is passed through a feature extractor and descriptor pipeline and is "embedded" in a vector containing the top N features. As the camera moves, a database of images (called keyframes) is built, with images stored as much lower-dimensional vectors. Meanwhile, every incoming image is used to query the database, and something like cosine similarity is used to retrieve the best match from the vector database. If a match is found, a stereo constraint can be computed between the query image and the match, and the software can update the map.
[1] is the original paper, and here is the most famous implementation: https://github.com/dorian3d/DBoW2
[1]: https://www.google.com/search?client=firefox-b-d&q=Bags+of+B...
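The keyframe lookup described in the comment above can be sketched roughly as follows. This is an illustration only, not how DBoW2 is implemented: it assumes each image's features have already been quantised into integer "visual word" IDs, it uses plain cosine similarity as the comment suggests, and `KeyframeDatabase`, `bow_vector` and the `min_score` threshold are hypothetical names and values.

```python
import math
from collections import Counter


def bow_vector(word_ids):
    """Turn a list of visual-word IDs into a sparse, L2-normalised histogram."""
    counts = Counter(word_ids)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two sparse unit vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(w, 0.0) for w, v in a.items())


class KeyframeDatabase:
    """Hypothetical keyframe store queried by bag-of-words similarity."""

    def __init__(self):
        self.keyframes = []  # list of (keyframe_id, bow_vector) pairs

    def add(self, keyframe_id, word_ids):
        self.keyframes.append((keyframe_id, bow_vector(word_ids)))

    def query(self, word_ids, min_score=0.7):
        """Return (keyframe_id, score) for the best match, or None.

        min_score is an arbitrary threshold assumed for this sketch.
        """
        q = bow_vector(word_ids)
        best = None
        for kf_id, vec in self.keyframes:
            score = cosine(q, vec)
            if best is None or score > best[1]:
                best = (kf_id, score)
        if best is not None and best[1] >= min_score:
            return best
        return None
```

In a SLAM loop, each new frame would first be passed to `query`; a non-`None` result is a place-recognition candidate that can then be verified geometrically before the map is updated, and frames selected as keyframes are stored with `add`.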
- [D] Fastest SIFT Descriptors Matching with Database of SIFT Descriptors
This library is the most widely used bag-of-words implementation, which is the standard for feature retrieval. There might be more advanced methods, but you would have to implement them yourself. It can also be used with non-SIFT descriptors. https://github.com/dorian3d/DBoW2
What are some alternatives?
langroid - Harness LLMs with Multi-Agent Programming
faiss - A library for efficient similarity search and clustering of dense vectors.
onnxruntime - ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
supabase - The open source Firebase alternative.
txtai - 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
telekinesis - Control Objects and Functions Remotely
roadmap - This is the public roadmap for Salesforce Heroku services.
marqo - Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
llm-cluster - LLM plugin for clustering embeddings