Metric | vectordb | llm-cluster
---|---|---
Mentions | 6 | 3
Stars | 552 | 59
Growth | 5.1% | -
Activity | 7.6 | 4.9
Last commit | 1 day ago | 3 months ago
Language | Python | Python
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
vectordb
- VectorDB: Vector Database Built by Kagi Search
We needed a low-latency, on-premise solution that we can run on edge nodes (so lightweight), with sane defaults, that anyone on the team can whip up in a second.
The result is this, and we constantly benchmark the performance of different embeddings to ensure the best defaults [1].
[1] https://github.com/kagisearch/vectordb#embeddings-performanc...
- Embeddings: What they are and why they matter
If you are looking for a lightweight, low-latency, fully local, end-to-end solution (chunking, embedding, storage, and vector search), try vectordb [1].
I just spent a day updating it with the latest benchmarks for text embedding models.
[1] https://github.com/kagisearch/vectordb
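The four stages named in the comment above (chunking, embedding, storage, vector search) can be illustrated with a minimal pure-Python sketch. This is not vectordb's API: `chunk`, `embed`, `cosine`, and `TinyStore` are hypothetical names, and the word-count "embedding" is only a toy stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk(text, size=8):
    """Chunking: split text into pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Embedding (toy stand-in): a sparse word-count vector, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyStore:
    """Storage: an in-memory list of (chunk, vector) pairs (hypothetical class)."""

    def __init__(self):
        self.items = []

    def save(self, text):
        for piece in chunk(text):
            self.items.append((piece, embed(piece)))

    def search(self, query, top_n=1):
        """Vector search: brute-force ranking by cosine similarity."""
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), piece) for piece, vec in self.items]
        scored.sort(reverse=True)
        return [piece for _, piece in scored[:top_n]]


store = TinyStore()
store.save("Embeddings map text to vectors. Similar texts end up close together.")
store.save("Kittens chase string around the living room floor.")
results = store.search("kittens chase string")
print(results)
```

A real system would swap `embed` for a learned embedding model and the brute-force scan for an approximate nearest-neighbour index once the collection grows.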
llm-cluster
- Embeddings: What they are and why they matter
I'm trying to understand the clustering code but not doing too well.
https://github.com/simonw/llm-cluster/blob/main/llm_cluster....
So does this take each row from the DB, convert it to a NumPy array (?), then use an existing model called MiniBatchKMeans (?) to go over that array and generate a bunch of labels, then add them to a dictionary and print it to the console?
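The flow described in that question can be sketched roughly as follows. This is not the actual llm-cluster code: the `rows` list is a hypothetical stand-in for records read from the database, and the two-dimensional vectors are made up so the grouping is easy to see. `MiniBatchKMeans` is scikit-learn's mini-batch variant of k-means.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical (id, embedding) rows standing in for records read from a database.
rows = [
    ("a", [0.0, 0.1]),
    ("b", [0.1, 0.0]),
    ("c", [0.05, 0.05]),
    ("d", [10.0, 10.1]),
    ("e", [10.1, 10.0]),
    ("f", [9.95, 10.05]),
]

ids = [row_id for row_id, _ in rows]
# Stack the embeddings into one array of shape (n_rows, n_dims).
vectors = np.array([vector for _, vector in rows])

# Fit k-means with a fixed number of clusters; labels_ holds one label per row.
model = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3)
model.fit(vectors)

# Group row ids by their cluster label in a dictionary.
clusters = {}
for row_id, label in zip(ids, model.labels_):
    clusters.setdefault(int(label), []).append(row_id)

# Print each cluster and its members to the console.
for label, members in sorted(clusters.items()):
    print(label, members)
```

The label numbers themselves are arbitrary; only the grouping of ids carries meaning.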
- LLM now provides tools for working with embeddings
I imagine there are all kinds of improvements that could be made to this kind of thing.
I'd love to understand if there's a good way to automatically pick an interesting number of clusters, as opposed to picking a number at the start.
https://github.com/simonw/llm-cluster/blob/main/llm_cluster....
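One common heuristic for the question above is to try several cluster counts and keep the one with the highest silhouette score. The sketch below illustrates that heuristic only. It is not part of llm-cluster, `pick_n_clusters` is a hypothetical helper, and the synthetic two-dimensional data is an assumption chosen so that the answer is obvious.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score


def pick_n_clusters(vectors, candidates):
    """Return (k, score) for the candidate k with the highest silhouette score."""
    best_k, best_score = None, -1.0
    for k in candidates:
        model = MiniBatchKMeans(n_clusters=k, random_state=0, n_init=3)
        labels = model.fit_predict(vectors)
        if len(set(labels)) < 2:
            continue  # the silhouette score needs at least two clusters
        score = silhouette_score(vectors, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Synthetic data: three tight, well-separated blobs of ten points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
vectors = np.vstack([c + rng.normal(scale=0.1, size=(10, 2)) for c in centers])

best_k, best_score = pick_n_clusters(vectors, range(2, 6))
print(best_k, round(best_score, 3))
```

On real embeddings the silhouette curve is often much flatter than on this toy data, so the chosen count is better treated as a starting point than as a definitive answer.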
What are some alternatives?
langroid - Harness LLMs with Multi-Agent Programming
telekinesis - Control Objects and Functions Remotely
onnxruntime - ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
roadmap - This is the public roadmap for Salesforce Heroku services.
txtai - 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
DBoW2 - Enhanced hierarchical bag-of-word library for C++
datasette-faiss - Maintain a FAISS index for specified Datasette tables
marqo - Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
DP_means - Dirichlet Process K-means
supabase - The open source Firebase alternative.
bert - TensorFlow code and pre-trained models for BERT