llm-cluster vs DP_means
Metric | llm-cluster | DP_means
---|---|---
Mentions | 3 | 1
Stars | 60 | 45
Growth | - | -
Activity | 4.9 | 1.7
Latest commit | 3 months ago | about 1 year ago
Language | Python | C++
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llm-cluster
Embeddings: What they are and why they matter
I'm trying to understand the clustering code but not doing too well.
https://github.com/simonw/llm-cluster/blob/main/llm_cluster....
So does this take each row from the DB, convert it to a NumPy array (?), then use an existing model called MiniBatchKMeans (?) to go over that array and generate a bunch of labels, then add those to a dictionary and print it to the console?
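The pipeline the question describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not llm-cluster's actual code: the `rows` list of `(id, vector)` pairs is a hypothetical stand-in for embeddings read from the database, and the cluster count is fixed up front via scikit-learn's `MiniBatchKMeans(n_clusters=...)`.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical stand-in for rows read from an embeddings table:
# (id, vector) pairs. Real embeddings have hundreds of dimensions.
rng = np.random.default_rng(0)
rows = [(f"item-{i}", rng.normal(loc=(i % 3) * 10.0, size=8)) for i in range(30)]

# Step 1: stack the vectors into a 2-D NumPy array of shape (n_rows, n_dims).
ids = [row_id for row_id, _ in rows]
X = np.array([vector for _, vector in rows])

# Step 2: fit MiniBatchKMeans and get one integer cluster label per row.
# n_clusters must be chosen in advance; n_init is set explicitly.
model = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0)
labels = model.fit_predict(X)

# Step 3: group the row IDs by their cluster label.
clusters = defaultdict(list)
for row_id, label in zip(ids, labels):
    clusters[int(label)].append(row_id)

# Step 4: print each cluster and its members.
for label, members in sorted(clusters.items()):
    print(label, members)
```

`MiniBatchKMeans` is a clustering algorithm rather than a pretrained model: `fit_predict` learns the cluster centres from `X` itself and returns one integer label per row.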
LLM now provides tools for working with embeddings
I imagine there are all kinds of improvements that could be made to this kind of thing.
I'd love to understand if there's a good way to automatically pick an interesting number of clusters, as opposed to picking a number at the start.
https://github.com/simonw/llm-cluster/blob/main/llm_cluster....
DP_means
LLM now provides tools for working with embeddings
I found one implementation here: https://github.com/vsmolyakov/DP_means
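DP-means (Kulis and Jordan) replaces the fixed cluster count with a squared-distance penalty `lam`: any point whose squared distance to every existing centroid exceeds `lam` starts a new cluster. Below is a minimal NumPy sketch of that idea, written for illustration only; it is not the code from the linked repository, and the `dp_means` function and toy data are hypothetical.

```python
import numpy as np


def dp_means(X, lam, max_iter=100):
    """Cluster the rows of X with DP-means (hypothetical helper, for illustration).

    lam is a squared-distance penalty: a point farther than sqrt(lam) from
    every existing centroid starts a new cluster, so larger lam -> fewer clusters.
    Results can depend on the order in which points are visited.
    Returns (labels, centroids).
    """
    X = np.asarray(X, dtype=float)
    # Start with a single cluster centred on the mean of all points.
    centroids = [X.mean(axis=0)]
    labels = np.zeros(len(X), dtype=int)

    for _ in range(max_iter):
        changed = False
        # Assignment step: visit points in order, opening new clusters as needed.
        for i, x in enumerate(X):
            d2 = np.sum((np.asarray(centroids) - x) ** 2, axis=1)
            if d2.min() > lam:
                centroids.append(x.copy())
                new_label = len(centroids) - 1
            else:
                new_label = int(d2.argmin())
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True

        # Drop clusters that ended up empty and renumber the rest 0..k-1.
        used = np.unique(labels)
        labels = np.searchsorted(used, labels)
        # Update step: each centroid becomes the mean of its assigned points.
        centroids = [X[labels == k].mean(axis=0) for k in range(len(used))]

        if not changed:
            break

    return labels, np.array(centroids)


# Toy data: three tight blobs in 2-D (hypothetical, for illustration).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])

labels, centroids = dp_means(X, lam=25.0)
print("clusters found:", len(centroids))
```

This does not remove the tuning step entirely: you still choose `lam`, but a distance threshold can be easier to reason about than a cluster count when you know roughly how close two items must be to belong together.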
Alternatively, there is a Bayesian GMM in scikit-learn (`BayesianGaussianMixture`). If you restrict it to diagonal covariance matrices, you should be fine in high dimensions.
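A sketch of that suggestion, assuming scikit-learn's `BayesianGaussianMixture` with a Dirichlet-process weight prior and `covariance_type="diag"`. Here `n_components` is only an upper bound, and components the data does not support end up with weights near zero. The blob data and the 0.01 weight cutoff used to count "effective" components are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical data: three well-separated blobs in 16 dimensions,
# standing in for real embedding vectors.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 16))
X = np.vstack([c + rng.normal(size=(100, 16)) for c in centers])
true_blob = np.repeat(np.arange(3), 100)

# n_components is an upper bound; the Dirichlet-process prior lets the
# model shrink the weights of components it does not need toward zero.
bgmm = BayesianGaussianMixture(
    n_components=10,
    covariance_type="diag",  # one variance per dimension: cheap in high dimensions
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
)
labels = bgmm.fit_predict(X)

# Count components with non-negligible weight (0.01 is an arbitrary cutoff).
effective = int(np.sum(bgmm.weights_ > 0.01))
print("effective components:", effective)
print("clusters actually used:", np.unique(labels))
```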
What are some alternatives?
telekinesis - Control Objects and Functions Remotely
llm-llama-cpp - LLM plugin for running models using llama.cpp
roadmap - This is the public roadmap for Salesforce Heroku services.
datasette-faiss - Maintain a FAISS index for specified Datasette tables
DBoW2 - Enhanced hierarchical bag-of-word library for C++
llm-gpt4all - Plugin for LLM adding support for the GPT4All collection of models
bert - TensorFlow code and pre-trained models for BERT
vectordb - A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search.
marqo - Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
supabase - The open source Firebase alternative.