Metric | llm-cluster | datasette-faiss
---|---|---
Mentions | 3 | 1
Stars | 60 | 32
Growth | - | -
Activity | 4.9 | 10.0
Last commit | 3 months ago | over 1 year ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llm-cluster
- Embeddings: What they are and why they matter
I'm trying to understand the clustering code but not doing too well.
https://github.com/simonw/llm-cluster/blob/main/llm_cluster....
So does this take each row from the DB, convert it to a numpy array (?), then use an existing model called MiniBatchKMeans (?) to go over that array and generate a bunch of labels, then add them to a dictionary and print it to the console?
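The flow described in that question (rows in, one label per row out, labels grouped into a dictionary, result printed) can be sketched roughly as follows. This is not llm-cluster's code: it swaps scikit-learn's MiniBatchKMeans for a plain, dependency-free k-means (Lloyd's algorithm), and the names `kmeans_labels`, `cluster_rows` and the sample rows are hypothetical.

```python
import json
from collections import defaultdict


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(vectors, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per input vector."""
    # Naive, deterministic initialisation: the first k vectors become centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def cluster_rows(rows, k):
    """rows: (id, vector) pairs. Returns {cluster label: [ids]}."""
    ids = [row_id for row_id, _ in rows]
    vectors = [vector for _, vector in rows]
    clusters = defaultdict(list)
    for row_id, label in zip(ids, kmeans_labels(vectors, k)):
        clusters[label].append(row_id)
    return dict(clusters)


# Hypothetical rows standing in for (id, embedding) pairs read from the database.
rows = [
    ("a", [0.0, 0.1]),
    ("b", [5.0, 5.1]),
    ("c", [0.2, 0.0]),
    ("d", [5.2, 4.9]),
]
print(json.dumps(cluster_rows(rows, k=2), indent=2))
```

MiniBatchKMeans follows the same assign-then-update idea but updates centroids from small random batches, which scales better to many rows.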
- LLM now provides tools for working with embeddings
I imagine there are all kinds of improvements that could be made to this kind of thing.
I'd love to understand if there's a good way to automatically pick an interesting number of clusters, as opposed to picking a number at the start.
https://github.com/simonw/llm-cluster/blob/main/llm_cluster....
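One common heuristic for the question above is to cluster with several candidate values of k and keep the one with the highest mean silhouette score. The sketch below illustrates that idea only; it is not something llm-cluster is known to do. It uses a plain k-means with naive initialisation, and `pick_k`, `silhouette` and the sample vectors are hypothetical names and data.

```python
import math


def kmeans_labels(vectors, k, iterations=20):
    """Plain Lloyd's k-means with naive first-k initialisation."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def silhouette(vectors, labels):
    """Mean silhouette coefficient: higher means tighter, better separated clusters."""
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)
    if len(groups) < 2:
        return -1.0
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in groups[label] if j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(vectors[i], vectors[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def pick_k(vectors, candidates):
    """Return the candidate k whose clustering has the best silhouette score."""
    return max(candidates, key=lambda k: silhouette(vectors, kmeans_labels(vectors, k)))


# Hypothetical 2-D "embeddings" forming three well separated groups.
vectors = [
    (0.0, 0.0), (10.0, 0.0), (0.0, 10.0),
    (0.5, 0.0), (10.0, 0.5), (0.0, 10.5),
    (0.0, 0.5), (10.5, 0.0), (0.5, 10.0),
]
print(pick_k(vectors, range(2, 5)))
```

The silhouette computation compares every pair of points, so on large collections it is usually run on a sample.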
datasette-faiss
- LLM now provides tools for working with embeddings
I experimented with that a few months ago. Building a fresh FAISS index for a few thousand matches is really quick, so I think it's often better to filter first, build a scratch index and then use that for similarity: https://github.com/simonw/datasette-faiss/issues/3
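The filter-first pattern from that comment can be sketched as below. To stay dependency-free, a tiny brute-force class stands in for a FAISS flat index; `ScratchIndex`, its methods and the sample rows are hypothetical and are not the FAISS or datasette-faiss API.

```python
import math


class ScratchIndex:
    """Hypothetical stand-in for a flat (exact, brute-force) vector index."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, row_id, vector):
        self.ids.append(row_id)
        self.vectors.append(vector)

    def search(self, query, k):
        """Return the k nearest (id, distance) pairs, closest first."""
        scored = [
            (row_id, math.dist(query, vector))
            for row_id, vector in zip(self.ids, self.vectors)
        ]
        return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical rows: (id, category, embedding).
rows = [
    (1, "blog", [0.0, 0.0]),
    (2, "docs", [0.1, 0.0]),
    (3, "blog", [1.0, 1.0]),
    (4, "blog", [0.2, 0.1]),
]

# 1. Filter first, as a SQL WHERE clause would.
matches = [row for row in rows if row[1] == "blog"]

# 2. Build a throwaway index over just the matches.
index = ScratchIndex()
for row_id, _, vector in matches:
    index.add(row_id, vector)

# 3. The similarity search only ever sees the filtered rows.
nearest = index.search([0.0, 0.0], k=2)
print([row_id for row_id, _ in nearest])
```

Because the index is rebuilt per query, this only pays off when the filtered set is small, which matches the "few thousand matches" case described above.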
What are some alternatives?
telekinesis - Control Objects and Functions Remotely
llm-gpt4all - Plugin for LLM adding support for the GPT4All collection of models
roadmap - This is the public roadmap for Salesforce Heroku services.
DP_means - Dirichlet Process K-means
DBoW2 - Enhanced hierarchical bag-of-word library for C++
bert - TensorFlow code and pre-trained models for BERT
vectordb - A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search.
marqo - Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
supabase - The open source Firebase alternative.