PyImpetus
hdbscan
 | PyImpetus | hdbscan
---|---|---
Mentions | 1 | 6
Stars | 111 | 2,534
Growth | - | 1.7%
Activity | 0.0 | 4.4
Last commit | 6 months ago | about 1 month ago
Language | Jupyter Notebook | Jupyter Notebook
License | MIT License | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
PyImpetus
We haven't tracked posts mentioning PyImpetus yet.
Tracking mentions began in Dec 2020.
hdbscan
- Introducing the Semantic Graph
  A number of excellent topic modeling libraries exist in Python today. BERTopic and Top2Vec are two of the most popular. Both use sentence-transformers to encode data into vectors, UMAP for dimensionality reduction and HDBSCAN to cluster nodes.
- Introduction to K-Means Clustering
  Working in spatial data science, I rarely find applications where k-means is the best tool. The problem is that it is difficult to know how many clusters to expect on a map. Is it 5, 500, or 10,000? Here HDBSCAN [1] shines because it clusters _and_ selects the most suitable number of clusters by choosing where to cut the single-linkage cluster tree.
- [D] Good algorithm for clustering big data (sentences represented as embeddings)?
  Maybe use (H)DBSCAN, which I think should also work for huge datasets. I don't think there is a ready-to-use clustering implementation with a built-in cosine similarity metric, and you also won't be able to precompute the dense 100k x 100k similarity matrix. The only way to go is to L2-normalize your embeddings; the dot product then equals the cosine similarity, so Euclidean distance on the normalized vectors serves as a proxy for cosine distance. See also https://github.com/scikit-learn-contrib/hdbscan/issues/69
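
The normalization trick mentioned in the last post can be checked with a small sketch. It uses only the Python standard library and two made-up toy vectors (the values, and helper names such as `l2_normalize`, are assumptions for illustration, not part of hdbscan). It shows that for unit-length vectors the squared Euclidean distance equals `2 * (1 - cosine similarity)`, which is why a Euclidean-only clusterer can stand in for a cosine-based one.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy "embeddings": made-up values for illustration only.
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

a_unit = l2_normalize(a)
b_unit = l2_normalize(b)

cos_sim = cosine_similarity(a, b)            # computed on the raw vectors
dot_unit = dot(a_unit, b_unit)               # dot product after normalization
sq_dist = squared_euclidean(a_unit, b_unit)  # distance after normalization

# After L2 normalization the dot product is the cosine similarity ...
assert math.isclose(dot_unit, cos_sim)
# ... and squared Euclidean distance is a monotonic function of it.
assert math.isclose(sq_dist, 2.0 * (1.0 - cos_sim))
```

Because the mapping from cosine similarity to Euclidean distance is monotonic on unit vectors, neighbor rankings are preserved. In practice one would presumably normalize the full embedding matrix once and then cluster it with the default Euclidean metric, rather than building a pairwise similarity matrix.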
What are some alternatives?
faiss - A library for efficient similarity search and clustering of dense vectors.
Top2Vec - Top2Vec learns jointly embedded topic, document and word vectors.
Milvus - A cloud-native vector database, storage for next generation AI applications
homemade-machine-learning - 🤖 Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained
machine_learning_basics - Plain python implementations of basic machine learning algorithms
100DaysofMLCode - My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge. Now supported by bright developers adding their learnings :+1:
word2vec - Automatically exported from code.google.com/p/word2vec
leidenalg - Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
feature-engineering-tutorials - Data Science Feature Engineering and Selection Tutorials