hdbscan

A high performance implementation of HDBSCAN clustering. (by scikit-learn-contrib)


hdbscan reviews and mentions

Posts with mentions or reviews of hdbscan. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-09-16.
  • Introducing the Semantic Graph
    5 projects | dev.to | 16 Sep 2022
    A number of excellent topic modeling libraries exist in Python today. BERTopic and Top2Vec are two of the most popular. Both use sentence-transformers to encode data into vectors, UMAP for dimensionality reduction and HDBSCAN to cluster nodes.
  • Introduction to K-Means Clustering
    5 projects | news.ycombinator.com | 14 Mar 2022
    Working in spatial data science, I rarely find applications where k-means is the best tool. The problem is that it is difficult to know how many clusters to expect on a map. Is it 5, 500, or 10,000? Here HDBSCAN [1] shines because it clusters _and_ selects the most suitable number of clusters by choosing where to cut the single-linkage cluster tree.

    [1]: https://github.com/scikit-learn-contrib/hdbscan

  • [D] Good algorithm for clustering big data (sentences represented as embeddings)?
    5 projects | /r/MachineLearning | 31 Mar 2021
    Maybe use (H)DBSCAN, which I think should also work for huge datasets. I don't think there is a ready-to-use clustering implementation with a built-in cosine similarity metric, and you also won't be able to precompute the dense 100k x 100k similarity matrix. The only way to go is to L2-normalize your embeddings; the dot product then equals the cosine similarity, so Euclidean distance on the normalized vectors can serve as a proxy for cosine distance. See also https://github.com/scikit-learn-contrib/hdbscan/issues/69
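
The L2-normalization suggestion in the post above can be checked numerically. The following is a minimal sketch in plain Python, not code from the hdbscan project: the helper names (`l2_normalize`, `dot`, `cosine_similarity`, `squared_euclidean`) and the two example vectors are made up for illustration. It shows that, for unit-length vectors, the dot product equals the cosine similarity and the squared Euclidean distance equals 2 - 2 * cosine similarity, which is why a Euclidean metric can stand in for cosine distance.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two illustrative "embeddings" (hypothetical values).
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

a_hat, b_hat = l2_normalize(a), l2_normalize(b)
cos_ab = cosine_similarity(a, b)

# After normalization the dot product is the cosine similarity ...
print(math.isclose(dot(a_hat, b_hat), cos_ab))
# ... and squared Euclidean distance is a monotonic function of it.
print(math.isclose(squared_euclidean(a_hat, b_hat), 2.0 - 2.0 * cos_ab))
```

Because the mapping from cosine similarity to Euclidean distance is monotonic on unit vectors, the normalized embeddings could then be clustered with a Euclidean metric, for example `hdbscan.HDBSCAN(metric="euclidean")`, assuming the `hdbscan` package is installed.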
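
The Hacker News comment above notes that HDBSCAN does not need the number of clusters up front. A rough sketch of what that looks like in practice follows. It assumes the `hdbscan` and `numpy` packages are installed; `make_blobs` and `cluster_without_k` are hypothetical helper names, and the synthetic data and `min_cluster_size=10` are arbitrary choices for illustration, not recommendations.

```python
import random


def make_blobs(centers, points_per_blob, spread, seed=0):
    """Generate 2-D Gaussian blobs around the given centers (synthetic data)."""
    rng = random.Random(seed)
    points = []
    for cx, cy in centers:
        for _ in range(points_per_blob):
            points.append([rng.gauss(cx, spread), rng.gauss(cy, spread)])
    return points


def cluster_without_k(points, min_cluster_size=10):
    """Cluster points with HDBSCAN; no cluster count is passed in.

    Returns the per-point labels and the number of clusters found.
    Points labeled -1 are treated as noise and not counted as a cluster.
    """
    # Imported lazily so make_blobs stays usable without these packages.
    import numpy as np
    import hdbscan

    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    labels = clusterer.fit_predict(np.asarray(points))
    n_clusters = len(set(labels.tolist()) - {-1})
    return labels, n_clusters


if __name__ == "__main__":
    data = make_blobs([(0, 0), (10, 10), (-10, 10)], points_per_blob=50, spread=0.5)
    labels, n_clusters = cluster_without_k(data)
    print("clusters found:", n_clusters)
```

The only tuning knob used here is `min_cluster_size`, the smallest group of points that should count as a cluster; the number of clusters is read off the result rather than supplied.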

Stats

Basic hdbscan repo stats
6
2,532
4.5
about 1 month ago

scikit-learn-contrib/hdbscan is an open source project licensed under BSD 3-clause "New" or "Revised" License which is an OSI approved license.

The primary programming language of hdbscan is Jupyter Notebook.
