homemade-machine-learning VS hdbscan

Compare homemade-machine-learning and hdbscan to see how they differ.


homemade-machine-learning - 🤖 Python examples of popular machine learning algorithms with interactive Jupyter demos and explanations of the math (by trekhleb)
                homemade-machine-learning   hdbscan
Mentions        7                           6
Stars           21,324                      2,445
Growth          -                           2.1%
Activity        5.2                         6.9
Last commit     25 days ago                 2 months ago
Language        Jupyter Notebook            Jupyter Notebook
License         MIT License                 BSD 3-clause "New" or "Revised" License
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.


Posts with mentions or reviews of homemade-machine-learning. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-10-14.


Posts with mentions or reviews of hdbscan. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-09-16.
  • Introducing the Semantic Graph
    5 projects | dev.to | 16 Sep 2022
    A number of excellent topic modeling libraries exist in Python today. BERTopic and Top2Vec are two of the most popular. Both use sentence-transformers to encode data into vectors, UMAP for dimensionality reduction and HDBSCAN to cluster nodes.
  • Introduction to K-Means Clustering
    5 projects | news.ycombinator.com | 14 Mar 2022
    Working in spatial data science, I rarely find applications where k-means is the best tool. The problem is that it is difficult to know how many clusters to expect on a map. Is it 5, 500, or 10,000? Here HDBSCAN [1] shines because it will cluster _and_ select the most suitable number of clusters by choosing where to cut the single-linkage cluster tree.

    [1]: https://github.com/scikit-learn-contrib/hdbscan

  • [D] Good algorithm for clustering big data (sentences represented as embeddings)?
    5 projects | reddit.com/r/MachineLearning | 31 Mar 2021
    Maybe use (H)DBSCAN, which I think should also work for huge datasets. I don't think there is a ready-to-use clustering implementation with a built-in cosine similarity metric, and you also won't be able to precompute the 100k x 100k dense similarity matrix. The only way to go on this is to L2-normalize your embeddings; then the dot product equals the cosine similarity, and the Euclidean distance between the normalized vectors serves as a proxy for the cosine distance. See also https://github.com/scikit-learn-contrib/hdbscan/issues/69
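
The L2-normalization trick from the last post can be checked with a minimal sketch in plain Python. The helper names and the toy embeddings below are hypothetical and chosen only for illustration; nothing here comes from the hdbscan library itself. For unit-length vectors the squared Euclidean distance equals 2 - 2 * cosine similarity, so a clusterer using a Euclidean metric would rank pairs in the same order as cosine distance.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity computed directly on the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical toy embeddings (assumption: real ones would be much longer).
emb_a = [1.0, 2.0, 3.0]
emb_b = [-2.0, 0.5, 4.0]

unit_a = l2_normalize(emb_a)
unit_b = l2_normalize(emb_b)

cos_sim = cosine_similarity(emb_a, emb_b)
dot_unit = dot(unit_a, unit_b)               # equals cos_sim
sq_dist = squared_euclidean(unit_a, unit_b)  # equals 2 - 2 * cos_sim

print(cos_sim, dot_unit, sq_dist)
```

Because the relationship is monotonic, normalizing once up front avoids building the dense pairwise similarity matrix the post warns about.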

What are some alternatives?

When comparing homemade-machine-learning and hdbscan you can also consider the following projects:

faiss - A library for efficient similarity search and clustering of dense vectors.

Top2Vec - Top2Vec learns jointly embedded topic, document and word vectors.

lego-mindstorms - My LEGO MINDSTORMS projects (using set 51515 electronics)

Milvus - A cloud-native vector database, storage for next generation AI applications

wordle-solver - For educational purposes, a simple script that assists in solving the word game Wordle.

rmi - A learned index structure

PyImpetus - PyImpetus is a Markov Blanket based feature subset selection algorithm that considers features both separately and together as a group in order to provide not just the best set of features but also the best combination of features

PythonRobotics - Python sample codes for robotics algorithms.

raku-jupyter-kernel - Raku Kernel for Jupyter/IPython notebooks

CFDPython - A sequence of Jupyter notebooks featuring the "12 Steps to Navier-Stokes" http://lorenabarba.com/

100DaysofMLCode - My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge. Now supported by bright developers adding their learnings :+1:

AlgorithmicTrading - This repository contains three ways to obtain arbitrage which are Dual Listing, Options and Statistical Arbitrage. These are projects in collaboration with Optiver and have been peer-reviewed by staff members of Optiver.