[P] Looking for state of the art clustering algorithms

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • pdc-dp-means

    "Revisiting DP-Means: Fast Scalable Algorithms via Parallelism and Delayed Cluster Creation" [Dinari and Freifeld, UAI 2022]

  • If you like DP-Means, see our recent paper, where we optimized it to be much faster: https://openreview.net/pdf?id=rnzVBD8jqlq (code: https://github.com/BGU-CS-VIL/pdc-dp-means). The code is included as both IID, but also, if you want to use the fastest version, as a module that can be built for scikit-learn. The latter is as fast as k-means, with all the benefits of DP-Means :)

  • dpmmpython

    Python wrapper for the DPMMSubCluster Julia package for inference in Dirichlet Process Mixture Models (High Performance Machine Learning Workshop 2019)

  • Distributed MCMC inference in Dirichlet process mixture models using Julia. Scalable and

  • VersatileHDPMixtureModels.jl

    Code for our UAI '20 paper "Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes"

  • dpmmpythonStreaming

    Python wrapper for the DPMMSubClusterStreaming.jl Julia package.

  • Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data

  • DeepDPM

    "DeepDPM: Deep Clustering With An Unknown Number of Clusters" [Ronen, Finder, and Freifeld, CVPR 2022]

  • cuml

    cuML - RAPIDS Machine Learning Library

  • As a companion to the other comments, I'd like to mention that the RAPIDS library cuML provides GPU-accelerated versions of quite a few of the algorithms mentioned in this thread (HDBSCAN, UMAP, SVM, PCA, {Exact, Approximate} Nearest Neighbors, DBSCAN, KMeans, etc.).

  • cudf

    cuDF - GPU DataFrame Library

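For readers unfamiliar with DP-Means, the algorithm that pdc-dp-means accelerates, here is a minimal pure-Python sketch of the classic serial version. It is not the parallel, delayed-cluster-creation variant from the paper listed above, and it does not use the pdc-dp-means API. The function name `dp_means`, the parameter `lam` (a squared-distance threshold), and the toy data are illustrative assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(pts):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(pts)
    return tuple(sum(coord) / n for coord in zip(*pts))


def dp_means(points, lam, max_iter=100):
    """Serial DP-Means sketch (hypothetical helper, not a library API).

    A point whose squared distance to every centroid exceeds `lam`
    opens a new cluster, so the number of clusters is not fixed ahead of time.
    """
    centroids = [mean(points)]          # start with one cluster at the global mean
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centroids]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] > lam:          # too far from everything: new cluster at p
                centroids.append(tuple(p))
                k = len(centroids) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Recompute centroids, dropping clusters that ended up empty.
        members = {}
        for i, k in enumerate(labels):
            members.setdefault(k, []).append(points[i])
        order = sorted(members)
        remap = {old: new for new, old in enumerate(order)}
        centroids = [mean(members[old]) for old in order]
        labels = [remap[k] for k in labels]
        if not changed:
            break
    return labels, centroids


# Toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = dp_means(toy_points, lam=9.0)
print(len(toy_centroids), toy_labels)
```

A smaller `lam` yields more clusters and a larger one yields fewer. This threshold is what DP-Means asks for in place of the cluster count that k-means requires.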