[P] Looking for state of the art clustering algorithms

pdc-dp-means

2 15 8.1 Python

"Revisiting DP-Means: Fast Scalable Algorithms via Parallelism and Delayed Cluster Creation" [Dinari and Freifeld, UAI 2022]

If you like DP-Means, see our recent paper, where we optimized it to be much faster: https://openreview.net/pdf?id=rnzVBD8jqlq https://github.com/BGU-CS-VIL/pdc-dp-means The code is included both as IID and, if you want the fastest version, as a module that can be built for scikit-learn. The latter is as fast as k-means, with all the benefits of DP-Means :)
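For anyone who hasn't run into DP-Means before, here is a minimal sketch of the basic serial algorithm in plain Python. To be clear, this is not the parallel, delayed-cluster-creation version from the paper, and it is not the pdc-dp-means API. The function names (`dp_means`, `sq_dist`, `mean`) and the parameter name `lam` are made up for illustration, and it assumes squared Euclidean distance. The idea is the same as k-means, except that a point farther than `lam` from every existing center opens a new cluster, so the number of clusters is not fixed in advance.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def dp_means(points, lam, max_iter=100):
    """Basic serial DP-Means (illustrative sketch, hypothetical names).

    `lam` is the penalty: a point whose squared distance to its nearest
    center exceeds `lam` becomes the center of a new cluster.
    """
    centers = [mean(points)]       # start with a single global cluster
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:     # too far from everything: open a cluster
                centers.append(tuple(p))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Recompute centers from members, dropping clusters that emptied out.
        members = {}
        for i, k in enumerate(labels):
            members.setdefault(k, []).append(points[i])
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centers = [mean(members[old]) for old in kept]
        labels = [remap[k] for k in labels]
        if not changed:
            break
    return centers, labels


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1),
           (5, 5), (5.1, 5), (5, 5.1),
           (10, 0), (10.1, 0), (10, 0.1)]
    centers, labels = dp_means(pts, lam=4.0)
    print(len(centers), labels)
```

A small `lam` gives many clusters and a very large `lam` collapses everything into one, so in practice `lam` plays the role that `k` plays in k-means.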

dpmmpython

2 17 0.0 Python

Python wrapper for the DPMMSubCluster Julia package for inference in Dirichlet Process Mixture Models (High Performance Machine Learning Workshop 2019)

Distributed MCMC inference in Dirichlet process mixture models using Julia. Scalable and

VersatileHDPMixtureModels.jl

2 8 0.0 Julia

Code for our UAI '20 paper "Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes"

dpmmpythonStreaming

3 14 0.0 Python

Python wrapper for the DPMMSubClusterStreaming.jl Julia package.

Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data

DeepDPM

5 735 1.2 Python

"DeepDPM: Deep Clustering With An Unknown Number of Clusters" [Ronen, Finder, and Freifeld, CVPR 2022]

cuml

10 3,894 9.3 C++

cuML - RAPIDS Machine Learning Library

As a companion to the other comments, I'd like to mention that the RAPIDS library cuML provides GPU-accelerated versions of quite a few of the algorithms mentioned in this thread (HDBSCAN, UMAP, SVM, PCA, {Exact, Approximate} Nearest Neighbors, DBSCAN, KMeans, etc.).

cudf

23 7,274 9.9 C++

cuDF - GPU DataFrame Library

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.