[P] Looking for state of the art clustering algorithms

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  1. pdc-dp-means

    "Revisiting DP-Means: Fast Scalable Algorithms via Parallelism and Delayed Cluster Creation" [Dinari and Freifeld, UAI 2022]

    If you like DP-Means, see our recent paper, where we optimized it to be much faster: https://openreview.net/pdf?id=rnzVBD8jqlq https://github.com/BGU-CS-VIL/pdc-dp-means The code is included in two forms. If you want the fastest version, use the module that can be built for sk-learn; it is as fast as k-means, with all the benefits of DP-Means :)
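
For readers who have not met DP-Means before, here is a minimal pure-Python sketch of the classic serial algorithm: a point opens a new cluster whenever its squared distance to every existing center exceeds a penalty. This is not the parallel, delayed-cluster-creation variant from the paper, and it is not the pdc-dp-means API; the names `dp_means` and `lam` are illustrative assumptions.

```python
def dp_means(points, lam, max_iter=100):
    """Serial DP-Means sketch (illustrative, not the pdc-dp-means API).

    points: list of equal-length numeric tuples.
    lam: squared-distance threshold above which a point opens a new cluster.
    Returns (labels, centers).
    """
    dim = len(points[0])
    n = len(points)
    # Start with a single cluster located at the global mean.
    centers = [tuple(sum(p[d] for p in points) / n for d in range(dim))]
    labels = [0] * n

    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            # Squared Euclidean distance to every current center.
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                # Too far from everything: open a new cluster at this point.
                centers.append(tuple(p))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True

        # Recompute centers as cluster means, dropping any empty clusters.
        members = {}
        for i, k in enumerate(labels):
            members.setdefault(k, []).append(points[i])
        order = sorted(members)
        remap = {old: new for new, old in enumerate(order)}
        centers = [
            tuple(sum(p[d] for p in members[k]) / len(members[k]) for d in range(dim))
            for k in order
        ]
        labels = [remap[k] for k in labels]

        if not changed:
            break
    return labels, centers


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = dp_means(points, lam=9.0)
print(len(centers), labels)
```

Unlike k-means, the number of clusters is not fixed in advance; it is controlled indirectly by `lam`. A very large `lam` keeps everything in one cluster, while a small one opens many.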

  3. dpmmpython

    Python wrapper for the DPMMSubCluster Julia package for inference in Dirichlet Process Mixture Models (High Performance Machine Learning Workshop 2019)

    Distributed MCMC inference in Dirichlet process mixture models using Julia. Scalable and

  4. VersatileHDPMixtureModels.jl

    Code for our UAI '20 paper "Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes"


  5. dpmmpythonStreaming

    Python wrapper for the DPMMSubClusterStreaming.jl Julia package.

    Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data

  6. DeepDPM

    "DeepDPM: Deep Clustering With An Unknown Number of Clusters" [Ronen, Finder, and Freifeld, CVPR 2022]

  7. cuml

    cuML - RAPIDS Machine Learning Library

    As a companion to the other comments, I'd like to mention that the RAPIDS library cuML provides GPU-accelerated versions of quite a few of the algorithms mentioned in this thread (HDBSCAN, UMAP, SVM, PCA, {Exact, Approximate} Nearest Neighbors, DBSCAN, KMeans, etc.).
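
As a rough illustration of the comment above, cuML's estimators follow the scikit-learn interface, so the same clustering call can often run on either library. The sketch below assumes cuML is installed on a machine with a supported NVIDIA GPU and falls back to scikit-learn otherwise; the helper names `pick_backend` and `dbscan_labels` are hypothetical and belong to neither library.

```python
import importlib.util


def pick_backend():
    """Return 'cuml' if RAPIDS cuML is installed, else 'sklearn', else None."""
    for name in ("cuml", "sklearn"):
        # find_spec only locates the package; it does not import it.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def dbscan_labels(X, eps=0.5, min_samples=5):
    """Cluster X with DBSCAN on the GPU when cuML is available, else on the CPU."""
    backend = pick_backend()
    if backend == "cuml":
        from cuml.cluster import DBSCAN  # GPU implementation
    elif backend == "sklearn":
        from sklearn.cluster import DBSCAN  # CPU implementation
    else:
        raise ImportError("neither cuml nor scikit-learn is installed")
    # Both classes accept eps and min_samples and expose fit_predict.
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
```

Calling `dbscan_labels(X)` with a NumPy array of shape `(n_samples, n_features)` should return one label per row, with `-1` marking noise points. Results can differ slightly between the two backends, so treat this as a starting point and not as a guaranteed drop-in replacement.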

  8. cudf

    cuDF - GPU DataFrame Library


