-
pdc-dp-means
"Revisiting DP-Means: Fast Scalable Algorithms via Parallelism and Delayed Cluster Creation" [Dinari and Freifeld, UAI 2022]
If you like DP-Means, see our recent paper, where we optimized it to be much faster: https://openreview.net/pdf?id=rnzVBD8jqlq https://github.com/BGU-CS-VIL/pdc-dp-means The code is included both as IID and, if you want the fastest version, as a module that can be built for scikit-learn. The latter is as fast as k-means, with all the benefits of DP-Means :)
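For readers who have not used DP-Means, here is a minimal sketch of the basic serial algorithm in pure Python. It is not the parallel, delayed-cluster-creation variant from the paper, and it is not the pdc-dp-means API. The function name `dp_means` and the parameter `lam` (the squared-distance penalty for opening a new cluster) are names assumed for this illustration only.

```python
def dp_means(points, lam, max_iter=100):
    """Basic serial DP-Means sketch (illustrative, not the pdc-dp-means code).

    A point whose squared distance to every center exceeds `lam`
    opens a new cluster centered on that point.
    """
    dim = len(points[0])
    n = len(points)
    # Start with a single cluster at the global mean.
    centers = [[sum(p[d] for p in points) / n for d in range(dim)]]
    labels = [0] * n

    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                # Too far from every existing center: create a new cluster.
                centers.append(list(p))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True

        # Drop empty clusters and renumber the remaining ones from 0.
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[l] for l in labels]

        # Recompute each center as the mean of its members.
        centers = []
        for k in range(len(used)):
            members = [p for p, l in zip(points, labels) if l == k]
            centers.append(
                [sum(m[d] for m in members) / len(members) for d in range(dim)]
            )

        if not changed:
            break

    return labels, centers
```

Unlike k-means, the number of clusters is not fixed in advance: a larger `lam` yields fewer clusters, a smaller one yields more.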
-
dpmmpython
Python wrapper for the DPMMSubCluster Julia package for inference in Dirichlet Process Mixture Models (High Performance Machine Learning Workshop 2019)
Distributed MCMC inference in Dirichlet process mixture models using Julia.
-
VersatileHDPMixtureModels.jl
Code for our UAI '20 paper "Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes"
-
Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data
-
DeepDPM
"DeepDPM: Deep Clustering With An Unknown Number of Clusters" [Ronen, Finder, and Freifeld, CVPR 2022]
-
As a companion to the other comments, I'd like to mention that the RAPIDS library cuML provides GPU-accelerated versions of quite a few of the algorithms mentioned in this thread (HDBSCAN, UMAP, SVM, PCA, {Exact, Approximate} Nearest Neighbors, DBSCAN, KMeans, etc.).