pdc-dp-means
"Revisiting DP-Means: Fast Scalable Algorithms via Parallelism and Delayed Cluster Creation" [Dinari and Freifeld, UAI 2022]
-
dpmmpython
Python wrapper for the DPMMSubCluster Julia package for inference in Dirichlet Process Mixture Models (High Performance Machine Learning Workshop 2019)
-
VersatileHDPMixtureModels.jl
Code for our UAI '20 paper "Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes"
-
DeepDPM
"DeepDPM: Deep Clustering With An Unknown Number of Clusters" [Ronen, Finder, and Freifeld, CVPR 2022]
If you like DP-Means, see our recent paper, where we optimized it to be much faster: https://openreview.net/pdf?id=rnzVBD8jqlq https://github.com/BGU-CS-VIL/pdc-dp-means. The code is included both as an IID version and, if you want the fastest variant, as a module that can be built for scikit-learn; the latter is as fast as k-means, with all the benefits of DP-Means :)
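For readers unfamiliar with DP-Means itself, here is a minimal pure-Python sketch of the basic serial algorithm (not the parallelized, delayed-cluster-creation version from the paper, and not the pdc-dp-means API). The function name and the squared-distance penalty `lam` are illustrative choices:

```python
def dp_means(points, lam, n_iter=10):
    """Basic serial DP-Means: like k-means, but a point whose squared
    distance to every centroid exceeds the penalty `lam` spawns a new
    cluster, so the number of clusters is inferred rather than fixed."""
    centroids = [list(points[0])]
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid, or a fresh one if all are too far.
        for i, p in enumerate(points):
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            j = min(range(len(d2)), key=d2.__getitem__)
            if d2[j] > lam:
                centroids.append(list(p))
                j = len(centroids) - 1
            assign[i] = j
        # Update step: recompute each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [points[i] for i in range(len(points)) if assign[i] == j]
            if members:
                centroids[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, assign
```

With two well-separated blobs and `lam` between the within-blob and between-blob squared distances, the algorithm discovers both clusters on its own; the optimized package wraps the same idea in a scikit-learn-style estimator.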
Distributed MCMC inference in Dirichlet process mixture models using Julia.
Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data
As a companion to the other comments, I'd like to mention that the RAPIDS library cuML provides GPU-accelerated versions of quite a few of the algorithms mentioned in this thread (HDBSCAN, UMAP, SVM, PCA, {Exact, Approximate} Nearest Neighbors, DBSCAN, KMeans, etc.).
Related posts
- CuGraph – GPU-accelerated graph analytics
- Why we dropped Docker for Python environments
- GPU implementation of shortest path?
- Is it possible to run Sklearn models on a GPU?
- [D] Can we use Ray for distributed training on Vertex AI? Can someone provide examples? Also, which dataframe libraries did you use for training machine learning models on huge datasets (100 GB+), since pandas can't handle them?