cuml vs hummingbird

cuml	Project	hummingbird
10	Mentions	9
3,894	Stars	3,301
2.0%	Growth	0.7%
9.3	Activity	7.3
1 day ago	Latest Commit	25 days ago
C++	Language	Python
Apache License 2.0	License	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

cuml

Posts with mentions or reviews of cuml. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-13.

FLaNK Stack Weekly for 13 November 2023
30 projects | dev.to | 13 Nov 2023
Is it possible to run Sklearn models on a GPU?
1 project | /r/datascience | 5 Mar 2023

sklearn can't, but take a look at cuML (https://github.com/rapidsai/cuml). It uses the same API as sklearn but executes on GPU.
[P] Looking for state of the art clustering algorithms
8 projects | /r/MachineLearning | 14 Sep 2022

As a companion to the other comments, I'd like to mention that the RAPIDS library cuML provides GPU-accelerated versions of quite a few of the algorithms mentioned in this thread (HDBSCAN, UMAP, SVM, PCA, {Exact, Approximate} Nearest Neighbors, DBSCAN, KMeans, etc.).
Is there a multi regression model that works on GPU?
2 projects | /r/learnmachinelearning | 4 Aug 2022

CuML
[D] What's your favorite unpopular/forgotten Machine Learning method?
2 projects | /r/MachineLearning | 2 Mar 2022
Machine Learning with PyTorch and Scikit-Learn – The *New* Python ML Book
3 projects | news.ycombinator.com | 25 Feb 2022
What are the advantages and disadvantages of using GPU for machine learning/ deep learning/ scientific computation over the conventional CPU software acceleration?
1 project | /r/bioinformatics | 15 Nov 2021

Did they implement the clustering algorithm themselves? cuML is a GPU-accelerated scikit-learn-like package that covers many of the common ML algorithms.
Intel Extension for Scikit-Learn
4 projects | news.ycombinator.com | 1 Nov 2021

https://github.com/rapidsai/cuml
> cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions that share compatible APIs with other RAPIDS projects. cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.
GPU Based Kernel-PCA
2 projects | /r/MLQuestions | 22 Jan 2021

Cython code
Python Machine Learning Guy getting started with CUDA. What should I be brushing up on?
2 projects | /r/CUDA | 10 Jan 2021

Take a look at RAPIDS cuML: https://github.com/rapidsai/cuml. It's useful for most common ML algorithms. Feel free to create GitHub issues for feature requests & bugs.

hummingbird

Posts with mentions or reviews of hummingbird. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-11.

Treebomination: Convert a scikit-learn decision tree into a Keras model
3 projects | news.ycombinator.com | 11 Jun 2023
[D] GPU-enabled scikit-learn
3 projects | /r/MachineLearning | 30 Dec 2022

If you are interested in just predictions, you can try Hummingbird. It is part of the PyTorch ecosystem. We take already trained scikit-learn models and translate them into PyTorch models. From there you can run your model on any hardware supported by PyTorch, export it into TVM, ONNX, etc. Performance with hardware acceleration is quite good (orders of magnitude better than scikit-learn in some cases).
Machine Learning with PyTorch and Scikit-Learn – The *New* Python ML Book
3 projects | news.ycombinator.com | 25 Feb 2022

I think Rapids AI's cuML tried to go in this direction (essentially scikit-learn on the GPU): https://docs.rapids.ai/api/cuml/stable/api.html#logistic-reg.... For some reason it never really took off though.
Btw., going on a tangent, you might like Hummingbird (https://github.com/microsoft/hummingbird). It allows you to convert trained scikit-learn tree-based models to PyTorch. I watched the SciPy talk last year, and it's a super smart & elegant idea.
Export and run models with ONNX
3 projects | dev.to | 7 Sep 2021

ONNX opens an avenue for direct inference using a number of languages and platforms. For example, a model could be run directly on Android to limit data sent to a third-party service. ONNX is an exciting development with a lot of promise. Microsoft has also released Hummingbird, which enables exporting traditional models (sklearn, decision trees, logistic regression, etc.) to ONNX.
Supreme Court, in a 6–2 ruling in Google v. Oracle, concludes that Google’s use of Java API was a fair use of that material
16 projects | /r/Android | 5 Apr 2021

And Python.
[D] Here are 3 ways to Speed Up Scikit-Learn - Any suggestions?
2 projects | /r/MachineLearning | 4 Feb 2021

For inference, you can convert your models to other formats that support GPU acceleration. See Hummingbird https://github.com/microsoft/hummingbird
[D] Microsoft library, Hummingbird, compiles trained ML models into tensor computation for faster inference.
1 project | /r/MachineLearning | 21 Dec 2020

The surprising thing is that Hummingbird can be faster than the GPU implementation of LightGBM (and XGBoost) if you use tensor compilers such as TVM. [The paper](https://www.usenix.org/conference/osdi20/presentation/nakandala) describes our findings. We have also open sourced the [benchmark code](https://github.com/microsoft/hummingbird/tree/main/benchmarks) so you can try it yourself!
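The idea of compiling a trained tree into tensor computation can be illustrated with a small, dependency-free sketch. It loosely follows the GEMM-style strategy described in the linked paper, as best understood here: a tree becomes a handful of matrices, and prediction becomes matrix products plus element-wise comparisons. The toy tree, the matrix names A through E, and the helper functions are all hypothetical and written for illustration only. This is not Hummingbird's code or API.

```python
# Toy decision tree (hypothetical), two features, two classes:
#   node0: x[0] < 5 ? -> node1 : leaf2 (class 1)
#   node1: x[1] < 2 ? -> leaf0 (class 0) : leaf1 (class 1)

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 0], [0, 1]]          # features x internal nodes: which feature each node tests
B = [5, 2]                    # threshold of each internal node
C = [[1, 1, -1], [1, -1, 0]]  # nodes x leaves: +1 leaf in left subtree, -1 right, 0 unrelated
D = [2, 1, 0]                 # per leaf: number of "go left" decisions on its path
E = [[1, 0], [0, 1], [0, 1]]  # leaves x classes: class of each leaf

def predict_gemm(X):
    """Evaluate the tree for a batch of rows using only matrix-style operations."""
    T = matmul(X, A)                                                  # pick tested feature per node
    T = [[1 if v < t else 0 for v, t in zip(row, B)] for row in T]   # evaluate all conditions at once
    T = matmul(T, C)                                                  # accumulate path evidence per leaf
    T = [[1 if v == d else 0 for v, d in zip(row, D)] for row in T]  # one-hot of the reached leaf
    scores = matmul(T, E)                                             # map leaf to class
    return [row.index(max(row)) for row in scores]

def predict_traverse(x):
    """Reference implementation: ordinary if/else traversal of the same tree."""
    if x[0] < 5:
        return 0 if x[1] < 2 else 1
    return 1

X = [[3, 1], [3, 4], [7, 1], [7, 4]]
print(predict_gemm(X))  # → [0, 1, 1, 1]
```

Every condition in the tree is evaluated for every row, which is wasteful on a CPU but maps well to batched tensor hardware, since there is no data-dependent branching.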
I learned about Microsoft's Hummingbird library today. 1000x performance??
1 project | dev.to | 23 Sep 2020

I took their sample code from GitHub and tweaked it to spit out times for each model's prediction, as well as increasing the number of rows to 5 million. I used Google's Colab and selected GPU for my hardware accelerator. This gives the option to run code on the GPU; it does not mean that all computations will happen on the GPU.
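A timing comparison of this kind can be sketched with the standard library alone. The helper below is a hypothetical illustration, not the code from the post: `time_predict` and `toy_predict` are made-up names, and `toy_predict` merely stands in for a real model's prediction call.

```python
import time

def time_predict(predict_fn, X, repeats=3):
    """Call predict_fn(X) `repeats` times; return (best wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = predict_fn(X)
        best = min(best, time.perf_counter() - start)
    return best, result

def toy_predict(rows):
    # Hypothetical stand-in for a model's predict method.
    return [1 if sum(r) > 1.0 else 0 for r in rows]

X = [[0.1 * i, 0.2 * i] for i in range(1000)]
seconds, preds = time_predict(toy_predict, X)
print(f"best of 3: {seconds:.6f}s for {len(preds)} rows")
```

When timing GPU code, keep in mind that frameworks such as PyTorch launch GPU work asynchronously, so a fair measurement has to wait for the device to finish before stopping the clock.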

What are some alternatives?

When comparing cuml and hummingbird you can also consider the following projects:

scikit-learn - scikit-learn: machine learning in Python

onnx - Open standard for machine learning interoperability

scikit-learn-intelex - Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application

swift - The Swift Programming Language

scikit-cuda - Python interface to GPU-powered libraries

sentence-transformers - Multilingual Sentence & Image Embeddings with BERT

cudf - cuDF - GPU DataFrame Library

docker - Docker - the open-source application container engine

evojax

chemprop - Message Passing Neural Networks for Molecule Property Prediction

lightseq - LightSeq: A High Performance Library for Sequence Processing and Generation

tune-sklearn - A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques.