| | hummingbird | keops |
|---|---|---|
| Mentions | 9 | 5 |
| Stars | 3,304 | 1,000 |
| Growth | 0.5% | 1.5% |
| Activity | 7.1 | 9.5 |
| Latest commit | 17 days ago | 9 days ago |
| Language | Python | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hummingbird
- Treebomination: Convert a scikit-learn decision tree into a Keras model
-
[D] GPU-enabled scikit-learn
If you are interested in just predictions, you can try Hummingbird. It is part of the PyTorch ecosystem. We take already-trained scikit-learn models and translate them into PyTorch models. From there you can run your model on any hardware supported by PyTorch, export it to TVM, ONNX, etc. Performance with hardware acceleration is quite good (orders of magnitude better than scikit-learn in some cases).
-
Machine Learning with PyTorch and Scikit-Learn – The *New* Python ML Book
I think Rapids AI's cuML tried to go in this direction (essentially scikit-learn on the GPU): https://docs.rapids.ai/api/cuml/stable/api.html#logistic-reg.... For some reason it never really took off though.
Btw., going on a tangent, you might like Hummingbird (https://github.com/microsoft/hummingbird). It allows you to convert trained scikit-learn tree-based models to PyTorch. I watched the SciPy talk last year, and it's a super smart & elegant idea.
-
Export and run models with ONNX
ONNX opens an avenue for direct inference using a number of languages and platforms. For example, a model could be run directly on Android to limit data sent to a third-party service. ONNX is an exciting development with a lot of promise. Microsoft has also released Hummingbird, which enables exporting traditional models (sklearn, decision trees, logistic regression, ...) to ONNX.
-
Supreme Court, in a 6–2 ruling in Google v. Oracle, concludes that Google’s use of Java API was a fair use of that material
And Python.
-
[D] Here are 3 ways to Speed Up Scikit-Learn - Any suggestions?
For inference, you can convert your models to other formats that support GPU acceleration. See Hummingbird https://github.com/microsoft/hummingbird
-
[D] Microsoft library, Hummingbird, compiles trained ML models into tensor computation for faster inference.
The surprising thing is that Hummingbird can be faster than the GPU implementation of LightGBM (and XGBoost) if you use tensor compilers such as TVM. [The paper](https://www.usenix.org/conference/osdi20/presentation/nakandala) describes our findings. We have also open sourced the [benchmark code](https://github.com/microsoft/hummingbird/tree/main/benchmarks) so you can try it yourself!
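To make "compiling a tree into tensor computation" concrete, here is a minimal NumPy sketch loosely modeled on the matrix-multiplication (GEMM) strategy described in the linked paper. It is not Hummingbird's actual implementation: the tiny tree, the matrix names `A` to `E`, and the function `predict_tree_gemm` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical tree used only for this sketch:
#   node0: x[0] < 0.5 ? go to node1 : leaf2
#   node1: x[1] < 2.0 ? leaf0      : leaf1
# leaf0 -> class 0, leaf1 -> class 1, leaf2 -> class 2

# A[f, n] = 1 if internal node n tests feature f.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
# B[n] = threshold of internal node n.
B = np.array([0.5, 2.0])
# C[n, l] = 1 if leaf l is in the left subtree of node n,
#          -1 if it is in the right subtree, 0 if n is not an ancestor of l.
C = np.array([[1, 1, -1],
              [1, -1, 0]])
# D[l] = number of ancestors of leaf l for which l lies in the left subtree.
D = np.array([2, 1, 0])
# E[l, c] = 1 if leaf l predicts class c.
E = np.eye(3)


def predict_tree_gemm(X):
    """Evaluate the tree for every row of X using only matrix operations."""
    # Evaluate all node conditions at once: shape (rows, nodes).
    T = (X @ A < B).astype(np.int64)
    # A leaf is selected when every condition on its root path has the required outcome.
    leaf_one_hot = (T @ C == D).astype(np.float64)
    # Map the selected leaf to its class.
    return (leaf_one_hot @ E).argmax(axis=1)


X = np.array([[0.2, 1.0],   # left at node0, left at node1  -> leaf0
              [0.2, 3.0],   # left at node0, right at node1 -> leaf1
              [0.9, 1.0]])  # right at node0                -> leaf2
predictions = predict_tree_gemm(X)
```

Because every step is a dense matrix product or an elementwise comparison, the same computation can be handed to any tensor runtime, which is the property the post relies on.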
-
I learned about Microsoft's Hummingbird library today. 1000x performance??
I took their sample code from GitHub and tweaked it to print the time for each model's prediction, and increased the number of rows to 5 million. I used Google's Colab and selected GPU as my hardware accelerator. This gives the option to run code on the GPU; it does not mean that all computations will happen on the GPU.
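A comparison like the one described in the last post can be sketched with a small timing helper. This is a generic standard-library example, not the poster's code: `time_predict` and the two stand-in predictors are hypothetical names. When timing real GPU code, keep in mind that PyTorch launches CUDA work asynchronously, so the device has to be synchronized before the clock is read.

```python
import time


def time_predict(predict, data, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls to predict(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(data)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in predictors (hypothetical): any callable shaped like model.predict works here.
def baseline_predict(rows):
    return [sum(row) > 1.0 for row in rows]


def candidate_predict(rows):
    return [row[0] + row[1] > 1.0 for row in rows]


rows = [((i % 3) * 0.5, (i % 2) * 0.75) for i in range(10_000)]

timings = {
    name: time_predict(fn, rows)
    for name, fn in [("baseline", baseline_predict), ("candidate", candidate_predict)]
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

Taking the best of several runs reduces the influence of one-off costs such as warm-up, and checking that both predictors return the same output guards against comparing models that disagree.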
keops
-
[D] GPU-enabled scikit-learn
From direct discussions with the sklearn team, note that this may change relatively soon: a GPU engineer funded by Intel was recently added to the core development team. Last time I met with the team in person (6 months ago), the project was to factor some of the most GPU friendly computations out of the sklearn code base, such as K-Nearest Neighbor search or kernel-related computations, and to document an internal API to let external developers easily develop accelerated backends. As shown by e.g. our KeOps library, GPUs are extremely well suited to classical ML and sklearn is the perfect platform to let users fully take advantage of their hardware. Let’s hope that OP’s question will become redundant at some point in 2023-24 :-)
-
[Research] Optimizing a kernel matrix
There has been major progress on the representation of kernel matrices over the last five years. Notably, the KeOps library is an extension for PyTorch/NumPy/etc. that allows you to perform the operations you're thinking of very quickly (10-100x faster than a standard GPU implementation with PyTorch), with low memory usage.
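The low memory usage comes from never storing the full kernel matrix. The sketch below illustrates that general idea in plain NumPy by computing a Gaussian kernel matrix-vector product one block of rows at a time. It does not use KeOps or its API; the function name `gaussian_kernel_matvec`, the `block_size` parameter, and the choice of a Gaussian kernel are assumptions made for this example.

```python
import numpy as np


def gaussian_kernel_matvec(x, y, b, sigma=1.0, block_size=256):
    """Compute K @ b with K[i, j] = exp(-|x_i - y_j|^2 / (2 sigma^2)).

    Only a (block_size, len(y)) slice of K exists in memory at any time.
    """
    out = np.zeros((x.shape[0],) + b.shape[1:], dtype=np.result_type(x, b))
    y_sq = (y ** 2).sum(axis=1)
    for start in range(0, x.shape[0], block_size):
        xb = x[start:start + block_size]
        # Squared distances between this block of x and all of y.
        d2 = (xb ** 2).sum(axis=1)[:, None] - 2.0 * (xb @ y.T) + y_sq[None, :]
        np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by rounding
        out[start:start + block_size] = np.exp(-d2 / (2.0 * sigma ** 2)) @ b
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3))
y = rng.normal(size=(300, 3))
b = rng.normal(size=(300, 2))

result = gaussian_kernel_matvec(x, y, b, sigma=1.0, block_size=64)
```

Peak memory here scales with `block_size * len(y)` instead of `len(x) * len(y)`, at the cost of a Python-level loop over blocks.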
-
Scalable GPs [D]
For references on easy-to-use software, you may be interested in e.g. the Falkon and KeOps libraries that were presented as oral/spotlight at last year’s NeurIPS, GPyTorch that you may already know, etc.
- KeOps: Kernel Operations on the GPU
-
[D] why did kernel methods become less popular than neural networks?
You're very welcome! As of today it is still mostly useful when you have less than 50-100 features per point (as detailed here or there), but it's very versatile. We are actively working on making it as useful as possible for the community: if you encounter any issue with it, feel free to let us know!
What are some alternatives?
onnx - Open standard for machine learning interoperability
falkon - Large-scale, multi-GPU capable, kernel solver
swift - The Swift Programming Language
sentence-transformers - Multilingual Sentence & Image Embeddings with BERT
cuml - cuML - RAPIDS Machine Learning Library
docker - Docker - the open-source application container engine
chemprop - Message Passing Neural Networks for Molecule Property Prediction
tune-sklearn - A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques.
ServiceTalk - A networking framework that evolves with your application
password-manager-resources - A place for creators and users of password managers to collaborate on resources to make password management better.
coremltools - Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
swift-evolution - This maintains proposals for changes and user-visible enhancements to the Swift Programming Language.