lleaves
catboost
| | lleaves | catboost |
|---|---|---|
| Mentions | 4 | 8 |
| Stars | 292 | 7,744 |
| Growth | - | 1.6% |
| Activity | 7.0 | 9.9 |
| Latest commit | 27 days ago | 4 days ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
lleaves
- LLeaves: An LLVM-based compiler for LightGBM decision trees
- Cold Showers
I built this decision tree (LightGBM) compiler last summer: https://github.com/siboehm/lleaves
It gets you ~10x speedups for batch predictions, more if your model is big. It's not complicated; it ended up being <1K lines of Python code. I heard a couple of stories like yours, where people had multi-node Spark clusters running LightGBM, and it always amused me, because if you compiled the trees instead you could get rid of the whole cluster.
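To make the "compile the trees" idea concrete, here is a toy sketch in pure Python. It assumes a hypothetical nested-tuple tree format (not LightGBM's model file format), and the names `interpret`, `emit` and `compile_tree` are illustrative. It uses Python's `exec` as a stand-in for code generation; lleaves itself goes through LLVM to native machine code, so this only illustrates the principle of replacing a per-node interpreter loop with straight-line `if`/`else` code, not the reported speedup.

```python
from typing import Callable, List, Sequence, Union

# Hypothetical tree format for illustration (not LightGBM's model.txt):
# an internal node is (feature_index, threshold, left, right); a leaf is a float.
Node = Union[float, tuple]


def interpret(node: Node, row: Sequence[float]) -> float:
    """Walk the tree node by node at prediction time, as an interpreter would."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if row[feature] <= threshold else right
    return node


def emit(node: Node, indent: int = 1) -> List[str]:
    """Generate nested if/else source lines for one (sub)tree."""
    pad = "    " * indent
    if not isinstance(node, tuple):
        return [f"{pad}return {node!r}"]
    feature, threshold, left, right = node
    return (
        [f"{pad}if row[{feature}] <= {threshold!r}:"]
        + emit(left, indent + 1)
        + [f"{pad}else:"]
        + emit(right, indent + 1)
    )


def compile_tree(node: Node) -> Callable[[Sequence[float]], float]:
    """Turn a tree into a Python function (exec stands in for real LLVM codegen)."""
    source = "\n".join(["def predict(row):"] + emit(node))
    namespace: dict = {}
    exec(source, namespace)
    return namespace["predict"]


tree = (0, 0.5, (1, 2.0, -1.0, 1.0), 3.0)
predict = compile_tree(tree)
rows = [[0.1, 1.0], [0.1, 5.0], [0.9, 0.0]]
print([predict(r) for r in rows])  # [-1.0, 1.0, 3.0]
```

The compiled function and the interpreter return the same predictions; the difference is that the tree structure is baked into the generated code instead of being read from a data structure on every call.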
- Tree compiler that speeds up LightGBM model inference by ~30x
In a near-future version I'll expose some of the compilation parameters. I was somewhat afraid that an overly complicated API would deter people who just want a no-fuss drop-in replacement for LightGBM, but as long as I keep sane defaults and make the parameters optional it should be fine. The relevant parameters are definitely the block size (it needs to adjust to the L1i cache size and the tree size) and the LLVM code model (a smaller address space increases single-batch prediction speed but doesn't work for large models). I'm still looking into thread-size-specific compilation; it makes the API more complicated, so it might not be worth it.
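The block-size parameter can be pictured as a change in loop order. The sketch below is our own interpretation, assuming that "block size" means grouping trees so that one block's compiled code can stay in the L1 instruction cache while it is applied to the whole batch. The tree format and the functions `eval_tree`, `predict_naive` and `predict_blocked` are hypothetical, and since this is plain Python it demonstrates only that the blocked order computes the same scores, not the cache effect itself.

```python
from typing import List, Sequence, Union

# Hypothetical tree format (illustration only): (feature, threshold, left, right) or a float leaf.
Node = Union[float, tuple]


def eval_tree(node: Node, row: Sequence[float]) -> float:
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if row[feature] <= threshold else right
    return node


def predict_naive(trees: Sequence[Node], rows: Sequence[Sequence[float]]) -> List[float]:
    # Row by row: each row visits every tree, so all trees' code is touched per row.
    return [sum(eval_tree(t, row) for t in trees) for row in rows]


def predict_blocked(
    trees: Sequence[Node], rows: Sequence[Sequence[float]], block_size: int
) -> List[float]:
    # Tree-blocked: one block of trees is applied to the whole batch before the next block,
    # so in compiled code only that block's instructions need to stay hot in the L1i cache.
    scores = [0.0] * len(rows)
    for start in range(0, len(trees), block_size):
        block = trees[start:start + block_size]
        for i, row in enumerate(rows):
            for t in block:
                scores[i] += eval_tree(t, row)
    return scores


trees = [(0, 0.5, -1.0, 1.0), (1, 2.0, 0.25, 0.75), 0.5]
rows = [[0.1, 1.0], [0.9, 3.0]]
print(predict_naive(trees, rows))       # [-0.25, 2.25]
print(predict_blocked(trees, rows, 2))  # [-0.25, 2.25]
```

For each row the trees are still summed in the same order, so the blocked variant matches the naive one exactly; only the order in which (row, tree) pairs are visited changes.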
catboost
- CatBoost: Open-source gradient boosting library
- Boosting Algorithms
- What's New with AWS: Amazon SageMaker built-in algorithms now provides four new Tabular Data Modeling Algorithms
CatBoost is another popular and high-performance open-source implementation of the Gradient Boosting Decision Tree (GBDT). To learn how to use this algorithm, please see example notebooks for Classification and Regression.
- Writing the fastest GBDT library in Rust
Here are our benchmarks on training time comparing Tangram's Gradient Boosted Decision Tree Library to LightGBM, XGBoost, CatBoost, and sklearn.
- Data Science toolset summary from 2021
Catboost - CatBoost is an open-source software library developed by Yandex. It provides a gradient boosting framework that handles categorical features using a permutation-driven alternative to the classical algorithm. Link - https://catboost.ai/
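The "permutation-driven" handling of categorical features is usually described as ordered target statistics: each row's category is replaced by a smoothed mean of the targets of earlier rows (in a random permutation) that share the same category, so a row never sees its own label. The following is a simplified sketch of that idea, not CatBoost's implementation (which, among other things, uses several permutations); the function name `ordered_target_stats` and the `prior` and `a` smoothing parameters are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence


def ordered_target_stats(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    prior: float,
    a: float = 1.0,
    order: Optional[List[int]] = None,
    seed: Optional[int] = None,
) -> List[float]:
    """Encode each category using only targets of rows earlier in a permutation.

    Simplified illustration: `prior` and `a` are hypothetical smoothing parameters.
    """
    if order is None:
        order = list(range(len(categories)))
        random.Random(seed).shuffle(order)
    sums = defaultdict(float)   # running target sum per category
    counts = defaultdict(int)   # running row count per category
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        # Smoothed mean of the targets seen so far; row i's own target is not included yet.
        encoded[i] = (sums[c] + a * prior) / (counts[c] + a)
        sums[c] += targets[i]
        counts[c] += 1
    return encoded


# Identity order for a readable example; in practice the order is a random permutation.
print(ordered_target_stats(["a", "b", "a", "a"], [1, 0, 0, 1], prior=0.5, order=[0, 1, 2, 3]))
# [0.5, 0.5, 0.75, 0.5]
```

The first occurrence of each category falls back to the prior, and later occurrences use only the history before them. This is what keeps the encoding from leaking a row's own target into its feature value.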
- CatBoost Quickstart — ML Classification
CatBoost is an open-source algorithm based on gradient-boosted decision trees. It supports numerical, categorical, and text features. Check out the docs.
- [D] What are your favorite Random Forest implementations that support categoricals
If you're considering GBDTs, check out CatBoost. Unfortunately, an RF mode is not available, but the library implements lots of interesting categorical encoding tricks that boost accuracy.
- CatBoost and Water Pumps
The data contains a large number of categorical features. In my opinion, CatBoost is the most suitable choice for obtaining a baseline model. It is a high-performance, open-source library for gradient boosting on decision trees.
What are some alternatives?
mljar-supervised - Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
ngboost - Natural Gradient Boosting for Probabilistic Prediction
Recommender - A C library for product recommendations/suggestions using collaborative filtering (CF)
m2cgen - Transform ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies
Keras - Deep Learning for humans
miceforest - Multiple Imputation with LightGBM in Python
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
vowpal_wabbit - Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
mxnet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
CCV - C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library
Porcupine - On-device wake word detection powered by deep learning