DALEX
catboost
DALEX | catboost | |
---|---|---|
2 | 8 | |
1,323 | 7,753 | |
0.6% | 0.7% | |
5.5 | 9.9 | |
2 months ago | about 5 hours ago | |
Python | Python | |
GNU General Public License v3.0 only | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DALEX
-
Twitter set to accept ‘best and final offer’ of Elon Musk
Which he will not do, because: a) He can't, it's a black box algorithm. It actually is open source already, but that doesn't mean much as it's useless without Twitter's data https://github.com/ModelOriented/DALEX b) He won't release data that shows the algorithm is racist and amplifies conservative and extremist content. He won't remove such functions because it will cost him billions.
-
[D] What are your favorite Random Forest implementations that support categoricals
There are a couple of ways to use Shapley values for explanations in R. One way is to use DALEX, which also contains a lot of other methods besides SHAP. Another one is iml. I am sure there are several other implementations of SHAP as well.
catboost
- CatBoost: Open-source gradient boosting library
- Boosting Algorithms
-
What's New with AWS: Amazon SageMaker built-in algorithms now provides four new Tabular Data Modeling Algorithms
CatBoost is another popular and high-performance open-source implementation of the Gradient Boosting Decision Tree (GBDT). To learn how to use this algorithm, please see example notebooks for Classification and Regression.
-
Writing the fastest GBDT libary in Rust
Here are our benchmarks on training time comparing Tangram's Gradient Boosted Decision Tree Library to LightGBM, XGBoost, CatBoost, and sklearn.
-
Data Science toolset summary from 2021
Catboost - CatBoost is an open-source software library developed by Yandex. It provides a gradient boosting framework which attempts to solve for Categorical features using a permutation driven alternative compared to the classical algorithm. Link - https://catboost.ai/
-
CatBoost Quickstart — ML Classification
CatBoost is an open source algorithm based on gradient boosted decision trees. It supports numerical, categorical and text features. Check out the docs.
-
[D] What are your favorite Random Forest implementations that support categoricals
If you considering GBDT check out catboost, unfortunately RF mode is not available but library implement lots of interesting categorical encoding tricks that boost accuracy.
-
CatBoost and Water Pumps
The data contains a large number of categorical features. The most suitable for obtaining a base-line model, in my opinion, is CatBoost. It is a high-performance, open-source library for gradient boosting on decision trees.
What are some alternatives?
shapley - The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021).
xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
captum - Model interpretability and understanding for PyTorch
Recommender - A C library for product recommendations/suggestions using collaborative filtering (CF)
Lime-For-Time - Application of the LIME algorithm by Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin to the domain of time series classification
Keras - Deep Learning for humans
responsible-ai-toolbox - Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
LIME - Tutorial notebooks on explainable Machine Learning with LIME (Original work: https://arxiv.org/abs/1602.04938)
vowpal_wabbit - Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
interpret - Fit interpretable models. Explain blackbox machine learning.
mxnet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more