yggdrasil-decision-forests
| | yggdrasil-decision-forests | higgs-logistic-regression |
|---|---|---|
| Mentions | 4 | 2 |
| Stars | 428 | 1 |
| Growth | 3.0% | - |
| Activity | 9.5 | 3.6 |
| Last commit | 5 days ago | over 3 years ago |
| Language | C++ | Haskell |
| License | Apache License 2.0 | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
yggdrasil-decision-forests
- Why do tree-based models still outperform deep learning on tabular data? (2022)
Is it this library? https://github.com/google/yggdrasil-decision-forests
- Binary image classification using random forest algorithm
However, if you know C++, you can use Yggdrasil: https://github.com/google/yggdrasil-decision-forests
- Why do tree-based models still outperform deep learning on tabular data?
- [P] Tree compiler that speeds up LightGBM model inference by ~30x
Have you tried comparing with Yggdrasil, the decision forest engine (C++, both training and inference) powering TensorFlow Decision Forests?
higgs-logistic-regression
- Why do tree-based models still outperform deep learning on tabular data?
Oh, you touched my favorite topic of whole dataset training.
Take a look at [1] and go straight to page 8, figure 2(b).
[1] http://proceedings.mlr.press/v48/taylor16.pdf
The paper covers whole-dataset training, and one of the datasets it uses is HIGGS [2]. Figure 2(b) compares two whole-dataset training approaches (L-BFGS and ADMM) against SGD. SGD basically tops out at the accuracy where both whole-dataset approaches start.
[2] https://archive.ics.uci.edu/ml/datasets/HIGGS#
HIGGS is a strange dataset. It is narrow, with only 28 features (29 columns counting the class label). It is also relatively long, about 11M samples (10M to train, 0.5M to validate and the last 0.5M to test). It is also hard to get right with SGD.
But if you perform whole-dataset optimization, even a linear model such as logistic regression can get you good accuracy [3] (some experiments of mine).
[3] https://github.com/thesz/higgs-logistic-regression
- Google Open-Sources Trillion-Parameter AI Language Model Switch Transformer
I beg to disagree.
[1] provides a whole-dataset training method (ADMM is one such method). Page 8 contains figure 2(b), which shows accuracy after a given amount of training time. Note that ADMM starts where stochastic gradient stops.
[1] https://arxiv.org/pdf/1605.02026.pdf
At [2] I tried logistic regression trained with the iteratively reweighted least squares (IRLS) algorithm on the same Higgs boson data set. I got the same accuracy (64%) as mentioned in the ADMM paper with far fewer coefficients - basically just the size of the input vector + 1, instead of 300 such rows of coefficients followed by a 300x1 affine transformation. When I added squares of the inputs (the simplest approximation of polynomial regression) and used the same IRLS algorithm, I got even better accuracy (66%) for double the number of coefficients.
[2] https://github.com/thesz/higgs-logistic-regression
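To make the IRLS idea above concrete, here is a minimal pure-Python sketch. It is not the code from the linked repository (which is written in Haskell), and it does not touch the HIGGS data: the synthetic two-feature dataset, the helper names (`solve`, `add_squares`, `irls_logistic`) and the small `ridge` damping term are all assumptions made for illustration. Each iteration uses every sample to build one weighted least-squares system, and appending squared inputs roughly doubles the coefficient count.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def add_squares(row):
    # Bias term, raw inputs, then squared inputs.
    return [1.0] + list(row) + [v * v for v in row]


def irls_logistic(xs, ys, iterations=15, ridge=1e-6):
    """Fit logistic regression with IRLS (Newton steps over the whole dataset)."""
    d = len(xs[0])
    w = [0.0] * d
    for _ in range(iterations):
        # H = X^T S X (plus a tiny diagonal damping term), g = X^T (y - p).
        h = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        g = [0.0] * d
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            s = p * (1.0 - p)
            for i in range(d):
                g[i] += x[i] * (y - p)
                for j in range(d):
                    h[i][j] += s * x[i] * x[j]
        step = solve(h, g)
        w = [wi + si for wi, si in zip(w, step)]
    return w


def accuracy(w, xs, ys):
    hits = 0
    for x, y in zip(xs, ys):
        predicted = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5
        hits += predicted == (y == 1.0)
    return hits / len(ys)


# Synthetic data (an assumption, not HIGGS): noisy "inside the unit circle" labels.
rng = random.Random(0)
points = [(rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)) for _ in range(400)]
labels = [1.0 if a * a + b * b + rng.gauss(0.0, 0.3) < 1.0 else 0.0
          for a, b in points]

linear_rows = [[1.0, a, b] for a, b in points]    # input size + 1 coefficients
square_rows = [add_squares(p) for p in points]    # squares of the inputs added

w_linear = irls_logistic(linear_rows, labels)
w_squares = irls_logistic(square_rows, labels)
acc_linear = accuracy(w_linear, linear_rows, labels)
acc_squares = accuracy(w_squares, square_rows, labels)
print(len(w_linear), acc_linear)
print(len(w_squares), acc_squares)
```

On this toy problem the labels depend on the squared inputs, so the second fit should score noticeably higher training accuracy than the first; that says nothing about the 64% and 66% figures reported for HIGGS, which come from the comment itself.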
There's a hypothesis [3] that SGD and Adam are considered the best optimizers because they are what everyone uses and reports on. You rarely, if ever, see anything different.
[3] https://parameterfree.com/2020/12/06/neural-network-maybe-ev...
So, answering your question of "how do you know" - researchers at Google cannot do IRLS (a search turns up IRLS only for logistic regression in TensorFlow), they cannot do Hessian-free optimization ([4], closed due to lack of activity - notice the "we can't support RNN due to the WHILE loop" bonanza), etc. All because they have to use TensorFlow - it just does not support these things.
[4] https://github.com/tensorflow/tensorflow/issues/2682
I haven't seen anything about whole-dataset optimization from Google at all. That's why I (and only I - given the stance I take and the experiments I did) conclude that they do not quite care about parameter efficiency.
What are some alternatives?
LightGBM - A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Spearmint - Spearmint Bayesian optimization codebase
tensorflow - An Open Source Machine Learning Framework for Everyone
decision-forests - A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.
decision-tree-classifier - Decision Tree Classifier and Boosted Random Forest
flashlight - A C++ standalone library for machine learning [Moved to: https://github.com/flashlight/flashlight]
interpret - Fit interpretable models. Explain blackbox machine learning.
srbench - A living benchmark framework for symbolic regression
MLBenchmarks.jl - ML models benchmarks on public dataset