Spearmint
decision-forests
| | Spearmint | decision-forests |
|---|---|---|
| Mentions | 2 | 1 |
| Stars | 1,529 | 650 |
| Growth | 0.1% | 1.5% |
| Activity | 0.0 | 8.3 |
| Latest commit | over 4 years ago | 5 days ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Spearmint
- Why do tree-based models still outperform deep learning on tabular data?
It occurs to me that a system, trained on peer-reviewed applied-machine-learning literature and Kaggle winners, that generates candidates for structured feature-engineering specifications, based on plaintext descriptions of columns' real-world meaning, should be considered a requisite part of the "meta" here.
Ah, and then you could treat the choice within the resulting feature-engineering-suggestion space as a hyperparameter to vary between experiments, which could be optimized with e.g. https://github.com/HIPS/Spearmint. The papers write themselves!
- [D] What kind of Hyperparameter Optimisation do you use?
This was some time ago but I had some promising results with Bayesian optimization using a Gaussian Process prior. The method was developed by the guys who wrote Spearmint. That library doesn't support parallelization but I implemented the same technique in Scala without too much difficulty.
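The technique mentioned in the comment above, Bayesian optimization with a Gaussian Process prior, can be illustrated with a small sketch. This is not Spearmint's code or API: the function names (`gp_posterior`, `expected_improvement`, `bayes_opt`), the squared-exponential kernel with a fixed length scale, the expected-improvement acquisition, and the one-dimensional grid search over [0, 1] are all assumptions chosen to keep the example short and dependency-free.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalars (assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L such that L @ L^T == A, for a positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))   # K^{-1} y
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (minimization)."""
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf

def bayes_opt(objective, n_iter=10, init=(0.1, 0.5, 0.9), grid_size=101):
    """Minimize `objective` on [0, 1] by repeatedly maximizing EI on a grid."""
    xs = list(init)
    ys = [objective(x) for x in xs]
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    for _ in range(n_iter):
        best = min(ys)
        candidates = [g for g in grid if g not in xs]
        x_next = max(
            candidates,
            key=lambda g: expected_improvement(*gp_posterior(xs, ys, g), best),
        )
        xs.append(x_next)
        ys.append(objective(x_next))
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]

if __name__ == "__main__":
    # Hypothetical stand-in for an expensive validation-loss evaluation.
    best_x, best_y = bayes_opt(lambda x: (x - 0.3) ** 2)
    print(best_x, best_y)
```

In a real hyperparameter search the objective would be a full training-and-validation run, which is why spending extra computation on choosing each next point is worthwhile. A production-grade tool would also fit the kernel hyperparameters to the data and optimize the acquisition function over a continuous, multi-dimensional space rather than a fixed grid.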
decision-forests
- Why do tree-based models still outperform deep learning on tabular data?
I can't explain it, but I help maintain TensorFlow Decision Forests [1] and Yggdrasil Decision Forests [2], and in an AutoML system at work that trains models on data from many different users, decision forest models get selected as best (after AutoML tries various model types and hyperparameters) somewhere between 20% and 40% of the time, systematically. It's pretty interesting. The other model types considered are neural networks, linear models (with automatic feature-crossing generation), and a couple of other variations.
[1] https://github.com/tensorflow/decision-forests
What are some alternatives?
optuna - A hyperparameter optimization framework
yggdrasil-decision-forests - A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
srbench - A living benchmark framework for symbolic regression
higgs-logistic-regression
axe-testcafe - The helper for using Axe in TestCafe tests
youtube-react - A Youtube clone built in React, Redux, Redux-saga
spaceopt - Hyperparameter optimization via gradient boosting regression