srbench vs higgs-logistic-regression

srbench

A living benchmark framework for symbolic regression (by cavalab)

higgs-logistic-regression

By thesz

Project	srbench	higgs-logistic-regression
Mentions	2	2
Stars	194	1
Growth	3.1%	-
Activity	9.1	3.6
Latest Commit	3 months ago	over 3 years ago
Language	Python	Haskell
License	GNU General Public License v3.0 only	-

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

srbench

Posts with mentions or reviews of srbench. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-08-03.

Ask HN: Is genetic programming still actively researched?
1 project | news.ycombinator.com | 6 Aug 2023

NEAT and neuroevolution in general are interesting approaches. I also suggest checking out techniques like DENSER [1], which can be used to evolve deep networks (by applying the evolutionary part to the network structure and not to the weights).
Genetic Programming (GP), however, has not evolved into NEAT (which itself is not very recent, having been published in 2002); rather, neuroevolution has simply become one of the topics within evolutionary computation (EC). For example, one of the largest yearly conferences on evolutionary computation (GECCO) [2] took place just last month, with both neuroevolution and GP tracks. It is true, however, that the success of neural techniques has had an effect on the community: among other things, the role of EC is being discussed and more space is given to hybrid work (see, for example, the joint track on evolutionary machine learning [3] at the evostar event).
Related to the original post, recent research on GP can be found in the proceedings of GECCO (GP track), EuroGP (part of evostar), PPSN (Parallel Problem Solving from Nature), and IEEE CEC (IEEE Congress on Evolutionary Computation), and in journals like Genetic Programming and Evolvable Machines (GPEM), Swarm and Evolutionary Computation (SWEVO), and IEEE Transactions on Evolutionary Computation (IEEE TEVC). The list is not exhaustive, but those are some well-known venues.
For a less "daunting" starting point, some recent techniques are being added to the SRBench benchmark suite [4], with links to both the code and the paper describing the technique.
[1] Assunção, F., Lourenço, N., Machado, P., & Ribeiro, B. (2019, March). Fast DENSER: Efficient deep neuroevolution. In European Conference on Genetic Programming (pp. 197-212). Cham: Springer International Publishing.
[2] https://gecco-2023.sigevo.org/HomePage
[3] https://www.evostar.org/2024/eml/
[4] https://github.com/cavalab/srbench

Why do tree-based models still outperform deep learning on tabular data?
5 projects | news.ycombinator.com | 3 Aug 2022

A great paper and an important result.
However, it does not cite the highly relevant SRBench paper from 2021, which also carefully curates a suitable set of regression benchmarks and shows that Genetic Programming approaches also tend to outperform deep learning.
https://github.com/cavalab/srbench
cc u/optimalsolver

higgs-logistic-regression

Posts with mentions or reviews of higgs-logistic-regression. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-08-03.

Why do tree-based models still outperform deep learning on tabular data?
5 projects | news.ycombinator.com | 3 Aug 2022

Oh, you touched on my favorite topic: whole-dataset training.
Take a look at [1] and go straight to page 8, figure 2(b).
[1] http://proceedings.mlr.press/v48/taylor16.pdf
The paper talks about whole-dataset training, and one of the datasets used is HIGGS [2]. Figure 2(b) compares two whole-dataset training approaches (L-BFGS and ADMM) with SGD. SGD basically tops out at the accuracy at which both whole-dataset approaches start.
[2] https://archive.ics.uci.edu/ml/datasets/HIGGS#
HIGGS is a strange dataset. It is narrow, having only 29 features. It is also relatively long, about 11M samples (10M to train, 0.5M to validate, and the last 0.5M to test). It is also hard to get right with SGD.
But if you perform whole-dataset optimization, even linear regression can get you good accuracy [3] (some experiments of mine).
[3] https://github.com/thesz/higgs-logistic-regression

Google Open-Sources Trillion-Parameter AI Language Model Switch Transformer
1 project | news.ycombinator.com | 17 Feb 2021

I beg to disagree.
[1] describes a whole-data-set training method (ADMM, one of several such methods). Page 8 contains figure 2(b), which shows training accuracy after a given amount of time. Note that ADMM starts where stochastic gradient descent stops.
[1] https://arxiv.org/pdf/1605.02026.pdf
At [2] I tried applying logistic regression, trained with the iteratively reweighted least squares (IRLS) algorithm, to the same Higgs boson data set. I got the same accuracy (64%) as mentioned in the ADMM paper with far fewer coefficients: basically just the size of the input vector + 1, instead of 300 such rows of coefficients followed by a 300x1 affine transformation. When I added squares of the inputs (the simplest approximation of polynomial regression) and used the same IRLS algorithm, I got even better accuracy (66%) for double the number of coefficients.
[2] https://github.com/thesz/higgs-logistic-regression
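The IRLS fit described above can be sketched roughly as follows. This is a pure-Python illustration on small synthetic data, not the code from the linked repository (which is written in Haskell and runs on HIGGS). All function names, the synthetic data generator, and the small damping term added to the Hessian are assumptions made for the example.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def irls_logistic(xs, ys, iters=25, damping=1e-6):
    """Fit logistic regression with Newton/IRLS steps over the whole dataset.

    Returns len(xs[0]) + 1 coefficients; the last one is the bias.
    `damping` is a hypothetical safeguard that keeps the Hessian invertible.
    """
    rows = [x + [1.0] for x in xs]  # constant input for the bias term
    d = len(rows[0])
    w = [0.0] * d
    for _ in range(iters):
        h = [[damping if i == j else 0.0 for j in range(d)] for i in range(d)]
        g = [0.0] * d
        for row, y in zip(rows, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, row)))
            s = p * (1.0 - p)  # per-sample IRLS weight
            for i in range(d):
                g[i] += (y - p) * row[i]
                for j in range(d):
                    h[i][j] += s * row[i] * row[j]
        step = solve(h, g)  # weighted least-squares (Newton) update
        w = [wi + si for wi, si in zip(w, step)]
    return w


def add_squares(xs):
    """Append the square of every input, roughly doubling the coefficients."""
    return [x + [v * v for v in x] for x in xs]


def make_data(n, seed=0):
    """Synthetic two-feature binary data (illustrative stand-in, not HIGGS)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.uniform(-2, 2), rng.uniform(-2, 2)]
        p = sigmoid(2.0 * x[0] - 1.0 * x[1] + 0.5)
        xs.append(x)
        ys.append(1 if rng.random() < p else 0)
    return xs, ys


def accuracy(w, xs, ys):
    hits = 0
    for x, y in zip(xs, ys):
        z = sum(wi * xi for wi, xi in zip(w, x + [1.0]))
        hits += (z > 0) == (y == 1)
    return hits / len(ys)


xs, ys = make_data(1000)
w = irls_logistic(xs, ys)
print(len(w), "coefficients, training accuracy", accuracy(w, xs, ys))
```

Each iteration uses every sample to build one weighted least-squares system, which is what distinguishes this from a stochastic gradient step. Passing `add_squares(xs)` instead of `xs` gives the squared-input variant with 2n + 1 coefficients.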
There's a hypothesis [3] that SGD and Adam are considered the best optimizers because they are what everyone uses and reports on. Rarely, if ever, do you see anything different.
[3] https://parameterfree.com/2020/12/06/neural-network-maybe-ev...
So, to answer your question of "how do you know": researchers at Google cannot do IRLS (a search turns up IRLS only for logistic regression in TensorFlow), they cannot do Hessian-free optimization ([4], closed due to lack of activity; notice the "we can't support RNN due to the WHILE loop" bonanza), etc. All because they have to use TensorFlow, which simply does not support these things.
[4] https://github.com/tensorflow/tensorflow/issues/2682
I haven't seen anything about whole-data-set optimization from Google at all. That's why I (and only I, given the stance I take and the experiments I did) conclude that they do not care much about parameter efficiency.

What are some alternatives?

When comparing srbench and higgs-logistic-regression you can also consider the following projects:

Spearmint - Spearmint Bayesian optimization codebase

yggdrasil-decision-forests - A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.

decision-forests - A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.
