Why do tree-based models still outperform deep learning on tabular data? (2022)

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • MLBenchmarks.jl

    ML models benchmarks on public dataset

  • There seem to be differentiable tree models now that perform somewhat better than e.g. XGBoost https://github.com/Evovest/MLBenchmarks.jl?tab=readme-ov-fil...

  • playground

    Play with neural networks!

  • Not the parent, but NNs typically work better when you can't linearize your data. For classification, linearizing means finding a space in which hyperplanes separate the classes; for regression, a space in which a linear approximation is good.

    For example, take the circle dataset here: https://playground.tensorflow.org

    That doesn't look immediately linearly separable, but since it is 2D we have the insight that parameterizing by radius would do the trick. Now try doing that in 1000 dimensions. Sometimes you can, sometimes you can't or don't want to bother.
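The radius trick from the comment above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the playground's actual data or code: the dataset generator `make_circles`, the ring radii, and the 1.5 cutoff are all made up here, and a single threshold on x or y is used as a crude stand-in for a linear classifier on the raw coordinates.

```python
import math
import random


def make_circles(n_per_class=200, seed=0):
    """Toy two-class data (hypothetical): an inner disc and an outer ring."""
    rng = random.Random(seed)
    points, labels = [], []
    # Class 0 has radius in [0, 1]; class 1 has radius in [2, 3].
    for label, (r_min, r_max) in enumerate([(0.0, 1.0), (2.0, 3.0)]):
        for _ in range(n_per_class):
            r = rng.uniform(r_min, r_max)
            theta = rng.uniform(0.0, 2.0 * math.pi)
            points.append((r * math.cos(theta), r * math.sin(theta)))
            labels.append(label)
    return points, labels


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def best_axis_threshold_accuracy(points, labels):
    """Best accuracy of any single threshold on x or on y, in either direction."""
    best = 0.0
    for axis in (0, 1):
        for cut in sorted(p[axis] for p in points):
            acc = accuracy([int(p[axis] > cut) for p in points], labels)
            # 1 - acc is the accuracy of the same cut with the classes swapped.
            best = max(best, acc, 1.0 - acc)
    return best


points, labels = make_circles()

# Raw coordinates: no single cut along x or y separates disc from ring.
raw_acc = best_axis_threshold_accuracy(points, labels)

# Re-parameterize by radius: one threshold between the two bands suffices.
radius = [math.hypot(x, y) for x, y in points]
radius_acc = accuracy([int(r > 1.5) for r in radius], labels)

print(f"best single-axis threshold accuracy: {raw_acc:.2f}")
print(f"radius threshold accuracy:           {radius_acc:.2f}")
```

With the bands chosen here, the radius feature separates the classes perfectly by construction, while thresholds on the raw coordinates stay well short of that. In 2D the right feature is easy to guess by eye; the comment's point is that in high dimensions it often is not.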

  • yggdrasil-decision-forests

    A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.

  • Is it this library https://github.com/google/yggdrasil-decision-forests ?

