[D] Andrew Ng's data-centric vs model-centric Machine Learning

nestedcvtraining

Once you have your pipeline, model included, with all the transformers defined and parametrized, you could use an optimization approach like the one in the examples of this library (https://github.com/JaimeArboleda/nestedcvtraining). Do you think that would be a good idea, or am I oversimplifying?
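
To make the idea in this comment concrete, here is a minimal sketch of nested cross-validation over a whole pipeline. It uses plain scikit-learn rather than the nestedcvtraining API, so it is a stand-in for the general technique and not that library's method. The dataset, the pipeline steps, and the parameter grid are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative dataset bundled with scikit-learn (an assumption, not from the thread).
X, y = load_breast_cancer(return_X_y=True)

# The whole pipeline, transformers and model included, is the thing being tuned.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical search space: one transformer parameter and one model parameter.
param_grid = {
    "select__k": [10, 20],
    "clf__C": [0.1, 1.0, 10.0],
}

# Inner loop picks hyperparameters; outer loop estimates generalization.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="accuracy")

# Each outer fold refits the full search on its training part only,
# so the held-out score is not contaminated by the tuning step.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print("accuracy per outer fold:", outer_scores)
print("mean accuracy:", outer_scores.mean())
```

The outer scores describe the tuning procedure as a whole, not one fixed set of hyperparameters. Note that this kind of search optimizes the model and its preprocessing, which is the model-centric side of the discussion; it does not by itself improve the data.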

cleanlab

Discontinued: The standard package for machine learning with noisy labels and finding mislabeled data. Works with most datasets and models. [Moved to: https://github.com/cleanlab/cleanlab] (by cgnorthcutt)

I am an author on this, so I am biased. Around half a decade ago, we began developing a field at MIT called confident learning that takes a data-centric approach: instead of improving model quality, it improves the quality of the data labels. It is used by Google and Facebook, and it is open-sourced in Python as the cleanlab package.
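
As a rough illustration of the data-centric idea described in this comment, here is a simplified sketch of flagging likely label errors from a model's predicted probabilities. It is not the cleanlab API and not the full confident learning method; the function name `find_likely_label_issues`, the thresholding rule as written here, and the toy data are assumptions for illustration. In practice the probabilities should be out-of-sample (for example, from cross-validation).

```python
def find_likely_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels: list of integer class labels (possibly noisy).
    pred_probs: list of per-class probability lists, one per example.
    Hypothetical helper, a simplified take on per-class confidence thresholds.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: average predicted probability of class k
    # among the examples that are labeled k.
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        # A class with no labeled examples can never be confidently predicted.
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else float("inf"))

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes the model predicts at or above that class's own threshold.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            likely = max(confident, key=lambda k: p[k])
            # Flag the example if the confident class disagrees with its label.
            if likely != y:
                issues.append(i)
    return issues


# Toy data: example 2 is labeled 0 but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
issues = find_likely_label_issues(labels, pred_probs)
print(issues)  # [2]
```

The flagged examples are candidates for review or relabeling, not certain errors; the model itself stays unchanged, which is what makes the approach data-centric.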
