-
cleanlab
Discontinued. The standard package for machine learning with noisy labels and finding mislabeled data. Works with most datasets and models. [Moved to: https://github.com/cleanlab/cleanlab] (by cgnorthcutt)
The important takeaway is that you should clean and correct your test set before benchmarking models on it (e.g., by using cleanlab plus some form of human validation, as described in this work). Avoid assuming your test set is error-free, as has been done in papers with 10,000+ citations that benchmark on datasets like MNIST and CIFAR-100.
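To make the "clean your test set first" step concrete, here is a rough sketch of how candidate label errors might be surfaced for human review. It is a simplified heuristic in the spirit of confident learning, not cleanlab's actual algorithm. The function name `flag_possible_label_errors` and the toy data are made up for illustration, and `pred_probs` is assumed to hold out-of-sample predicted probabilities from whatever model you have.

```python
from collections import defaultdict


def flag_possible_label_errors(labels, pred_probs):
    """Return indices of suspicious examples, most suspicious first.

    Simplified heuristic (an assumption for illustration, NOT cleanlab's exact method):
      1. Per-class threshold = mean predicted probability of class k
         over the examples whose given label is k.
      2. Flag an example when the model's top class differs from the given
         label and its probability reaches that class's threshold.
      3. Rank flagged examples by the probability assigned to the given
         label (lowest first).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, probs in zip(labels, pred_probs):
        sums[label] += probs[label]
        counts[label] += 1
    thresholds = {k: sums[k] / counts[k] for k in counts}

    flagged = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        top = max(range(len(probs)), key=lambda k: probs[k])
        # Classes never seen as a given label have no threshold; skip them.
        if top != label and top in thresholds and probs[top] >= thresholds[top]:
            flagged.append((probs[label], i))

    return [i for _, i in sorted(flagged)]


# Toy test set: two classes, six examples (hypothetical values).
test_labels = [0, 0, 0, 1, 1, 1]
test_pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled 0, but the model is confident it is class 1
    [0.2, 0.8],
    [0.1, 0.9],
    [0.7, 0.3],  # labeled 1, but the model leans toward class 0
]

suspects = flag_possible_label_errors(test_labels, test_pred_probs)
print("Indices to send for human review:", suspects)
```

The flagged indices are only candidates: the point of the takeaway above is that a human still validates them before the test set is corrected. If cleanlab is installed, `cleanlab.filter.find_label_issues(labels, pred_probs, return_indices_ranked_by="self_confidence")` should do a more careful version of this step.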
Related posts
-
Show HN: Simple (but clever) algorithms can find label issues in datasets
-
[D] In which ML field can I make significant contribution without significant compute?
-
[D] A simple trick to quickly verify data
-
[P] Cleanlab Vizzy — learn how to automatically find label errors and out-of-distribution data
-
Show HN: Cleanlab Vizzy – automatically find label errors and bad data