Cleanlab (https://github.com/cleanlab/cleanlab) is a family of algorithms for automatically finding issues in datasets. It might seem surprising that it’s possible to automatically identify label errors and out-of-distribution data; Cleanlab does this using the algorithms published in https://arxiv.org/abs/1911.00068.
Cleanlab’s algorithms, while clever, are actually relatively simple. To help myself (and others!) build intuition for how they work, I built Vizzy, an interactive demo that runs in the browser. Vizzy lets you experiment with an example dataset, tweak the labels, and run Cleanlab to automatically find issues like label errors and out-of-distribution data.
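To give a flavor of how simple the core idea is, here is a minimal Python sketch of the per-class thresholding step from the confident learning paper. It is a simplified illustration, not cleanlab's actual implementation: the function name `flag_label_issues` and the toy data are made up for this example, and it assumes every class has at least one labeled example.

```python
def flag_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks wrong.

    labels: list of int class indices (the given, possibly noisy labels).
    pred_probs: list of per-example class probabilities, ideally out-of-sample.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: the average predicted probability of class k
    # among the examples that were given label k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k))

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes whose probability clears that class's own threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class here
        best = max(confident, key=lambda k: p[k])
        if best != y:
            issues.append(i)  # confidently predicted class disagrees with the label
    return issues


# Toy data (hypothetical): example 2 is labeled 0 but looks like class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.15, 0.85],
    [0.10, 0.90],
    [0.30, 0.70],
]
print(flag_label_issues(labels, pred_probs))  # [2]
```

Using a separate threshold per class, rather than a single global cutoff, lets the check adapt when the model is systematically more or less confident on some classes than others.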
Vizzy includes a JavaScript port of (a part of) cleanlab, along with other neat technical nuggets, including ML model training in the browser (it takes features from a pretrained ResNet-18, applies truncated SVD, and trains an SVM model for speed). If you’re interested in the details of how Vizzy works, check out this blog post: https://cleanlab.ai/blog/cleanlab-vizzy/
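For readers who want to try a similar pipeline outside the browser, here is a rough Python sketch using scikit-learn. This is not Vizzy's code (Vizzy is JavaScript). Random vectors stand in for ResNet-18 features, and the number of SVD components, the calibrated `LinearSVC`, and the cross-validation settings are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
num_classes, per_class, dim = 3, 50, 512  # 512 matches ResNet-18's pooled feature size

# Synthetic stand-in for image embeddings: one Gaussian blob per class.
centers = rng.normal(size=(num_classes, dim))
features = np.vstack([c + rng.normal(size=(per_class, dim)) for c in centers])
labels = np.repeat(np.arange(num_classes), per_class)

# Reduce dimensionality, then fit a linear SVM. The calibration wrapper
# (an assumption here) turns SVM scores into class probabilities.
model = make_pipeline(
    TruncatedSVD(n_components=32, random_state=0),
    CalibratedClassifierCV(LinearSVC(), cv=3),
)

# Out-of-sample predicted probabilities: each example is scored by a model
# that did not see it during training.
pred_probs = cross_val_predict(model, features, labels, cv=5, method="predict_proba")
print(pred_probs.shape)  # (150, 3)
```

The out-of-sample probabilities are the important part: a model scored on its own training data tends to agree with whatever labels it was given, which would hide exactly the errors you are looking for.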
I’m happy to answer any questions related to Vizzy, cleanlab, or confident learning and data-centric AI in general!