Show HN: Cleanlab Vizzy – automatically find label errors and bad data

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • Cleanlab (https://github.com/cleanlab/cleanlab) is a family of algorithms for automatically finding issues in datasets. It might seem surprising that it’s possible to automatically identify label errors and out-of-distribution data; Cleanlab does this using the algorithms published in https://arxiv.org/abs/1911.00068 ...

    Cleanlab’s algorithms, while clever, are actually relatively simple. To help myself (and others!) build intuition for how they work, I built Vizzy, an interactive demo that runs in the browser. Vizzy lets you experiment with an example dataset, tweak the labels, and run Cleanlab to automatically find issues such as label errors and out-of-distribution data.

    Vizzy includes a JavaScript port of part of cleanlab, along with other neat technical nuggets, including ML model training in the browser (using features from a pretrained ResNet-18, performing truncated SVD, and using an SVM model for speed). If you’re interested in the details of how Vizzy works, check out this blog post: https://cleanlab.ai/blog/cleanlab-vizzy/

    I’m happy to answer any questions related to Vizzy, cleanlab, or confident learning and data-centric AI in general!
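
The model-training step in the post can be roughly approximated in Python with scikit-learn. That step uses pretrained-network features, truncated SVD, and an SVM. This is a hypothetical sketch, not Vizzy's JavaScript code, and several parts of it are assumptions:

- Random Gaussian vectors stand in for ResNet-18 embeddings.
- The SVD width (`n_components=32`), the linear kernel, and 5-fold cross-validation are all assumed.
- The helper name `out_of_sample_pred_probs` is illustrative.

Cross-validation is used because confident learning expects out-of-sample predicted probabilities. A model scored on its own training examples would tend to be overconfident in the very labels it was fit to.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def out_of_sample_pred_probs(features, labels, n_components=32, n_folds=5):
    """Hypothetical helper: cross-validated class probabilities per example.

    Each row of the result comes from a model that did not see that example
    during training, which is the kind of input confident learning expects.
    """
    model = make_pipeline(
        # Project high-dimensional embeddings onto a small dense basis.
        TruncatedSVD(n_components=n_components, random_state=0),
        # Linear SVM; probability=True enables Platt-scaled predict_proba.
        SVC(kernel="linear", probability=True, random_state=0),
    )
    return cross_val_predict(
        model, features, labels, cv=n_folds, method="predict_proba"
    )


# Synthetic stand-in for ResNet-18 embeddings (512-d, its penultimate-layer width).
rng = np.random.default_rng(0)
n_per_class, dim = 50, 512
features = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, dim)),
    rng.normal(0.5, 1.0, size=(n_per_class, dim)),
])
labels = np.repeat([0, 1], n_per_class)

pred_probs = out_of_sample_pred_probs(features, labels)
print(pred_probs.shape)  # (100, 2)
```

The resulting `pred_probs` array, together with the given `labels`, is the input that a label-error finder consumes.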

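The core label-error step from the confident-learning paper (arXiv 1911.00068) fits in a few lines of NumPy. This is a simplified sketch, not cleanlab's implementation, and it assumes you already have out-of-sample predicted probabilities `pred_probs`, for example from cross-validation. It works in four steps:

1. For each class, compute a threshold equal to the model's average probability for that class over the examples given that label.
2. Assign each example to the most probable class whose probability clears its threshold.
3. Flag examples where that class disagrees with the given label. These correspond to off-diagonal entries of the paper's "confident joint".
4. Rank the flagged examples by self-confidence, meaning the predicted probability of the given label.

The function name `find_label_issues_sketch` is hypothetical, and it assumes every class appears at least once in `labels`.

```python
import numpy as np


def find_label_issues_sketch(labels, pred_probs):
    """Return indices of likely label errors, most suspicious first.

    labels:     shape (n,), given (possibly noisy) integer labels in 0..k-1.
    pred_probs: shape (n, k), out-of-sample predicted class probabilities.
    Assumes every class appears at least once in `labels`.
    """
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    n, k = pred_probs.shape
    # Per-class threshold t_j: mean predicted probability of class j
    # over the examples whose given label is j.
    thresholds = np.array([pred_probs[labels == j, j].mean() for j in range(k)])
    # Example i "confidently" belongs to class j when p_ij >= t_j.
    confident = pred_probs >= thresholds

    issues = []
    for i in range(n):
        candidates = np.flatnonzero(confident[i])
        if candidates.size == 0:
            continue  # not confidently in any class, so not counted
        # Estimated true label: most probable class that clears its threshold.
        guess = candidates[np.argmax(pred_probs[i, candidates])]
        if guess != labels[i]:
            issues.append(i)  # off-diagonal entry of the confident joint

    issues = np.array(issues, dtype=int)
    # Lower probability assigned to the given label = more suspicious.
    self_confidence = pred_probs[issues, labels[issues]]
    return issues[np.argsort(self_confidence, kind="stable")]


labels = np.array([0, 0, 1, 1, 1])
pred_probs = np.array([
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],
    [0.10, 0.90],
    [0.95, 0.05],  # labeled 1, but the model is confident it is class 0
])
print(find_label_issues_sketch(labels, pred_probs))  # [4]
```

In the cleanlab Python package itself (2.x), the corresponding entry point is `cleanlab.filter.find_label_issues(labels, pred_probs)`. It offers more filtering and ranking options than this sketch.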