[N] Fine-Tuning OpenAI Language Models with Noisily Labeled Data (37% error reduction)

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • label-errors

    🛠️ Corrected Test Sets for ImageNet, MNIST, CIFAR, Caltech-256, QuickDraw, IMDB, Amazon Reviews, 20News, and AudioSet

    we be benchmarked the minimum (lower bound) of error detection across the ten most commonly used real world ML datasets and found the lower bound is at least 50% accurate. You can see these errors yourself here: labelerrors.com (all found with cleanlab studio, a more advanced version of the algorithms in confident learning) and this was nominated for best paper award at NeurIPS 2021.

  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

    If you have trained a speech-to-text model and are able to get its probabilistic predictions over the word/token at each position, then you can use the token_classification module in our open-source cleanlab library for this purpose.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts