[N] Fine-Tuning OpenAI Language Models with Noisily Labeled Data (37% error reduction)

label-errors

Mentions: 7 | Stars: 176 | Activity: 0.0

🛠️ Corrected Test Sets for ImageNet, MNIST, CIFAR, Caltech-256, QuickDraw, IMDB, Amazon Reviews, 20News, and AudioSet

We benchmarked the minimum (lower bound) of error detection across the ten most commonly used real-world ML datasets and found that the lower bound is at least 50% accurate. You can see these errors yourself at labelerrors.com (all found with Cleanlab Studio, a more advanced version of the algorithms in confident learning). This work was nominated for a best paper award at NeurIPS 2021.
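
As a rough illustration of the confident learning idea mentioned above, here is a minimal sketch in plain Python. It assumes you already have out-of-sample predicted class probabilities for every example. The function name `find_label_issues_sketch` and the toy data are hypothetical, and this simplified version is not the implementation used by the cleanlab library or Cleanlab Studio.

```python
from typing import List


def find_label_issues_sketch(labels: List[int], pred_probs: List[List[float]]) -> List[int]:
    """Return indices of examples whose given label looks wrong.

    Simplified confident-learning heuristic (hypothetical helper, not cleanlab's API).
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: the average predicted probability of class k
    # over the examples that were labeled k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes whose predicted probability clears that class's threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class here
        best = max(confident, key=lambda k: p[k])
        if best != y:
            issues.append(i)  # confidently predicted class disagrees with the label
    return issues


# Toy data (made up for illustration): example 2 is labeled 0,
# but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
suspect_indices = find_label_issues_sketch(labels, pred_probs)
print(suspect_indices)
```

Using per-class thresholds instead of one global cutoff is what lets the heuristic tolerate a model that is systematically more confident on some classes than on others.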

cleanlab

Mentions: 69 | Stars: 8,592 | Activity: 9.4 | Language: Python

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

If you have trained a speech-to-text model and are able to get its probabilistic predictions over the word/token at each position, then you can use the token_classification module in our open-source cleanlab library for this purpose.
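
To make the token-level idea concrete, here is a small sketch in plain Python of how per-position probabilities could be used to rank suspicious tokens. It is a simplified stand-in and does not call cleanlab's token_classification module: the function `rank_token_issues_sketch`, the `threshold` parameter, and the toy data are all hypothetical, so consult the cleanlab documentation for the module's actual functions and signatures.

```python
from typing import List, Tuple


def rank_token_issues_sketch(
    labels: List[List[int]],
    pred_probs: List[List[List[float]]],
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return (sentence_index, token_index) pairs, most suspicious first.

    A token is flagged when the model assigns low probability to its given
    label (its "self-confidence"). Hypothetical helper, not cleanlab's API.
    """
    scored = []
    for i, (sent_labels, sent_probs) in enumerate(zip(labels, pred_probs)):
        for j, (y, p) in enumerate(zip(sent_labels, sent_probs)):
            self_confidence = p[y]  # probability the model gives the given label
            if self_confidence < threshold:
                scored.append((self_confidence, i, j))
    scored.sort()  # lowest self-confidence first
    return [(i, j) for _, i, j in scored]


# Toy data (made up): two "sentences" with 3 and 2 tokens, two classes.
labels = [[0, 0, 1], [0, 1]]
pred_probs = [
    [[0.9, 0.1], [0.7, 0.3], [0.05, 0.95]],
    [[0.8, 0.2], [0.8, 0.2]],
]
token_issues = rank_token_issues_sketch(labels, pred_probs)
print(token_issues)
```

For a speech-to-text model, each "sentence" would be a transcript and each class a word or token in the vocabulary, so the flagged pairs point at transcript positions worth re-checking.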

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Automated Data Quality at Scale
2 projects | news.ycombinator.com | 27 Jul 2023
[Research] Detecting Errors in Numerical Data via any Regression Model
1 project | /r/statistics | 20 Sep 2023
Detecting Errors in Numerical Data via Any Regression Model
1 project | news.ycombinator.com | 18 Sep 2023
cleanlab v2.5 now supports all major ML tasks (adds regression, object detection, and image segmentation)
1 project | /r/coolgithubprojects | 17 Sep 2023
Enhancing Product Analytics and E-commerce Business
1 project | /r/ecommerce | 6 Jul 2023

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning
Machine Learning label-errors weak-supervision Benchmarking confident-learning
Post date: 3 May 2023
