examples vs deep-active-learning

examples

Notebooks demonstrating example applications of the cleanlab library (by cleanlab)

cleanlab HacktoberFest

deep-active-learning

Deep Active Learning (by ej0cl6)

active-learning deep-active-learning

Project	examples	deep-active-learning
Mentions	12	1
Stars	99	758
Growth	-	-
Activity	7.8	10.0
Latest Commit	2 months ago	over 1 year ago
Language	Jupyter Notebook	Python
License	GNU Affero General Public License v3.0	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

examples

Posts with mentions or reviews of examples. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-15.

FLaNK AI - 15 April 2024
7 projects | dev.to | 15 Apr 2024

[R] Detecting Dataset Drift and Non-IID Sampling: A k-Nearest Neighbors approach that works for Image/Text/Audio/Numeric Data
1 project | /r/statistics | 30 May 2023
I just published a paper detailing this non-IID check and open-sourced its code in the cleanlab package: just one line of code will check for this and many other types of issues in your dataset.

Datalab: A Linter for ML Datasets
1 project | /r/learnmachinelearning | 16 May 2023
I recently published a blog introducing Datalab and an open-source Python implementation that is easy-to-use for all data types (image, text, tabular, audio, etc). For data scientists, I’ve made a quick Jupyter tutorial to run Datalab on your own data.

Finetuning Large Language Models -- An introduction to the core ideas and approaches
2 projects | /r/learnmachinelearning | 24 Apr 2023
Cool read! I just finished up a notebook where I show how noisy labels can drastically impact the performance of OpenAI LLMs. I first fine-tune the well-known Davinci model (the backbone of ChatGPT) on the original data and report an accuracy of 63%. I then use the open-source package cleanlab to find examples that are incorrectly labeled and drop them from the training data. This step increases the fine-tuning accuracy to 66% (better accuracy with less data). Finally, I correct the mislabeled examples and fine-tuning accuracy jumps to 77%!

What are some active research areas in Machine Learning Systems?
1 project | /r/learnmachinelearning | 27 Mar 2023
Data-centric AI (DCAI) is an active and fairly new field; it focuses on the data side of ML as opposed to just model optimization. Our company is building an open-source package, cleanlab, that is becoming the DCAI standard.

[Research] ActiveLab: Active Learning with Data Re-Labeling
3 projects | /r/MachineLearning | 2 Mar 2023
I recently published a paper introducing this novel method and an open-source Python implementation that is easy-to-use for all data types (image, text, tabular, audio, etc). For data scientists, I’ve made a quick Jupyter tutorial to run ActiveLab on your own data. For ML researchers, I’ve made all of our benchmarking code available for reproducibility so you can see for yourself how effective ActiveLab is in practice.

cleanlab open-source --- expanded support for Active Learning and other data-centric AI tasks
1 project | /r/opensource | 2 Mar 2023
suggest which data is most informative to (re)label next (active learning)

Strategies for selecting what data to annotate?
3 projects | /r/computervision | 16 Jan 2023

[D] Can someone point to research on determining usefulness of samples/datasets for training ML models?
1 project | /r/MachineLearning | 12 Jan 2023

cleanlab: an open-source python framework for data-centric AI
2 projects | /r/opensource | 4 Jan 2023
In one line of Python, cleanlab can automatically: 1) find mislabeled data + train robust models 2) detect outliers 3) estimate consensus + annotator-quality for datasets labeled by multiple annotators 4) suggest which data is best to label or re-label next (active learning)
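
Several of the posts above mention using cleanlab to find mislabeled examples. As a rough illustration of the general idea only, here is a minimal sketch in plain Python. It is not cleanlab's API or implementation: the function name `flag_likely_label_errors`, the thresholding rule, and the toy data are all assumptions made for this example. It flags an example when the model gives its assigned label a below-average probability for that class while being confident about some other class.

```python
from collections import defaultdict


def flag_likely_label_errors(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    Hypothetical helper for illustration (not part of cleanlab).
    labels: list of integer class ids, one per example.
    pred_probs: list of per-class probability lists from any trained model.
    """
    # Per-class threshold: the mean probability the model assigns to class k
    # over the examples that are labeled k.
    sums, counts = defaultdict(float), defaultdict(int)
    for y, probs in zip(labels, pred_probs):
        sums[y] += probs[y]
        counts[y] += 1
    thresholds = {k: sums[k] / counts[k] for k in counts}

    flagged = []
    for i, (y, probs) in enumerate(zip(labels, pred_probs)):
        # Some other class whose probability reaches that class's own threshold.
        confident_other = any(
            k != y and probs[k] >= thresholds[k] for k in thresholds
        )
        if probs[y] < thresholds[y] and confident_other:
            flagged.append((probs[y], i))

    # Least confident in the given label first.
    flagged.sort()
    return [i for _, i in flagged]


# Toy data (made up): example 2 is labeled 0, but the model strongly predicts 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
print(flag_likely_label_errors(labels, pred_probs))  # [2]
```

Flagged examples could then be dropped or re-labeled before retraining, which is the workflow the fine-tuning post above describes. For real use, the cleanlab package itself is the better starting point than this sketch.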

deep-active-learning

Posts with mentions or reviews of deep-active-learning. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-01-16.

Strategies for selecting what data to annotate?
3 projects | /r/computervision | 16 Jan 2023

What are some alternatives?

When comparing examples and deep-active-learning you can also consider the following projects:

token-label-error-benchmarks - Benchmarking methods for label error detection in token classification tasks

lightly - A python library for self-supervised learning on images.

awesome-active-learning - A curated list of awesome Active Learning

argilla - Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.

notebooks - Repo for various jupyter notebooks.

cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

multiannotator-benchmarks - Benchmarking algorithms for assessing quality of data labeled by multiple annotators

modAL - A modular active learning framework for Python

adaptive - Adaptive: parallel active learning of mathematical functions

refinery - The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.