examples
Notebooks demonstrating example applications of the cleanlab library (by cleanlab)
FLaNK-python-processors
Many processors (by tspannhw)
| | examples | FLaNK-python-processors |
|---|---|---|
| Mentions | 12 | 12 |
| Stars | 99 | 4 |
| Growth | - | - |
| Activity | 7.8 | 7.1 |
| Last commit | 2 months ago | 28 days ago |
| Language | Jupyter Notebook | Python |
| License | GNU Affero General Public License v3.0 | Apache License 2.0 |
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
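The exact activity formula is not given here. As a rough illustration only, the sketch below assumes an exponential decay weight for commits (a hypothetical 30-day half-life) and a percentile rank scaled to 0-10, so that a score of 9.0 or more corresponds to the top 10% of tracked projects. The function names and the half-life are assumptions, not the site's actual method.

```python
def weighted_commits(commit_ages_days, half_life_days=30.0):
    """Sum of commit weights; a commit loses half its weight every half-life.

    The exponential decay and the 30-day half-life are assumptions made
    for illustration only.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_scores(projects, half_life_days=30.0):
    """Map each project to a 0-10 score based on its percentile rank.

    `projects` maps a project name to a list of commit ages in days.
    The score is 10 times the fraction of tracked projects that have a
    strictly lower weighted commit total.
    """
    raw = {
        name: weighted_commits(ages, half_life_days)
        for name, ages in projects.items()
    }
    total = len(raw)
    return {
        name: round(10.0 * sum(1 for other in raw.values() if other < value) / total, 1)
        for name, value in raw.items()
    }


# Ten hypothetical projects; project-k has k commits, each one day old.
projects = {f"project-{k}": [1] * k for k in range(1, 11)}
scores = activity_scores(projects)
```

With this toy data, the project with the most recent commits outranks the nine others and lands at 9.0, which matches the "top 10%" reading of that score.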
examples
Posts with mentions or reviews of examples.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2024-04-15.
- FLaNK AI - 15 April 2024
- [R] Detecting Dataset Drift and Non-IID Sampling: A k-Nearest Neighbors approach that works for Image/Text/Audio/Numeric Data
  I just published a paper detailing this non-IID check and open-sourced its code in the cleanlab package. Just one line of code will check for this and many other types of issues in your dataset.
- Datalab: A Linter for ML Datasets
  I recently published a blog introducing Datalab and an open-source Python implementation that is easy to use for all data types (image, text, tabular, audio, etc). For data scientists, I’ve made a quick Jupyter tutorial to run Datalab on your own data.
- Finetuning Large Language Models -- An introduction to the core ideas and approaches
  Cool read! I just finished up a notebook where I show how noisy labels can drastically impact the performance of OpenAI LLMs. I first fine-tune the well-known Davinci model (the backbone of ChatGPT) on the original data and report an accuracy of 63%. I then use the open-source package cleanlab to find examples that are incorrectly labeled and drop them from the training data. This step increases the fine-tuning accuracy to 66% (better accuracy with less data). Finally, I correct the mislabeled examples and fine-tuning accuracy jumps to 77%!
- What are some active research areas in Machine Learning Systems?
  Data-centric AI is an active and fairly new field that focuses on the data side of ML as opposed to just model optimization. Our company is building an open-source package cleanlab that is becoming the DCAI standard.
- [Research] ActiveLab: Active Learning with Data Re-Labeling
  I recently published a paper introducing this novel method and an open-source Python implementation that is easy to use for all data types (image, text, tabular, audio, etc). For data scientists, I’ve made a quick Jupyter tutorial to run ActiveLab on your own data. For ML researchers, I’ve made all of our benchmarking code available for reproducibility so you can see for yourself how effective ActiveLab is in practice.
- cleanlab open-source --- expanded support for Active Learning and other data-centric AI tasks
  suggest which data is most informative to (re)label next (active learning)
  - Strategies for selecting what data to annotate?
  - [D] Can someone point to research on determining usefulness of samples/datasets for training ML models?
- cleanlab: an open-source python framework for data-centric AI
  In one line of Python, cleanlab can automatically: 1) find mislabeled data + train robust models 2) detect outliers 3) estimate consensus + annotator-quality for datasets labeled by multiple annotators 4) suggest which data is best to label or re-label next (active learning)
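Several of the posts above describe the same workflow: find examples whose labels are likely wrong, then drop or correct them before training. The sketch below illustrates that idea in plain Python under simplifying assumptions. It flags an example when the model's predicted probability for its given label falls below a fixed threshold. This is not cleanlab's algorithm or API; the function names, the toy data, and the 0.5 threshold are all hypothetical, and the predicted probabilities are assumed to come from a model that did not train on the examples being scored.

```python
def self_confidence(labels, pred_probs):
    """Probability the model assigns to each example's given label."""
    return [probs[label] for label, probs in zip(labels, pred_probs)]


def flag_low_confidence(labels, pred_probs, threshold=0.5):
    """Indices of examples whose given label looks suspicious.

    An example is flagged when its self-confidence is below `threshold`.
    Flagged indices are returned least confident first.
    """
    scores = self_confidence(labels, pred_probs)
    flagged = [i for i, score in enumerate(scores) if score < threshold]
    return sorted(flagged, key=lambda i: scores[i])


def drop_flagged(examples, flagged):
    """Return the examples that were not flagged."""
    flagged = set(flagged)
    return [ex for i, ex in enumerate(examples) if i not in flagged]


# Toy sentiment data: label 1 = positive, label 0 = negative.
texts = ["great", "awful", "fine", "terrible"]
labels = [1, 0, 0, 0]  # "fine" is probably mislabeled as negative
pred_probs = [
    [0.10, 0.90],
    [0.80, 0.20],
    [0.30, 0.70],
    [0.95, 0.05],
]

suspect = flag_low_confidence(labels, pred_probs)
cleaned = drop_flagged(list(zip(texts, labels)), suspect)
```

Here only the third example is flagged, because the model gives its given label (negative) a probability of 0.30. Dropping it leaves three training examples; correcting its label instead would keep all four.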
FLaNK-python-processors
Posts with mentions or reviews of FLaNK-python-processors.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2024-05-06.