nestedcvtraining vs cleanlab
Metric | nestedcvtraining | cleanlab
---|---|---
Mentions | 6 | 5
Stars | 27 | 2,254
Growth | - | -
Activity | 0.0 | 8.4
Last commit | over 1 year ago | almost 3 years ago
Language | Python | Python
License | MIT License | GNU Affero General Public License v3.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
nestedcvtraining
- [P] Nested Cross Validation Library
- Project: Nested Cross Validation Library
- [D] Andrew Ng's data-centric vs model-centric Machine Learning
  Once you have your pipeline, model included, with all the transformers defined and parametrized, you could use an optimizing approach like the one in the examples of this library: https://github.com/JaimeArboleda/nestedcvtraining Do you think it will be a good idea? Or am I oversimplifying?
- [D] What’s the simplest, most lightweight but complete and 100% open source MLOps toolkit?
- [P] New library for performing nested cross validation, optimizing, calibrating and reporting quality of binary classification models
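For readers unfamiliar with the technique these posts refer to, here is a minimal sketch of nested cross-validation in plain Python. It is not nestedcvtraining's API: every function name, the toy 1-D dataset, and the k-nearest-neighbours model are assumptions chosen for illustration. The inner loop picks a hyperparameter using only the outer training split, and the outer loop scores that choice on data the tuning never saw.

```python
import random
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test points that the k-NN model classifies correctly."""
    return mean(1.0 if knn_predict(train, x, k) == y else 0.0 for x, y in test)


def split(data, fold):
    """Return (train, test) where test holds the rows indexed by fold."""
    held_out = set(fold)
    train = [row for i, row in enumerate(data) if i not in held_out]
    test = [data[i] for i in fold]
    return train, test


def cross_val_accuracy(data, k, n_folds):
    """Plain (non-nested) cross-validated accuracy for a fixed k."""
    folds = k_fold_indices(len(data), n_folds)
    return mean(accuracy(*split(data, fold), k) for fold in folds)


def nested_cv(data, candidate_ks, outer_folds=5, inner_folds=3):
    """Outer loop estimates performance; inner loop tunes k on training data only."""
    outer_scores, chosen = [], []
    for fold in k_fold_indices(len(data), outer_folds):
        train, test = split(data, fold)
        best_k = max(candidate_ks,
                     key=lambda k: cross_val_accuracy(train, k, inner_folds))
        chosen.append(best_k)
        outer_scores.append(accuracy(train, test, best_k))
    return mean(outer_scores), chosen


# Hypothetical toy data: label is 1 when x > 0.5, with about 10% of labels flipped.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

score, chosen_ks = nested_cv(data, candidate_ks=[1, 5, 9])
print(f"nested CV accuracy: {score:.2f}, k chosen per outer fold: {chosen_ks}")
```

Because the hyperparameter is chosen separately inside each outer fold, the reported score is not inflated by tuning on the same data used for evaluation.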
cleanlab
- [P] Confident Learning making ML QA 34x cheaper
  Code for https://arxiv.org/abs/1911.00068 found: https://github.com/cgnorthcutt/cleanlab
- Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks
  Code: https://github.com/cgnorthcutt/cleanlab
- [D] Andrew Ng's data-centric vs model-centric Machine Learning
  I am an author on this, so I am biased. Around half a decade ago, we began developing a field at MIT called confident learning [paper | blog | reddit post] that takes a data-centric approach: instead of improving the model quality, it improves the data label quality. It's used by Google, Facebook, and is open-sourced in Python as the cleanlab package.
- [R] Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks
  👍 An easy first step to find label errors in datasets is cleanlab: https://github.com/cgnorthcutt/cleanlab
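To give a rough idea of what "finding label errors" can mean in the confident learning setting mentioned above, here is a simplified sketch in plain Python. It is not cleanlab's implementation or API: the function names and the toy predictions are assumptions, and the logic covers only the per-class thresholding step described in the confident learning paper. An example is flagged when a class other than its given label is predicted with probability at or above that class's average self-confidence.

```python
def class_thresholds(labels, pred_probs, n_classes):
    """Per-class threshold: mean predicted probability of class j over examples labeled j.

    Assumes every class has at least one labeled example.
    """
    thresholds = []
    for j in range(n_classes):
        probs_j = [p[j] for y, p in zip(labels, pred_probs) if y == j]
        thresholds.append(sum(probs_j) / len(probs_j))
    return thresholds


def find_candidate_label_errors(labels, pred_probs):
    """Return indices whose confidently predicted class differs from the given label."""
    n_classes = len(pred_probs[0])
    thresholds = class_thresholds(labels, pred_probs, n_classes)
    flagged = []
    for i, (given, probs) in enumerate(zip(labels, pred_probs)):
        # Classes the model predicts at or above their own threshold.
        confident = [j for j in range(n_classes) if probs[j] >= thresholds[j]]
        if not confident:
            continue  # no confident class, so this example is left alone
        likely = max(confident, key=lambda j: probs[j])
        if likely != given:
            flagged.append(i)
    return flagged


# Hypothetical out-of-sample predicted probabilities for a 2-class problem.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.05, 0.95],  # labeled 0, but the model is confident it is class 1
    [0.10, 0.90],
    [0.30, 0.70],
    [0.15, 0.85],
]

print(find_candidate_label_errors(labels, pred_probs))
```

In practice the predicted probabilities would come from a model evaluated out of sample (for example via cross-validation), so that the model has not simply memorized the noisy labels it is being asked to audit.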
What are some alternatives?
Python Packages Project Generator - 🚀 Your next Python package needs a bleeding-edge project structure.
zeroshot_topics - Topic Inference with Zeroshot models
ploomber - The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
karateclub - Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
projects - Sample projects using Ploomber.
SSL4MIS - Semi Supervised Learning for Medical Image Segmentation, a collection of literature reviews and code implementations.
summer - A compartmental disease modelling framework (Python)
clearml - ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
speech-enhancement - Experiments with speech enhancement
NumPy - The fundamental package for scientific computing with Python.
keepsake - Version control for machine learning
cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.