cleanlab
karateclub
| | cleanlab | karateclub |
|---|---|---|
| Mentions | 5 | 1 |
| Stars | 2,254 | 2,089 |
| Growth | - | - |
| Activity | 8.4 | 7.0 |
| Last commit | almost 3 years ago | 2 months ago |
| Language | Python | Python |
| License | GNU Affero General Public License v3.0 | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
cleanlab
- [P] Confident Learning making ML QA 34x cheaper
  Code for https://arxiv.org/abs/1911.00068 can be found at https://github.com/cgnorthcutt/cleanlab
- Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks
  Code: https://github.com/cgnorthcutt/cleanlab
- [D] Andrew Ng's data-centric vs model-centric Machine Learning
  I am an author on this, so I am biased. Around half a decade ago, we began developing a field at MIT called confident learning that takes a data-centric approach: instead of improving the model quality, it improves the data label quality. It's used by Google and Facebook, and is open-sourced in Python as the cleanlab package.
- [R] Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks
  👍 An easy first step to find label errors in datasets is cleanlab: https://github.com/cgnorthcutt/cleanlab
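
To make the idea in the posts above concrete, here is a minimal pure-Python sketch of the thresholding step behind confident learning. It is a simplified illustration and does not use cleanlab's API. The function name `find_label_issues_sketch`, its arguments, and the toy data are all hypothetical. The sketch assumes each example has a given (possibly noisy) label and a list of predicted class probabilities, ideally produced out-of-sample.

```python
from collections import defaultdict


def find_label_issues_sketch(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels:     list of int class ids (the given, possibly noisy labels).
    pred_probs: list of per-example probability lists, one entry per class.
    Hypothetical helper for illustration; not part of cleanlab.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: the mean predicted probability of class k over
    # the examples whose given label is k.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    thresholds = [
        sums[k] / counts[k] if counts[k] else 1.0 for k in range(num_classes)
    ]

    suspects = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose predicted probability reaches that class's threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class here
        best = max(confident, key=lambda k: p[k])
        if best != y:
            suspects.append(i)  # confident prediction disagrees with the label
    return suspects


# Toy data (made up): example 2 is labeled 0 but strongly predicted as 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
print(find_label_issues_sketch(labels, pred_probs))  # prints [2]
```

Using a per-class threshold rather than a single global cutoff means a class the model is systematically less sure about is not flagged wholesale. The full method described in the linked paper goes further than this sketch.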
karateclub
- Embedding attributed graphs
  Check out Karate Club (https://github.com/benedekrozemberczki/karateclub). It has implementations for many attributed node embedding algorithms.
What are some alternatives?
zeroshot_topics - Topic Inference with Zeroshot models
tensorflow - An Open Source Machine Learning Framework for Everyone
SSL4MIS - Semi Supervised Learning for Medical Image Segmentation, a collection of literature reviews and code implementations.
Keras - Deep Learning for humans
xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
gym - A toolkit for developing and comparing reinforcement learning algorithms.
scikit-learn - scikit-learn: machine learning in Python
gensim - Topic Modelling for Humans
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Surprise - A Python scikit for building and analyzing recommender systems
openskill.py - Multiplayer Rating System. No Friction.
TFLearn - Deep learning library featuring a higher-level API for TensorFlow.