eurybia vs nannyml

| | eurybia | nannyml |
|---|---|---|
| Mentions | 3 | 7 |
| Stars | 203 | 1,756 |
| Growth | 1.5% | 1.0% |
| Activity | 5.1 | 8.6 |
| Latest commit | about 1 month ago | 3 days ago |
| Primary language | Jupyter Notebook | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
eurybia
- State of the Art data drift libraries on Python?
Try out eurybia, from the author of shapash, which is a brilliant library as well.
- Providing ML team with data: normalized or denormalized?
Your data scientists will cook up ugly bits of code to prepare their training data; you'll probably have to rewrite that when they want to ship to prod, and also detect and handle discrepancies. In that regard, it sounds like you may enjoy Eurybia to communicate about this data with your data scientists. We made it precisely for that.
- Advice on a Data Quality framework
So we just trained a model to try and do the same, and then sort of read its entrails through Shapash. The more it can tell the difference, the more your data has changed. We can know which variable has changed the most, and how much it matters to our models. If all else fails (and also if all else works), we can still know (again, this is all quantified in some way; we need numbers, not eyeballing) how much our models' predictions have evolved over time, independently of particular data changes, legit or not. How can our models' predictions change if the data is all clean, you ask? I mean, I asked, but you would have too, in my shoes. What lies beyond data engineering? What is the meaning of life? The answer is concept drift, and that's what we're starting to work on now that we have a good grasp on data drift. Anyways, the tool is Eurybia. If any part of my ramblings resembles some of your work, please give it a try and chat us up here or through the repo; we are of course very eager to get feedback and possibly even contributions, who knows. See ya!
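The approach quoted above (label baseline rows 0 and current rows 1, train a classifier to tell them apart, then read its feature importances) is the standard "domain classifier" drift check. Here is a minimal, library-agnostic sketch of that technique with scikit-learn. It illustrates the idea described in the post, not Eurybia's actual implementation; Eurybia packages this behind its SmartDrift object, so check its docs for the real API.

```python
# Domain-classifier drift check: can a model tell the baseline and the
# current dataset apart? AUC near 0.5 means no detectable drift; AUC near
# 1.0 means the distributions have diverged. Assumes numeric features
# (encode categoricals first).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def drift_check(df_baseline: pd.DataFrame, df_current: pd.DataFrame):
    X = pd.concat([df_baseline, df_current], ignore_index=True)
    y = [0] * len(df_baseline) + [1] * len(df_current)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    # Out-of-fold probabilities, so the score is not inflated by memorization.
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    auc = roc_auc_score(y, proba)
    # Refit on everything to see which variables drive the separation,
    # i.e. which ones have drifted the most.
    clf.fit(X, y)
    importances = pd.Series(clf.feature_importances_, index=X.columns)
    return auc, importances.sort_values(ascending=False)
```

An AUC that stays near 0.5 is what you want in production; a rising AUC over time is the quantified "how much has my data changed" signal the post talks about.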
nannyml
- Introduction to NannyML: Model Evaluation without labels
To address this issue, NannyML was created. NannyML is an open-source Python library designed to make it easy to monitor drift in the distributions of our model's input variables and to estimate model performance (even without labels!) thanks to the Confidence-Based Performance Estimation algorithm its authors developed. But first of all, why do models need to be monitored, and why might their performance vary over time?
- Detecting silent model failure. NannyML estimates performance for regression and classification models using tabular data. It alerts you when and why it changed. It is the only open-source library capable of fully capturing the impact of data drift on performance.
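Since both excerpts above hinge on estimating performance without labels, here is a rough sketch of what that looks like with NannyML's CBPE estimator. The column names and file paths are hypothetical, and the constructor arguments follow my reading of the NannyML docs, so treat this as an outline rather than version-exact code.

```python
# Estimate a classifier's ROC AUC on unlabeled production data with
# NannyML's Confidence-Based Performance Estimation (CBPE).
import nannyml as nml
import pandas as pd

reference_df = pd.read_parquet("reference.parquet")  # labeled reference data (hypothetical path)
analysis_df = pd.read_parquet("analysis.parquet")    # production data, no labels (hypothetical path)

estimator = nml.CBPE(
    y_pred_proba="y_pred_proba",        # column holding predicted probabilities
    y_pred="y_pred",                    # column holding hard predictions
    y_true="y_true",                    # label column (only needed in the reference set)
    timestamp_column_name="timestamp",
    metrics=["roc_auc"],
    chunk_size=5000,                    # rows per monitored chunk
    problem_type="classification_binary",
)
estimator.fit(reference_df)                # calibrate on data where labels exist
results = estimator.estimate(analysis_df)  # estimate performance where they don't
results.plot().show()                      # plotting API may vary across versions
```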
- [D] Data drift is not a good indicator of model performance degradation
But I may have it, haha. What we propose in the blog post, instead of relying solely on data drift, is using performance estimation methods (e.g., https://github.com/NannyML); with these you can estimate the performance of the ML model without having access to ground truth.
- [HIRING][Full Time, Part Time, Temporary, Internship, Freelance] Data Science Intern (Remote)
Description: NannyML, creators of an Open Source Python library, are looking for multiple Data Science interns to help across research, prototyping, and product. GitHub: https://github.com/NannyML/nannyml About Us: NannyML is an Open Source Python lib …
- What do you think about Detecting Silent ML Failure with an Open Source Python library?
If you think this could add value to your daily life, check it out here: https://github.com/NannyML/nannyml.
- Can I estimate the impact of data drift on performance?
I found it implemented here: https://github.com/NannyML/nannyml
- Show HN: OSS Python library for detecting silent ML model failure
What are some alternatives?
shapash - 🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
evidently - Evaluate and monitor ML models from validation to production. Join our Discord: https://discord.com/invite/xZjKRaNp8b
cuttle-cli - Cuttle automates the transformation of your Python notebook into deployment-ready projects (API, ML pipeline, or just a Python script)
TensorFlow-Examples - TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
deep-significance - Enabling easy statistical significance testing for deep neural networks.
Made-With-ML - Learn how to design, develop, deploy and iterate on production-grade ML applications.
barfi - Python Flow Based Programming environment that provides a graphical programming interface.
ML-For-Beginners - 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
ydata-profiling - 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
cyclops - Toolkit for health AI implementation
deepchecks - Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.