tests-as-linear
MLflow
Metric | tests-as-linear | MLflow
---|---|---
Mentions | 26 | 54
Stars | 472 | 17,234
Growth | - | 2.4%
Activity | 0.0 | 9.9
Last commit | 2 months ago | 3 days ago
Language | JavaScript | Python
License | - | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tests-as-linear
- Common statistical tests are linear models (or: how to teach stats)
- Everything Is a Linear Model
I already knew https://lindeloev.github.io/tests-as-linear/, which the article links to and which is also great. A bit meta on the widespread use of linear models: "Transcending General Linear Reality" by Andrew Abbott, DOI:10.2307/202114
- Bayesians Moving from Defense to Offense
Maybe you would find it useful to read a textbook on Bayesian stats for inspiration. I can recommend Richard McElreath's "Statistical Rethinking", which makes it very clear how inflexible it is to know only recipes like t-tests or ANOVAs.
The canonical approach is to build a generative model with a parameter (or several, for ~ANOVA) that codes for the difference between groups, and to do inference on that parameter of interest. Most of the recipes taught in statistics classes can be modelled as a regression of some kind (this holds for frequentist stats too, see https://lindeloev.github.io/tests-as-linear/ ). Some advocate doing that inference with Bayes factors. Others, as discussed elsewhere in this thread, advocate combining the resulting posterior with a cost/value function. Either way, the lesson is that there is less focus on "t-test vs. ANOVA" because they're the same thing anyway.
- How to cheat stats: common statistical tests are linear models
- Introduction to Modern Statistics
I understand where you're coming from, and I like the idea for a certain kind of person: someone who is very good at handling abstractions. Software engineers do have this skill, but the majority of statistics users do not. Trying to explain the similarities between these linear methods and how all is one [1] to a social scientist who likes neither numbers nor formulas to begin with would only lead to more confusion.
But if you ever do a randomized test with a suitable linear model to estimate the efficacy of these two methods, do let us know, that would be 10/10 :)
[1]: https://lindeloev.github.io/tests-as-linear/#41_one_sample_t...
- [Statistics and Probability] Common statistical tests are linear models (or: how to teach stats)
- [Q] Critique of a flowchart I made?
My main critique is that these classical tests are often better explained and introduced in the context of a regression framework. The fact that you even need a flowchart demonstrates how confusing and unintuitive the classical approach to teaching statistics is. If you learn regression, everything else becomes a special case of this much more expressive way of thinking about how to measure variation. This point is made convincingly in this post: https://lindeloev.github.io/tests-as-linear/
- [Q] Two questions concerning the relationship between non-parametric tools and normal distribution
Most parametric tests don't assume normality. If you feel that assuming normality is not viable, you are free to choose any other distribution. This may not be immediately obvious, since most intro courses teach inference as a bunch of disjointed formulas, but it will make more sense once one learns about the generalized linear models framework and realizes that common statistical tests are all linear models. There is no need to jump straight to nonparametric tests just because something isn't normal, as cool as they are. (Also a pedantic nitpick: Mann-Whitney and co. test the difference in average ranks, not the difference in means, so they are not really a nonparametric equivalent of t-tests.)
- Use lm function for hypothesis test comparing two means
I think this is what you are looking for: https://lindeloev.github.io/tests-as-linear/
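To make the recurring claim in these excerpts concrete, here is a minimal sketch (not taken from the linked post) of why a two-sample t-test is a linear model. It computes the classic pooled-variance t statistic by hand, then the t statistic for the slope of a regression of the outcome on a 0/1 group indicator, and the two agree. The data values and the function names `students_t` and `slope_t` are made up for illustration; the sketch assumes equal variances (Student's t-test, not Welch's).

```python
import math
from statistics import mean


def students_t(a, b):
    """Classic two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((v - ma) ** 2 for v in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def slope_t(x, y):
    """t statistic for the slope b1 in the simple regression y = b0 + b1 * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # Residual sum of squares, then the standard error of the slope.
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1 / se_b1


# Made-up measurements for two groups.
group_a = [4.1, 5.0, 3.8, 4.6, 5.2]
group_b = [5.9, 6.3, 5.1, 6.8, 6.0]

# Dummy-code group membership: 0 for group A, 1 for group B.
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

t_classic = students_t(group_a, group_b)
t_regression = slope_t(x, y)
print(t_classic, t_regression)
```

With a 0/1 indicator, the fitted slope is the difference in group means and the residual variance is the pooled variance, which is why the two statistics coincide. In R, the same comparison would typically be a call to `lm` with the group as the predictor.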
MLflow
- Exploring Open-Source Alternatives to Landing AI for Robust MLOps
Platforms such as MLflow monitor the development stages of machine learning models. In parallel, Data Version Control (DVC) brings version control system-like functions to the realm of data sets and models.
- cascade alternatives - clearml and MLflow
3 projects | 1 Nov 2023
- EL5: Difference between OpenLLM, LangChain, MLFlow
MLFlow - http://mlflow.org
- Explain me how websites like Dall-E, chatgpt, thispersondoesntexit process the user data so quickly
- [D] What licensed software do you use for machine learning experimentation tracking?
- Exploring MLOps Tools and Frameworks: Enhancing Machine Learning Operations
MLflow:
- Options for configuration of python libraries - Stack Overflow
In search of a tool that needs comparable configuration, I looked into MLflow and found this: https://github.com/mlflow/mlflow/blob/master/mlflow/environment_variables.py There they define a class _EnvironmentVariable and create one object of it for each variable they need. The get method of this class is in principle a decorated os.getenv. Maybe that is something I can take as orientation.
- [D] Is there a tool to keep track of my ML experiments?
I have been using DVC and MLflow since back when DVC had only data tracking and MLflow only model tracking. Both are awesome now; the only factor I would mention is that, IMO, MLflow is a bit harder to learn, while DVC is practically just Git.
- [Q] Is there a tool to keep track of my ML experiments?
Hi, you should have a look at MLflow https://mlflow.org or Weights & Biases https://wandb.ai/site
- Looking for recommendations to monitor / detect data drifts over time
Dumb question, how does this lib compare to other libs like MLFlow, https://mlflow.org/?
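The configuration pattern described in the Stack Overflow excerpt above (one object per environment variable, with a `get` method that wraps `os.getenv`) could look roughly like the sketch below. This is not MLflow's actual `_EnvironmentVariable` code; the class name `EnvVar`, its constructor arguments, and the `EXAMPLE_APP_*` variable names are all hypothetical and chosen only for illustration.

```python
import os


class EnvVar:
    """Hypothetical typed wrapper around os.getenv (not MLflow's real class)."""

    def __init__(self, name, cast, default):
        self.name = name        # name of the environment variable
        self.cast = cast        # callable that converts the raw string
        self.default = default  # value returned when the variable is unset

    def get(self):
        raw = os.getenv(self.name)
        if raw is None:
            return self.default
        try:
            return self.cast(raw)
        except ValueError as exc:
            raise ValueError(
                f"{self.name}={raw!r} is not a valid {self.cast.__name__}"
            ) from exc


# One module-level object per setting, so every option is declared in one place.
EXAMPLE_TIMEOUT = EnvVar("EXAMPLE_APP_TIMEOUT", int, 30)
EXAMPLE_HOST = EnvVar("EXAMPLE_APP_HOST", str, "localhost")

# Simulate a user overriding one setting and leaving the other unset.
os.environ["EXAMPLE_APP_TIMEOUT"] = "45"
os.environ.pop("EXAMPLE_APP_HOST", None)

timeout = EXAMPLE_TIMEOUT.get()
host = EXAMPLE_HOST.get()
print(timeout, host)
```

Declaring each variable as an object keeps the name, type, and default together, and reading the value lazily in `get` means a change to the environment after import is still picked up.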
What are some alternatives?
brms - brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
clearml - ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
handson-ml2 - A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Sacred - Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
stan - Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
zenml - ZenML: Build portable, production-ready MLOps pipelines. https://zenml.io.
ims - Introduction to Modern Statistics - A college-level open-source textbook with a modern approach highlighting multivariable relationships and simulation-based inference. For v1, see https://openintro-ims.netlify.app.
guildai - Experiment tracking, ML developer tools
textbook - The textbook Computational and Inferential Thinking: The Foundations of Data Science
dvc - ML Experiments and Data Management with Git
tensorflow - An Open Source Machine Learning Framework for Everyone
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.