tests-as-linear vs MLflow

tests-as-linear

Common statistical tests are linear models (or: how to teach stats) (by lindeloev)

Source Code

lindeloev.github.io

MLflow

Open source platform for the machine learning lifecycle (by mlflow)

Machine Learning AI ML mlflow apache-spark model-management

Source Code

mlflow.org

Docs

Project	tests-as-linear	MLflow
Mentions	26	54
Stars	472	17,234
Growth	-	2.4%
Activity	0.0	9.9
Latest Commit	2 months ago	3 days ago
Language	JavaScript	Python
License	-	Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

tests-as-linear

Posts with mentions or reviews of tests-as-linear. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-18.

Common statistical tests are linear models (or: how to teach stats)
1 project | news.ycombinator.com | 9 Apr 2024

1 project | news.ycombinator.com | 18 Feb 2024
Everything Is a Linear Model
2 projects | news.ycombinator.com | 18 Feb 2024

I already knew the page linked in the article, https://lindeloev.github.io/tests-as-linear/, which is also great. A bit meta on the widespread use of linear models: "Transcending General Linear Reality" by Andrew Abbott, DOI:10.2307/202114
Bayesians Moving from Defense to Offense
2 projects | news.ycombinator.com | 25 Dec 2023

Maybe you would find it useful to read a textbook on Bayesian stats for inspiration. I can recommend Richard McElreath's "Statistical Rethinking", which makes it very clear how inflexible it is to know only recipes like t-tests or ANOVAs.
The canonical approach is to build a generative model with a parameter (or several, for something like an ANOVA) that codes for the difference between groups, and then do inference on that parameter of interest. Most of the recipes taught in statistics classes can be modelled as a regression of some kind (this holds for frequentist stats too, see https://lindeloev.github.io/tests-as-linear/ ). Some advocate doing that inference with Bayes factors. Others, as discussed elsewhere in this thread, advocate combining the resulting posterior with a cost/value function. Either way, the lesson is that there is less focus on "t-test vs. ANOVA" because they are the same thing anyway.
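
To make the "recipes are regressions" point concrete, here is a minimal sketch in plain Python (standard library only). It assumes the equal-variance (Student's) two-sample t-test and made-up data; the function names `students_t` and `slope_t` are illustrative and not taken from the linked page. Regressing the outcome on a 0/1 group indicator gives a slope equal to the difference in group means, and the slope's t statistic matches the classic t-test.

```python
import math
from statistics import mean

def students_t(a, b):
    """Classic equal-variance two-sample t statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    # Pooled variance from the within-group sums of squares.
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_var = ss / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def slope_t(x, y):
    """Slope and its t statistic in the simple regression y = b0 + b1 * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    # Residual sum of squares and the standard error of the slope.
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1, b1 / se_b1

# Made-up example data for two groups.
group_a = [4.1, 5.0, 5.6, 4.8, 5.2]
group_b = [6.3, 5.9, 7.1, 6.6, 6.0, 6.8]

# Encode group membership as a 0/1 dummy variable.
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

t_classic = students_t(group_a, group_b)
b1, t_regression = slope_t(x, y)

print(f"difference in means: {mean(group_b) - mean(group_a):.4f}, slope: {b1:.4f}")
print(f"t (classic): {t_classic:.4f}, t (regression): {t_regression:.4f}")
```

The two t statistics agree up to floating-point rounding because, with a 0/1 predictor, the residual variance of the regression is exactly the pooled variance of the t-test.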
How to cheat stats: common statistical tests are linear models
1 project | news.ycombinator.com | 17 Oct 2023
Introduction to Modern Statistics
9 projects | news.ycombinator.com | 12 Oct 2023

I understand where you're coming from, and I like the idea for a certain kind of person: those who are very good at handling abstractions. Software engineers do have this skill, but the majority of statistics users do not. Trying to explain the similarities between these linear methods and how all is one [1] to a social scientist who likes neither numbers nor formulas to begin with would only lead to more confusion.
But if you ever do a randomized test with a suitable linear model to estimate the efficacy of these two methods, do let us know, that would be 10/10 :)
[1]: https://lindeloev.github.io/tests-as-linear/#41_one_sample_t...
[Statistics and Probability] Common statistical tests are linear models (or: how to teach stats)
1 project | /r/michaelaalcorn | 11 Mar 2023
[Q] Critique of a flowchart I made?
1 project | /r/statistics | 31 Jan 2023

My main critique is that these classical tests are often better explained and introduced in the context of a regression framework. The fact that you even need a flowchart demonstrates how confusing and unintuitive the classical approach to teaching statistics is. If you learn regression, everything else becomes a special case of this much more expressive way of thinking about how to measure variation. This point is made convincingly in this post: https://lindeloev.github.io/tests-as-linear/
[Q] Two questions concerning the relationship between non-parametric tools and normal distribution
1 project | /r/statistics | 20 Dec 2022

Most parametric tests don’t assume normality. If you feel that assuming normality is not viable, you are free to choose any other distribution. This may not be immediately obvious, since most intro courses teach inference as a bunch of disjointed formulas, but it will make more sense once you learn about the generalized linear models framework and realize that common statistical tests are all linear models. There is no need to jump straight to nonparametric tests just because something isn’t normal, as cool as they are. (Also a pedantic nitpick: Mann-Whitney and Co. test the difference in average ranks, not the difference in means. So they are not really a nonparametric equivalent to t-tests.)
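
The nitpick about ranks versus means can be illustrated with a small sketch in plain Python. The data are made up, and `midranks` is a hypothetical helper written for this example (it is not a library function). With one large outlier in the first group, the difference in means and the difference in average ranks point in opposite directions, which is why a rank-based test answers a different question than a t-test.

```python
from statistics import mean

def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Made-up data: group a contains a single large outlier.
a = [1, 2, 3, 4, 100]
b = [5, 6, 7, 8, 9]

ranks = midranks(a + b)
ranks_a, ranks_b = ranks[:len(a)], ranks[len(a):]

mean_diff = mean(b) - mean(a)              # compares raw values
rank_diff = mean(ranks_b) - mean(ranks_a)  # compares average ranks

print("difference in means:", mean_diff)
print("difference in average ranks:", rank_diff)
```

Here group b has the lower mean but the higher average rank, so a test on means and a test on ranks would disagree about which group tends to be "larger".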
Use lm function for hypothesis test comparing two means
1 project | /r/rstats | 27 Oct 2022

I think this is what you are looking for: https://lindeloev.github.io/tests-as-linear/

MLflow

Posts with mentions or reviews of MLflow. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-13.

Exploring Open-Source Alternatives to Landing AI for Robust MLOps
18 projects | dev.to | 13 Dec 2023

Platforms such as MLflow monitor the development stages of machine learning models. In parallel, Data Version Control (DVC) brings version control system-like functions to the realm of data sets and models.
cascade alternatives - clearml and MLflow
3 projects | 1 Nov 2023
EL5: Difference between OpenLLM, LangChain, MLFlow
2 projects | /r/LLMDevs | 19 Jun 2023

MLFlow - http://mlflow.org
Explain me how websites like Dall-E, chatgpt, thispersondoesntexit process the user data so quickly
1 project | /r/dataengineering | 17 Jun 2023
[D] What licensed software do you use for machine learning experimentation tracking?
1 project | /r/MachineLearning | 11 Jun 2023
Exploring MLOps Tools and Frameworks: Enhancing Machine Learning Operations
3 projects | dev.to | 6 Jun 2023

MLflow:
Options for configuration of python libraries - Stack Overflow
2 projects | /r/learnpython | 14 May 2023

While searching for a tool with comparable configuration needs, I looked into mlflow and found this: https://github.com/mlflow/mlflow/blob/master/mlflow/environment_variables.py There they define a class _EnvironmentVariable and create one object from it for each variable they need. The get method of this class is in principle a decorated os.getenv. Maybe that is something I can use as a guide.
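
The pattern described in this excerpt could be sketched roughly as follows. This is an illustration only, not MLflow's actual implementation: the class `EnvVar`, its methods, and the `MYLIB_*` variable names are all hypothetical, chosen to show one object per setting with a `get` method that wraps `os.getenv` and adds a type conversion and a default.

```python
import os

class EnvVar:
    """Hypothetical typed wrapper around os.getenv (not MLflow's real class)."""

    def __init__(self, name, type_, default):
        self.name = name
        self.type_ = type_
        self.default = default

    def is_set(self):
        """Return True if the variable is present in the environment."""
        return self.name in os.environ

    def get(self):
        """Return the converted value, or the default if the variable is unset."""
        raw = os.getenv(self.name)
        if raw is None:
            return self.default
        try:
            return self.type_(raw)
        except ValueError as exc:
            raise ValueError(
                f"{self.name}={raw!r} is not a valid {self.type_.__name__}"
            ) from exc

# One module-level object per setting (names are made up for this example).
MYLIB_TIMEOUT = EnvVar("MYLIB_TIMEOUT", int, 30)
MYLIB_CACHE_DIR = EnvVar("MYLIB_CACHE_DIR", str, "/tmp/mylib")

# Simulate a user overriding one setting and leaving the other unset.
os.environ["MYLIB_TIMEOUT"] = "45"
os.environ.pop("MYLIB_CACHE_DIR", None)

timeout = MYLIB_TIMEOUT.get()      # converted from the string "45"
cache_dir = MYLIB_CACHE_DIR.get()  # falls back to the default

print(timeout, cache_dir)
```

Collecting all settings in one module this way keeps the variable names, types, and defaults documented in a single place, at the cost of a small amount of indirection compared with calling `os.getenv` directly.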
[D] Is there a tool to keep track of my ML experiments?
2 projects | /r/MachineLearning | 13 May 2023

I have been using DVC and MLflow since back when DVC had only data tracking and MLflow only model tracking. I can say both are awesome now. The only factor I would mention is that, IMO, MLflow is a bit harder to learn, while DVC is practically just Git.
[Q] Is there a tool to keep track of my ML experiments?
1 project | /r/datascience | 13 May 2023

Hi, you should have a look at MLflow https://mlflow.org or Weights & Biases https://wandb.ai/site
Looking for recommendations to monitor / detect data drifts over time
3 projects | /r/datascience | 15 Apr 2023

Dumb question, how does this lib compare to other libs like MLFlow, https://mlflow.org/?

What are some alternatives?

When comparing tests-as-linear and MLflow you can also consider the following projects:

brms - brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan

clearml - ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution

handson-ml2 - A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Sacred - Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

stan - Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.

zenml - ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.

ims - 📚 Introduction to Modern Statistics - A college-level open-source textbook with a modern approach highlighting multivariable relationships and simulation-based inference. For v1, see https://openintro-ims.netlify.app.

guildai - Experiment tracking, ML developer tools

textbook - The textbook Computational and Inferential Thinking: The Foundations of Data Science

dvc - 🦉 ML Experiments and Data Management with Git

tensorflow - An Open Source Machine Learning Framework for Everyone

Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.