MLflow vs Airflow
Metric | MLflow | Airflow
---|---|---
Mentions | 48 | 158
Stars | 14,441 | 30,323
Growth | 3.6% | 2.3%
Activity | 9.9 | 10.0
Latest commit | 2 days ago | 5 days ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
MLflow
- Options for configuration of Python libraries - Stack Overflow
While searching for a tool with comparable configuration needs, I looked into MLflow and found this: https://github.com/mlflow/mlflow/blob/master/mlflow/environment_variables.py There, they define a class _EnvironmentVariable and create one instance of it for each variable they need. The get method of this class is essentially a wrapped os.getenv. Maybe that is something I can use as a model.
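The pattern described above (one object per setting, with a get method that wraps os.getenv) can be sketched as follows. This is an illustration loosely modeled on that description, not MLflow's actual code: the class name EnvironmentVariable, the helper _parse_bool, and the MYLIB_* variable names are all hypothetical.

```python
import os
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class EnvironmentVariable(Generic[T]):
    """Hypothetical typed wrapper around os.getenv (illustrative, not MLflow's code)."""

    def __init__(self, name: str, type_: Callable[[str], T], default: Optional[T]) -> None:
        self.name = name
        self.type = type_
        self.default = default

    @property
    def defined(self) -> bool:
        # True if the variable is set in the current process environment
        return self.name in os.environ

    def get(self) -> Optional[T]:
        # Read the raw string and fall back to the default when it is unset
        raw = os.getenv(self.name)
        if raw is None:
            return self.default
        try:
            return self.type(raw)
        except ValueError as exc:
            raise ValueError(f"Invalid value {raw!r} for {self.name}: {exc}") from exc


def _parse_bool(raw: str) -> bool:
    # Accept a small, explicit set of spellings instead of Python's truthiness
    lowered = raw.strip().lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise ValueError("expected one of true/false/1/0")


# One object per configuration knob; these names are hypothetical
MYLIB_HTTP_TIMEOUT = EnvironmentVariable("MYLIB_HTTP_TIMEOUT", int, 120)
MYLIB_DEBUG = EnvironmentVariable("MYLIB_DEBUG", _parse_bool, False)
```

With MYLIB_HTTP_TIMEOUT unset, MYLIB_HTTP_TIMEOUT.get() returns the default 120; after setting it to "30" it returns the int 30. Keeping the name, type conversion, and default together in one object gives each setting a single, documentable definition.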
- [D] Is there a tool to keep track of my ML experiments?
I have been using DVC and MLflow since back when DVC had only data tracking and MLflow only model tracking. Both are awesome now; maybe the only point worth mentioning is that, IMO, MLflow is a bit harder to learn, while DVC is practically just Git.
- Looking for recommendations to monitor / detect data drifts over time
Dumb question: how does this lib compare to other libs like MLflow (https://mlflow.org/)?
- Integrating Hugging Face Transformers & DagsHub
While Transformers already includes an integration with MLflow, users still have to provide their own MLflow server, either locally or on a cloud provider, and that can be a bit of a pain.
- Any MLOps platform you use?
I have an old labmate who uses a similar setup with MLflow and can endorse it.
MLflow - an open-source platform for managing your ML lifecycle. What's great is that it also supports popular libraries such as TensorFlow, PyTorch, and scikit-learn, as well as R.
- Self-hosted ChatGPT with local content
Even for people who don't have an ML background, there are now a lot of fully featured model deployment environments that allow self-hosting (Kubeflow has a good self-hosting option, as do MLflow and Metaflow). They handle most of the complicated work involved in deploying an individual model and work pretty well off the shelf.
- ML experiment tracking with DagsHub, MLflow, and DVC
Here, we'll implement the experimentation workflow using DagsHub, Google Colab, MLflow, and data version control (DVC). We'll focus on how to do this without diving deep into the technicalities of building or designing a workbench from scratch. Going that route can add complexity, especially if you are still learning ML workflows, working on a small project, or implementing a proof of concept.
- AI in DevOps?
MLflow
- AWS re:Invent 2022 wish list
I am seeing growing demand for MLflow (https://mlflow.org/), and a lot of people are looking at Databricks as a commercial offering for MLflow. Alternatively, some people are implementing something like Managing your Machine Learning lifecycle with MLflow. This was already on my wish list last year, but I really hope AWS announces a managed MLflow service. I know version 2.x is still very new, but even 1.x would be a great start.
Airflow
- Python task scheduler with a web UI
Looks interesting as a lightweight alternative to https://www.prefect.io/ (which is itself a lighter-weight, more modern alternative to https://airflow.apache.org/).
- Working with Managed Workflows for Apache Airflow (MWAA) and Amazon Redshift
You can actually set up and delete Redshift clusters using Apache Airflow. The example_dags include a DAG that performs a complete setup and deletion of a Redshift cluster. There are a few things to think about, however.
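Airflow models such a workflow as a DAG in which each task runs only after its upstream tasks have finished. As a rough, standard-library-only illustration of that dependency ordering (this is not Airflow's API), the sketch uses graphlib.TopologicalSorter (Python 3.9+); the task names are hypothetical stand-ins for the Redshift setup, query, and deletion steps, and each task only records that it ran.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(tasks: Dict[str, Callable[[], None]], upstream: Dict[str, Set[str]]) -> List[str]:
    """Run every task after all of its upstream tasks; return the order used."""
    # Include tasks with no dependencies so they are scheduled too
    graph = {name: upstream.get(name, set()) for name in tasks}
    order: List[str] = []
    # static_order() yields nodes in dependency order and raises CycleError on cycles
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()
        order.append(name)
    return order


log: List[str] = []


def make_task(name: str) -> Callable[[], None]:
    # Placeholder for a real operator call (e.g. creating a cluster or running SQL)
    def task() -> None:
        log.append(name)
    return task


# Hypothetical tasks, each mapped to the set of tasks that must finish first
upstream: Dict[str, Set[str]] = {
    "create_cluster": set(),
    "create_table": {"create_cluster"},
    "load_data": {"create_table"},
    "run_query": {"load_data"},
    "delete_cluster": {"run_query"},
}
tasks = {name: make_task(name) for name in upstream}

order = run_dag(tasks, upstream)
```

Because every task here has exactly one upstream task, the only valid order is create_cluster, create_table, load_data, run_query, delete_cluster. A real scheduler such as Airflow adds retries, parallel execution, and persistence of task state on top of this basic ordering.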
- .NET Modern Task Scheduler
A few years ago, I opened a GitHub issue with Microsoft telling them that I think the .NET ecosystem needs its own equivalent of Apache Airflow or Prefect. Fast forward to now, and I still don't think we have anything close to these frameworks.
- How do you decide when to keep a project in a single Python file vs. break it up into multiple files?
Check out taskinstance.py in the Airflow project. It's a well-targeted file: it has only one main class, TaskInstance, plus a few small supporting classes and functions. It is ~3000 lines long: https://github.com/apache/airflow/blob/main/airflow/models/taskinstance.py
- How do you back up running systems?
If you have the spare capacity, Apache Airflow is great for this.
- Building a Data Lakehouse for Analyzing Elon Musk Tweets using MinIO, Apache Airflow, Apache Drill and Apache Superset
💡 You can read more here.
- How do you manage scheduled tasks?
It's a bit overkill, but I use Airflow with the local executor.
- Twitter Data Pipeline with Apache Airflow + MinIO (S3 compatible Object Storage)
To learn more about it, I built a data pipeline that uses Apache Airflow to pull Elon Musk tweets from the Twitter API and store the results as a CSV file in a MinIO (OSS alternative to AWS S3) object storage bucket.
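As a minimal sketch of the "store the results as a CSV file" step, the code that follows serializes tweet records into CSV bytes using only the standard library, so the whole file can be uploaded as a single object. The column names and the function name tweets_to_csv_bytes are assumptions; fetching tweets from the Twitter API and the upload to MinIO are left out.

```python
import csv
import io
from typing import Dict, Iterable, List


def tweets_to_csv_bytes(tweets: Iterable[Dict[str, str]]) -> bytes:
    """Serialize tweet dicts to UTF-8 CSV bytes, ready to upload as one object."""
    fieldnames: List[str] = ["id", "created_at", "text"]  # hypothetical column set
    buffer = io.StringIO(newline="")
    # extrasaction="ignore" drops any keys that are not listed in fieldnames
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(tweet)
    return buffer.getvalue().encode("utf-8")


# Hypothetical sample record; "lang" is not a CSV column and is ignored
sample = [{"id": "1", "created_at": "2022-12-01T00:00:00Z", "text": "hello, world", "lang": "en"}]
payload = tweets_to_csv_bytes(sample)
```

The csv module quotes the comma inside "hello, world" automatically. The resulting bytes could then be handed to an S3-compatible client; with the MinIO Python SDK, for instance, put_object takes a bucket name, an object name, a readable stream such as io.BytesIO(payload), and its length.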
- Data Analytics at Potloc I: Making data integrity your priority with Elementary & Meltano
Airflow
- Self-hosted alternative to easycron.com?
What are some alternatives?
clearml - ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management
Kedro - A Python framework for creating reproducible, maintainable and modular data science code.
dagster - An orchestration platform for the development, production, and observation of data assets.
Sacred - Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
zenml - ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
n8n - Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
Dask - Parallel computing with task scheduling
guildai - Experiment tracking, ML developer tools
airbyte - Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more