MLflow VS dvc

Compare MLflow and dvc and see how they differ.

MLflow

Open source platform for the machine learning lifecycle (by mlflow)

dvc

🦉 Data Version Control | Git for Data & Models | ML Experiments Management (by iterative)
               MLflow               dvc
Mentions       22                   49
Stars          11,127               9,168
Growth         3.3%                 2.7%
Activity       9.7                  9.8
Last commit    5 days ago           1 day ago
Language       Python               Python
License        Apache License 2.0   Apache License 2.0
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

MLflow

Posts with mentions or reviews of MLflow. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-01-21.

dvc

Posts with mentions or reviews of dvc. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-01-21.
  • [D] Tips for ML workflow on raw data
    2 projects | reddit.com/r/MachineLearning | 21 Jan 2022
    Try a version control tool for ML, such as DVC.
  • Git-annex – Managing large files with Git
    2 projects | news.ycombinator.com | 15 Jan 2022
  • IPFS for a shitty cause
    3 projects | reddit.com/r/DataHoarder | 6 Jan 2022
    Also, if anyone has ideas on how to better handle scientific data with IPFS I'd love to dig more into it. I'm fairly interested in getting DVC to work with IPFS: https://github.com/iterative/dvc/discussions/6777. I think git+dvc+ipfs would be a big step forward for fields that intersect with dataset storage / machine learning (which is a lot of them).
  • HPC Rocket - A tool to run Slurm jobs from CI pipelines
    4 projects | reddit.com/r/Python | 3 Jan 2022
    This looks really interesting! I have a similar scenario but haven't looked into it yet. Have you looked at dvc.org? I'm planning on using it together with Slurm and what they call CML for my projects. In that context I also wrote a tool that makes DVC more pythonic, https://github.com/zincware/ZnTrack, although I'm currently restructuring it a bit with backwards compatibility in mind.
  • Unstructured Data Governance for ML
    4 projects | reddit.com/r/dataengineering | 31 Dec 2021
    DVC: https://dvc.org/
  • Pre-commit: framework for managing/maintaining multi-language pre-commit hooks
    9 projects | news.ycombinator.com | 20 Dec 2021
    Here's our setup, which is the result of several iterations and ergonomics refinements. Note: our stack is 90% python, with TS for frontend. Also 95% devs use mac (there's one data scientist on windows, he uses WSL).

    We install enough utilities with `brew` to get pyenv working, use that to build all python versions. Then iirc `brew install pipx`, maybe it's `pip3 install --user pipx`. Anyway, that's the only python library binary installed outside a venv.

    Pipx installs isort, black, dvc, and pre-commit.

    Every repo has a Makefile. This drives all the common operations. Pyproject.toml (/eslint.json?) set the config for isort and black (or eslint). `make format` runs isort and black on python, eslint on js. `make lint` just verifies.

    Pre-commit only runs the lint, it doesn't format. It also runs some scripts to ensure you aren't accidentally committing large files. Pre-commit also runs several DVC actions (the default dvc hooks) on commit, push, and checkout. These run in a venv managed by pre-commit. We just pin the version.

    Github actions has a dedicated lint.yaml which runs a python linter action. We use the black version here to define which black pipx installs. We use `act` if we wanna see how an action runs without sending a commit just to trigger jobs.

    As an aside, I'm still fiddling with the dvc `pre-commit` post-checkout hooks. They don't always pull the files when they ought to.

    Most of the actual unit/integration tests run in containers, but they can run in a venv with the same logic, thanks to makefile. We use a dvc action to sync files in CI.

    So yeah there's technically 2 copies of black and dvc, but we just use pinning. In practice, we've only had one issue with discrepancies in behavior locally vs CI, which was local black not catching a rule to avoid ''' for docstrings; using """ fixed it. On the whole, pre-commit saves against a lot of annoying goofs, but CI system is law, so we largely harmonize against that.

    IMHO, this is the least egregious "double accounting" we have in local vs staging CI vs production CI (I lost that battle; manager would rather keep staging.yaml and production.yaml than parameterize. Shrug.gif).

    Technologies referenced:

    https://dvc.org/

    https://github.com/iterative/setup-dvc

    https://github.com/marketplace/actions/python-linter

    https://github.com/nektos/act

  • Running Collaborative Machine Learning Experiments with DVC and Git - Tutorial
    1 project | reddit.com/r/GitOps | 13 Dec 2021
    The following tutorial explains how you can bundle your data and code changes for each ML experiment and push those to a remote for your team to check out using DVC and Git: Running Collaborative ML Experiments
  • Don't Just Track Your ML Experiments, Version Them - Managing Machine Learning Experiments as Code with Git and DVC Open Source Tools
    1 project | reddit.com/r/opensource | 10 Dec 2021
    The following guide explains how ML experiment versioning with DVC (Data Version Control) open source tools brings together the benefits of traditional code versioning and modern-day experiment tracking: Don't Just Track Your ML Experiments, Version Them
  • Managing Your Machine Learning Experiments as Code with Git and DVC
    1 project | reddit.com/r/github | 9 Dec 2021
    Experiment versioning treats experiments as code. It saves all metrics, hyperparameters, and artifact information in text files that can be versioned by Git, which becomes a store for experiment meta-information. The article above shows how, with the DVC tool, you can push experiments just like Git branches, giving you the flexibility to share only the experiments you choose.
  • DVC (DataVersionControl) - Managing Machine Learning Experiments as Code with Git and DVC
    1 project | reddit.com/r/githubprojects | 9 Dec 2021
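
The "experiments as code" idea in the posts above (metrics and hyperparameters saved in text files that Git can version) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DVC's own API or file layout: the function name save_experiment and the file names params.json and metrics.json are hypothetical choices made for this example.

```python
import json
from pathlib import Path


def save_experiment(directory, params, metrics):
    """Write hyperparameters and metrics as plain-text JSON files.

    Hypothetical helper for illustration; it does not call DVC or Git.
    Sorted keys and fixed indentation keep the files stable between runs,
    so a Git diff shows only the values that actually changed.
    """
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for name, payload in (("params.json", params), ("metrics.json", metrics)):
        text = json.dumps(payload, indent=2, sort_keys=True) + "\n"
        (out / name).write_text(text, encoding="utf-8")
    return out / "params.json", out / "metrics.json"
```

Calling save_experiment("experiments/run1", {"lr": 0.01}, {"accuracy": 0.9}) writes two small text files that can then be committed with Git like any other source file; how those files are registered with DVC is left to the articles linked above.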

What are some alternatives?

When comparing MLflow and dvc you can also consider the following projects:

Sacred - Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

clearml - ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

zenml - ZenML 🙏: MLOps framework to create reproducible pipelines.

Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

H2O - H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

tensorflow - An Open Source Machine Learning Framework for Everyone

guildai - Experiment tracking, ML developer tools

neptune-client - Neptune client library - integrate your Python scripts with Neptune

gensim - Topic Modelling for Humans

Activeloop Hub - Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai

scikit-learn - scikit-learn: machine learning in Python

onnxruntime - ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator