Netflix's Metaflow: Reproducible machine learning pipelines

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • optimo

    * training: history, comparisons, parameters, and hyperparameter tuning with Optuna, Hyperopt, or a custom optimizer (https://github.com/valohai/optimo); additionally, visualizations of training progress and hardware resource monitoring

  • orchest

    Build data pipelines, the easy way 🛠️

    Others have mentioned some cool projects in this space, but you specifically mentioned self-hosted, so I’ll share what we’re working on, since it might match what you’re looking for.

    As a new project, we are still figuring out some of the major topics you described.

    In short, we built a data science pipeline tool that should fit well with existing machine learning and data science workflows. We chose to embrace and integrate open source projects to create a simple, seamless experience with best-of-breed solutions for various tasks.

    We are particularly happy with our deep integration of JupyterLab, which builds on the Jupyter Enterprise Gateway project from IBM (Codait) to connect kernels directly to your pipelines. For scheduling, we build on top of Celery combined with containerization primitives. For stable, well-defined dependency management, we built a small environment abstraction on top of Docker. It works really well in our experience!

    Feel free to check out the project at https://github.com/orchest/orchest

    Self-hosting should be as easy as running about two lines of code.

  • metaflow

    🚀 Build and manage real-life data science projects with ease!

    Has anyone done a comparison of ML pipeline tools from a DevOps-centric perspective?

    For example, Metaflow doesn't support Kubernetes today: https://github.com/Netflix/metaflow/issues/16

    So ultimately, the scale-up story in most of these management tools is iffy.

    I previously asked about Kubeflow here (https://news.ycombinator.com/item?id=24808090). It seems people think it's pretty "horrendous". Most of these tools seem to assume a very specialised DevOps team that will work around the ML tool, rather than the ML tool making this easy.
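Pipeline tools like orchest and Metaflow typically organize work as steps that run only after the steps they depend on. The sketch below is a rough, dependency-free illustration of that idea, not the implementation of either tool. It runs a few hypothetical steps (load, normalize, summarize, invented here for illustration) in dependency order using Python's standard-library graphlib module (Python 3.9+). Unlike orchest, it has no scheduler (such as Celery) and no per-step containers, so everything runs in a single process.

```python
import graphlib
from collections.abc import Callable

# Hypothetical steps for illustration: each one receives a dict mapping
# the names of its dependencies to their outputs.
def load(inputs: dict) -> list[float]:
    return [1.0, 2.0, 3.0, 4.0]

def normalize(inputs: dict) -> list[float]:
    data = inputs["load"]
    total = sum(data)
    return [x / total for x in data]

def summarize(inputs: dict) -> dict:
    data = inputs["normalize"]
    return {"n": len(data), "max": max(data)}

STEPS: dict[str, Callable[[dict], object]] = {
    "load": load,
    "normalize": normalize,
    "summarize": summarize,
}

# Maps each step to the set of steps it depends on (its predecessors).
DEPS: dict[str, set[str]] = {
    "load": set(),
    "normalize": {"load"},
    "summarize": {"normalize"},
}

def run_pipeline(steps, deps):
    """Run every step after all of its dependencies, passing their outputs in."""
    results = {}
    # static_order() yields each node only after all of its predecessors,
    # and raises graphlib.CycleError if the dependency graph has a cycle.
    for name in graphlib.TopologicalSorter(deps).static_order():
        inputs = {dep: results[dep] for dep in deps.get(name, set())}
        results[name] = steps[name](inputs)
    return results

if __name__ == "__main__":
    print(run_pipeline(STEPS, DEPS)["summarize"])  # {'n': 4, 'max': 0.4}
```

In Metaflow itself, a comparable structure is declared by decorating methods of a FlowSpec subclass with @step and linking them with self.next(...).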
