Netflix's Metaflow: Reproducible machine learning pipelines

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • optimo

  • Training: history, comparisons, parameters, and hyperparameter tuning with Optuna, Hyperopt, or a custom optimizer (https://github.com/valohai/optimo); additionally, visualizations of training progress and hardware resource monitoring
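To make "custom optimizer" concrete, here is a minimal random-search sketch in plain Python. It does not use the optimo, Optuna, or Hyperopt APIs; `random_search`, `toy_loss`, and the search space are hypothetical names chosen for illustration. It keeps a per-trial history, which is the kind of record a tracking tool would store for later comparison.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample each hyperparameter uniformly from its (low, high) range
    and keep the trial with the lowest objective value."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    history = []               # (trial index, params, score) for every trial
    best = None                # (params, score) of the best trial so far
    for trial in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        history.append((trial, params, score))
        if best is None or score < best[1]:
            best = (params, score)
    return best, history


def toy_loss(params):
    # Hypothetical stand-in for a validation loss; minimal at lr=0.1, dropout=0.3.
    return (params["lr"] - 0.1) ** 2 + (params["dropout"] - 0.3) ** 2


space = {"lr": (0.0, 1.0), "dropout": (0.0, 0.5)}
(best_params, best_score), history = random_search(toy_loss, space)
```

A real setup would replace `toy_loss` with a function that trains and evaluates a model, and would likely use a smarter sampler than uniform random search.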

  • orchest

    Build data pipelines, the easy way 🛠️

  • Others have mentioned some cool projects in this space, but you mentioned self-hosted specifically, so I’ll share what we’re working on since it might match what you’re looking for.

    As a new project, we are still figuring out some of the major topics you described.

    In short, we built a data science pipeline tool that should fit well with existing workflows in machine learning and data science. We chose to embrace and integrate open source projects to create a simple and seamless experience with best-of-breed solutions for various tasks.

    We are particularly happy with our deep integration of JupyterLab, building on the Jupyter Enterprise Gateway project from IBM (CODAIT) for connecting kernels directly to your pipelines. For scheduling, we build on top of Celery combined with containerization primitives. For stable and well-defined dependency management, we built a small environment abstraction on top of Docker. It works really well in our experience!

    Feel free to check out the project on https://github.com/orchest/orchest

    Self-hosting should be as easy as running about two lines of code.
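The core idea behind a pipeline tool like the one described above can be sketched in a few lines: steps form a dependency graph, and each step runs only after its upstream steps have finished. This is a minimal illustration using Python's standard library, not Orchest's API; `run_pipeline`, the step names, and the toy data are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(steps, deps):
    """Run each step once, after all of its upstream steps,
    passing the upstream outputs in as a dict."""
    results = {}
    # static_order() yields every node after all of its predecessors.
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        upstream = {dep: results[dep] for dep in deps.get(name, ())}
        results[name] = steps[name](upstream)
    return order, results


# Hypothetical three-step pipeline: load -> clean -> train.
steps = {
    "load": lambda upstream: [3, 1, None, 2],
    "clean": lambda upstream: [x for x in upstream["load"] if x is not None],
    "train": lambda upstream: sum(upstream["clean"]) / len(upstream["clean"]),
}
deps = {"load": [], "clean": ["load"], "train": ["clean"]}

order, results = run_pipeline(steps, deps)
```

A production tool adds what this sketch leaves out: isolated environments per step (for example containers), scheduling, retries, and persisted outputs.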

  • metaflow

    :rocket: Build and manage real-life ML, AI, and data science projects with ease!

  • Has anyone done a comparison of ML pipelines from a DevOps-centric perspective?

    For example, Metaflow doesn't support Kubernetes today - https://github.com/Netflix/metaflow/issues/16

    So ultimately the scale-up story in most of these management tools is iffy.

    I previously asked about Kubeflow here - https://news.ycombinator.com/item?id=24808090 . It seems people think it's pretty "horrendous". Most of these tools seem to assume a very specialised DevOps team that will work around the ML tool, rather than the ML tool making this easy.

