A step-by-step guide to building an MLOps pipeline

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • kitops

    Tools for easing the handoff between AI/ML and App/SRE teams.

  • Jenkins

    Jenkins automation server

    MLOps slightly modifies the traditional DevOps CI/CD practice with an additional pipeline called continuous training (CT). The CI/CT/CD pipeline for MLOps orchestrates a series of automated steps to streamline the development, training, testing, and deployment of machine learning models, and automating these processes enables efficient model deployment. Standard automation tools include Jenkins, GitLab CI, Travis CI, and GitHub Actions. You will typically set up the MLOps CI/CT/CD pipeline to fire from a trigger, such as a code change, a schedule, or a drop in model performance; a minimal sketch of such a trigger follows.
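    As a minimal sketch (not from the original post), the continuous-training half of such a pipeline often boils down to a scheduled check that retrains when a monitored metric degrades. The threshold, metric source, and retraining entry point below are illustrative placeholders:

      # Minimal sketch of a performance-based continuous-training (CT) trigger.
      # In a real pipeline this check would run on a schedule (e.g., a Jenkins
      # cron job) and call your actual training entry point.

      ACCURACY_THRESHOLD = 0.90  # hypothetical acceptance bar

      def fetch_production_accuracy() -> float:
          # Stub standing in for a query against your monitoring/metrics store.
          return 0.87

      def retrain_and_redeploy() -> None:
          # Stub standing in for launching the training pipeline and promoting
          # the new model once it passes evaluation.
          print("metric below threshold -> triggering retraining run")

      if __name__ == "__main__":
          if fetch_production_accuracy() < ACCURACY_THRESHOLD:
              retrain_and_redeploy()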

  • distribution-spec

    OCI Distribution Specification

    One of the main reasons teams struggle to build and maintain their MLOps pipelines is vendor-specific packaging. As a model is handed off between data science, app development, and SRE/DevOps teams, each team is required to repackage the model to work with its own toolset. This is tedious, and it stands in contrast to well-adopted development processes where teams have standardized on containers to ensure that project definitions, dependencies, and artifacts are shared in a consistent format. KitOps is a robust and flexible tool that addresses these exact shortcomings in the MLOps pipeline: it packages the entire ML project in an OCI-compliant artifact called a ModelKit, designed with flexible development attributes that accommodate ML workflows and make the handoff more convenient than traditional DevOps pipelines. A sketch of the packaging step appears below.
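    As a rough illustration (the registry path, tag, and flags here are assumptions; check the KitOps docs for the exact options your version supports), packaging and publishing a ModelKit is driven by the kit CLI, invoked below from Python:

      # Hedged sketch: building a ModelKit from the current project directory
      # and pushing it to an OCI registry with the KitOps CLI.
      import subprocess

      MODELKIT_REF = "registry.example.com/demo/my-model:v1"  # hypothetical reference

      # `kit pack` reads the project's Kitfile and builds an OCI-compliant ModelKit.
      subprocess.run(["kit", "pack", ".", "-t", MODELKIT_REF], check=True)

      # `kit push` uploads the ModelKit to the registry, so app and SRE teams can
      # pull it with the same tooling they already use for containers.
      subprocess.run(["kit", "push", MODELKIT_REF], check=True)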

  • neptune-client

    📘 The experiment tracker for foundation model training

    Experiment tracking tools like MLflow, Weights & Biases, and Neptune.ai provide a pipeline that automatically tracks the metadata and artifacts generated by each experiment you run. Although they vary in features and functionality, experiment tracking tools provide a systematic structure that supports the iterative model development approach.

  • MLflow

    Open source platform for the machine learning lifecycle

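    The experiment-tracking excerpt above applies to MLflow as well. As a minimal sketch of what that tracking looks like with MLflow's Python API (the experiment name, parameters, and metric values are illustrative only):

      # Minimal sketch of experiment tracking with MLflow's Python API.
      import mlflow

      mlflow.set_experiment("demo-experiment")  # hypothetical experiment name

      with mlflow.start_run():
          # Log the hyperparameters that define this run...
          mlflow.log_param("learning_rate", 0.01)
          mlflow.log_param("epochs", 10)
          # ...and the metrics it produced, so runs can be compared later.
          mlflow.log_metric("val_accuracy", 0.93)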

  • git-lfs

    Git extension for versioning large files

    The metadata and model artifacts from experiment tracking can contain large amounts of data: training model files, data files, metrics and logs, visualizations, configuration files, checkpoints, and so on. In cases where the experiment tool doesn't support data storage, an alternative is to track the training and validation data versions per experiment. Teams use remote storage systems such as Amazon S3 buckets, MinIO, or Google Cloud Storage, or data versioning tools such as Data Version Control (DVC) or Git LFS (Large File Storage), to version and persist the data. These options facilitate collaboration, but they carry implications for artifact-to-model traceability, storage costs, and data privacy.

  • dvc

    🦉 ML Experiments and Data Management with Git

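    The data-versioning excerpt above applies to DVC as well. As a hedged sketch, DVC's Python API can pin an experiment to an exact dataset revision (the repo URL, file path, and tag below are placeholders):

      # Hedged sketch: reading a specific version of a DVC-tracked dataset.
      import dvc.api

      # `rev` pins the read to the Git revision (tag or commit) recorded for
      # the experiment, so each run is reproducible against exact data.
      data = dvc.api.read(
          "data/train.csv",                           # hypothetical tracked file
          repo="https://github.com/example/project",  # hypothetical repo
          rev="v1.0",                                 # hypothetical tag
      )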


Related posts

  • Show HN: Apple Native OCR in Rust

    1 project | news.ycombinator.com | 13 Aug 2024
  • AI Tools – A Curated List of Artificial Intelligence Top Tools

    1 project | news.ycombinator.com | 12 Aug 2024
  • Accelerating into AI: Lessons from AWS

    2 projects | dev.to | 12 Jun 2024
  • 10 Open Source Tools for Building MLOps Pipelines

    9 projects | dev.to | 6 Jun 2024
  • Mlflow: Open-source platform for the machine learning lifecycle

    1 project | news.ycombinator.com | 16 May 2024
