[D] What’s the simplest, most lightweight but complete and 100% open source MLOps toolkit?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • nestedcvtraining

  • summer

    A compartmental disease modelling framework (Python) (by monash-emu)

  • I've started documenting a Python library with Sphinx and hosting the docs on a static site host (e.g. AWS S3, Netlify, Cloudflare Pages). Most of the docs are Markdown, with the examples written as Jupyter notebooks. Docs are built and deployed to summerepi.com on commits to master. It's a little fiddly to set up, but once it's going it's pretty magical. Source code is here.

  • clearml

    ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution

  • ploomber

    The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

  • You need a couple of tools to cover the entire ML lifecycle. For developing and deploying your pipelines, check out Ploomber (disclaimer: I'm the author):

  • projects

    Sample projects using Ploomber. (by ploomber)

  • You can easily convert a training pipeline into an online service (this is great for preventing training-serving skew). Here's an example project.

  • keepsake

    Version control for machine learning

  • Complementing the given answer, you could check https://github.com/replicate/keepsake for model versioning.

  • speech-enhancement

    Experiments with speech enhancement (by MattSegal)

  • Even if detailed unit testing is hard, you can smoke test your models in CI to make sure that they're at least not crashing. More on smoke tests here. Some example smoke tests for a neural net here. Running your tests in GitHub Actions is relatively easy (here).
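The smoke-testing idea from the comment above can be sketched as follows. This is a minimal illustration, not the commenter's linked tests: `TinyLinearModel` and `smoke_test_model` are hypothetical names, and the tiny pure-Python model stands in for whatever real model a project trains. The test only checks that one training step and one prediction run without crashing and produce finite outputs of the expected length.

```python
import math
import random


class TinyLinearModel:
    """Hypothetical stand-in for a real model: y = w . x + b."""

    def __init__(self, n_features, seed=0):
        rng = random.Random(seed)
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
        self.bias = 0.0

    def predict(self, rows):
        return [
            sum(w * x for w, x in zip(self.weights, row)) + self.bias
            for row in rows
        ]

    def train_step(self, rows, targets, lr=0.01):
        """Run one gradient-descent step; return the pre-update MSE loss."""
        errors = [p - t for p, t in zip(self.predict(rows), targets)]
        n = len(rows)
        for j in range(len(self.weights)):
            grad = 2.0 * sum(e * row[j] for e, row in zip(errors, rows)) / n
            self.weights[j] -= lr * grad
        self.bias -= lr * 2.0 * sum(errors) / n
        return sum(e * e for e in errors) / n


def smoke_test_model():
    """Check that training and inference run end to end on tiny random data."""
    rng = random.Random(42)
    rows = [[rng.random() for _ in range(3)] for _ in range(8)]
    targets = [rng.random() for _ in range(8)]

    model = TinyLinearModel(n_features=3)
    loss = model.train_step(rows, targets)
    preds = model.predict(rows)

    # No accuracy claims here: only shape and numerical sanity.
    assert len(preds) == len(rows)
    assert math.isfinite(loss)
    assert all(math.isfinite(p) for p in preds)
    return True
```

A function like this can be collected by a test runner in CI; it will not catch a model that learns badly, only one that fails to run at all.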

  • NumPy

    The fundamental package for scientific computing with Python.

  • Napoleon is a Sphinx extension that enables Sphinx to parse both NumPy and Google style docstrings - the style recommended by Khan Academy.
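As a rough sketch of what this looks like in practice: the first part below is the kind of fragment that would go in a Sphinx `conf.py` to enable Napoleon, and the second part is a function with a NumPy-style docstring that Napoleon can parse. The `scale` function is a hypothetical example, not taken from any project in this list, and the two parts would normally live in separate files.

```python
# --- conf.py fragment: enable the Napoleon extension shipped with Sphinx ---
extensions = [
    "sphinx.ext.napoleon",
]
# Both styles are parsed by default; the settings are spelled out for clarity.
napoleon_google_docstring = True
napoleon_numpy_docstring = True


# --- library code: a hypothetical function with a NumPy-style docstring ---
def scale(values, factor):
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : list of float
        The numbers to scale.
    factor : float
        The multiplier applied to each value.

    Returns
    -------
    list of float
        A new list with each input value multiplied by ``factor``.
    """
    return [v * factor for v in values]
```

Napoleon only changes how docstrings are interpreted; pulling them into the built pages still relies on an extension such as `sphinx.ext.autodoc`.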

  • Python Packages Project Generator

    Discontinued. 🚀 Your next Python package needs a bleeding-edge project structure.

  • CookieCutter or Kedro are the winners. I still think we will stick with the Kedro template, because it offers extra functionality and I like to think of each project as a set of pipelines to be run. That said, some cookiecutter templates are very good, like this one. If we use both Kedro and ClearML, we'll have to figure out how to integrate Kedro's pipelines with ClearML tasks. But other teams in the ClearML Slack channel are doing the same, so at least it's possible.
