[D] What’s the simplest, most lightweight but complete and 100% open source MLOps toolkit? -> MY OWN CONCLUSIONS

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • Poetry

    Python packaging and dependency management made easy

  • From the Poetry GitHub README, this one snippet explains it well: https://github.com/python-poetry/poetry#dependency-resolution

  • kedro-great

    The easiest way to integrate Kedro and Great Expectations

  • I expected the Great Expectations library to be recommended, but nobody mentioned it. Instead, people suggested unit tests and/or smoke tests with pytest, run through Jenkins. Anyway, if Kedro ends up being our project template, I'll keep an eye on the Great Expectations plugin.

  • great_expectations

    Always know what to expect from your data.

  • I expected the Great Expectations library to be recommended, but nobody mentioned it. Instead, people suggested unit tests and/or smoke tests with pytest, run through Jenkins. Anyway, if Kedro ends up being our project template, I'll keep an eye on the Great Expectations plugin.

  • streamlit

    Streamlit — A faster way to build and share data apps.

  • We should take a look at Voila and Streamlit.

  • fastapi

    FastAPI framework, high performance, easy to learn, fast to code, ready for production

  • FastAPI. Or even simpler: DL4J, to be used in Java when we need to communicate with the rest of the applications in real time.

  • black

    The uncompromising Python code formatter

  • Flake8 (including flake8-docstrings), MyPy and Black are highly recommended. The Google style guide is worth a look too.

  • clearml

    ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution

  • There are mainly two solutions that are 100% open source and free to install and use, and that may cover most of the requirements of ML practitioners: Hopsworks and ClearML. Of these two, if I had to choose one right now, it would be ClearML. Hopsworks might be much more complete, but ClearML seems to have a bigger community behind it and to be easier to install and use. So ClearML will be something to look at if we go for an all-in-one package. I also like the idea of having a platform with a UI that shows all our projects.

  • feast

    Feature Store for Machine Learning

  • Have you looked at Feast as a Feature Store solution? It seems promising, but I haven't really looked into it yet.

  • BentoML

    The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!

  • I've been using BentoML for deployment/serving and it saved my team and me a lot of time. Highly recommend. The only downside is that it's rather new and things are evolving quickly, so you have to keep an eye out for big/breaking changes.

  • metaflow

    Build and manage real-life ML, AI, and data science projects with ease!

  • Metaflow. I love this framework for pipelining.

  • metaflow-on-kubernetes-docs

    Documentation For Running Metaflow on Kubernetes

  • There are community forks supporting Kubernetes and KFP, but they are not yet part of the main framework and their support fluctuates. I think official support should become available in the future.

  • metaflow

    Build and manage real-life data science projects with ease. (by zillow)

  • There are community forks supporting Kubernetes and KFP, but they are not yet part of the main framework and their support fluctuates. I think official support should become available in the future.

  • android-bootstrap

    Bootstrap your Lobe machine learning model with our Android project. (by lobe)

  • If you are looking to train vision models for free, I would recommend Lobe

  • dephell

    Discontinued. Python project management. Manage packages: convert between formats, lock, install, resolve, isolate, test, build graph, show outdated, audit. Manage venvs, build package, bump version.

  • Not necessarily. You can use Dephell (https://github.com/dephell/dephell) to convert from poetry to the old-fashioned requirements.txt

  • wave

    Realtime Web Apps and Dashboards for Python and R (by h2oai)

  • I would extend your Visualization part with:
      - JupyterHub: their deployment script lets you get started and have a centralized Jupyter server for your team very easily. One should not underestimate notebooks, as they are the most straightforward tool for data exploration.
      - H2O Wave: the new player in town (currently in pre-alpha). Although it is at an early stage, it looks very promising and has strong potential to overcome the limitations of Streamlit that we have been waiting forever to see fixed: session state, logging, deployment, etc. Wave has a more server-based approach that makes these problems much easier to deal with.
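
One of the comments above suggests unit tests and/or smoke tests with pytest as the lightweight alternative to a dedicated data validation library. A minimal sketch of what that could look like follows. The `clean_rows` function is a hypothetical pipeline step invented for illustration and does not come from the thread. The tests use plain `assert` statements, which pytest picks up from any function whose name starts with `test_`.

```python
"""Pytest-style unit and smoke tests for a hypothetical preprocessing step."""
import math


def clean_rows(rows):
    """Hypothetical pipeline step: drop rows whose 'value' is missing or NaN."""
    cleaned = []
    for row in rows:
        value = row.get("value")
        if value is None:
            continue  # key absent or explicitly None
        if isinstance(value, float) and math.isnan(value):
            continue  # NaN counts as missing data
        cleaned.append(row)
    return cleaned


def test_clean_rows_drops_missing_values():
    # Unit test: only the row with a usable value should survive.
    rows = [{"value": 1.0}, {"value": None}, {"value": float("nan")}, {}]
    assert clean_rows(rows) == [{"value": 1.0}]


def test_clean_rows_smoke_empty_input():
    # Smoke test: the step should run on empty input without raising.
    assert clean_rows([]) == []
```

Saved as a file such as `test_clean_rows.py` (an assumed name), this would be collected by running `pytest` in the same directory, and a CI server such as Jenkins could run that same command on each commit.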

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
