Top 23 Python MLOps Projects

Airflow

169 34,397 10.0 Python

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12

Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.

jina

126 19,884 9.2 Python

☁️ Build multimodal AI applications with cloud-native stack

Project mention: Jina.ai: Self-host Multimodal models | news.ycombinator.com | 2024-01-26

vllm

30 17,656 9.9 Python

A high-throughput and memory-efficient inference and serving engine for LLMs

Project mention: Mistral AI Launches New 8x22B Moe Model | news.ycombinator.com | 2024-04-09

The easiest way is to use vllm (https://github.com/vllm-project/vllm) to run it on a couple of A100s, and you can benchmark it using this library (https://github.com/EleutherAI/lm-evaluation-harness)

nni

5 13,708 6.7 Python

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

dagster

46 10,173 10.0 Python

An orchestration platform for the development, production, and observation of data assets.

Project mention: Experience with Dagster.io? | news.ycombinator.com | 2023-07-25

great_expectations

15 9,440 9.9 Python

Always know what to expect from your data.

Project mention: Data Quality at Scale with Great Expectations, Spark, and Airflow on EMR | dev.to | 2023-04-24

Great Expectations (GE) is an open-source data validation tool that helps ensure data quality.

Kedro

29 9,341 9.7 Python

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

Project mention: Nextflow: Data-Driven Computational Pipelines | news.ycombinator.com | 2023-08-10

Interesting, thanks for sharing. I'll definitely take a look, although at this point I am so comfortable with Snakemake, it is a bit hard to imagine what would convince me to move to another tool. But I like the idea of composable pipelines: I am building a tool (too early to share) that would allow layering Snakemake pipelines on top of each other using semi-automatic data annotations, similar to how it is done in kedro (https://github.com/kedro-org/kedro).

Taipy

15 8,257 9.9 Python

Turns Data and AI algorithms into production-ready web applications in no time.

Project mention: +10 Resources to Empower Women in Technology | dev.to | 2024-03-06

I’ve been working in tech for more than five years. I started as a Data Scientist, and now I’m exploring and loving the DevRel 🥑 role for Taipy. Needless to say, evolving in the tech scene has been a ride full of ups, downs, and everything in between.

wandb

16 8,159 9.8 Python

🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.

Project mention: A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev | dev.to | 2024-02-05

Weights & Biases — The developer-first MLOps platform. Build better models faster with experiment tracking, dataset versioning, and model management. Free tier for personal projects only, with 100 GB of storage included.

deeplake

13 7,690 9.8 Python

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Project mention: FLaNK AI Weekly 25 March 2024 | dev.to | 2024-03-25

metaflow

24 7,559 9.2 Python

🚀 Build and manage real-life ML, AI, and data science projects with ease!

Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05

BentoML

16 6,521 9.8 Python

The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!

Project mention: Who's hiring developer advocates? (December 2023) | dev.to | 2023-12-04

feast

8 5,246 9.3 Python

Feature Store for Machine Learning

Project mention: What's Happening with Feast? | news.ycombinator.com | 2023-12-07

clearml

20 5,217 8.1 Python

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution

Project mention: FLaNK Stack Weekly 12 February 2024 | dev.to | 2024-02-12

aim

70 4,762 7.9 Python

Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.

Project mention: aim VS cascade - a user suggested alternative | libhunt.com/r/aim | 2023-12-05

courses

7 4,436 6.4 Python

This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI) (by SkalskiP)

Project mention: If you are looking for free courses about AI, LLMs, CV, or NLP, I created the repository with links to resources that I found super high quality and helpful. The link is in the comment. | /r/ChatGPT | 2023-07-02

I found it: https://github.com/SkalskiP/courses

superduperdb

24 4,327 9.9 Python

🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.

Project mention: FLaNK Stack Weekly 12 February 2024 | dev.to | 2024-02-12

FedML

6 4,052 9.9 Python

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, FEDML Nexus AI (https://fedml.ai) is your generative AI platform at scale.

Project mention: [Experiment] The future of AI is open-source, and here is the plan | /r/samkoesnadi | 2023-06-05

FedML https://github.com/FedML-AI/FedML might already provide a lot of tools to do the job

lightning-hydra-template

9 3,645 5.1 Python

PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡

Project mention: User-friendly PyTorch Lightning and Hydra template for ML experimentation | news.ycombinator.com | 2024-02-05

zenml

33 3,638 9.8 Python

ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.

Project mention: FLaNK AI - 01 April 2024 | dev.to | 2024-04-01

awesome-mlops

7 3,555 6.8 Python

😎 A curated list of awesome MLOps tools (by kelvins)

Project mention: Choosing an Orchestrator in a green-field setup | /r/mlops | 2023-12-07

Lots of good projects on https://github.com/kelvins/awesome-mlops too

polyaxon

9 3,476 8.8 Python

MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle

pipelines

2 3,436 9.8 Python

Machine Learning Pipelines for Kubeflow

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-09.

Python MLOps related posts

Show HN: Evaluate LLM-based RAG Applications with automated test set generation
1 project | news.ycombinator.com | 11 Apr 2024
VLLM Sacrifices Accuracy for Speed
1 project | news.ycombinator.com | 23 Jan 2024
Detect, Defend, Prevail: Payments Fraud Detection using ML & Deepchecks
1 project | dev.to | 13 Jan 2024
Easy, fast, and cheap LLM serving for everyone
1 project | news.ycombinator.com | 17 Dec 2023
Introduction to NannyML: Model Evaluation without labels
1 project | dev.to | 15 Dec 2023
vllm
1 project | news.ycombinator.com | 15 Dec 2023
Mixtral Expert Parallelism
1 project | news.ycombinator.com | 15 Dec 2023

Index

What are some of the best open-source MLOps projects in Python? This list will help you:

#	Project	Stars
1	Airflow	34,397
2	jina	19,884
3	vllm	17,656
4	nni	13,708
5	dagster	10,173
6	great_expectations	9,440
7	Kedro	9,341
8	Taipy	8,257
9	wandb	8,159
10	deeplake	7,690
11	metaflow	7,559
12	BentoML	6,521
13	feast	5,246
14	clearml	5,217
15	aim	4,762
16	courses	4,436
17	superduperdb	4,327
18	FedML	4,052
19	lightning-hydra-template	3,645
20	zenml	3,638
21	awesome-mlops	3,555
22	polyaxon	3,476
23	pipelines	3,436