SaaSHub helps you find the best software and product alternatives Learn more →
Top 23 Python Mlops Projects
-
Scout Monitoring
Free Django app performance insights with Scout Monitoring. Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today.
-
nni
An open source AutoML toolkit for automate machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
-
Kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
wandb
🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.
-
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
-
BentoML
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
-
clearml
ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
-
courses
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI) (by SkalskiP)
-
superduperdb
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
-
FedML
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
-
lightning-hydra-template
PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
Level 1 of MLOps is when you've put each lifecycle stage and their intefaces in an automated pipeline. The pipeline could be a python or bash script, or it could be a directed acyclic graph run by some orchestration framework like Airflow, dagster or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML and dvc also feature pipeline capabilities.
Project mention: AI leaderboards are no longer useful. It's time to switch to Pareto curves | news.ycombinator.com | 2024-04-30I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.
What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.
On the other hand, it's not just GPT, there are currently floating-point difficulties in vllm which significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 is using an especially low-precision float and introducing nondeterminism by saving money on compute costs.
Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
Level 1 of MLOps is when you've put each lifecycle stage and their intefaces in an automated pipeline. The pipeline could be a python or bash script, or it could be a directed acyclic graph run by some orchestration framework like Airflow, dagster or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML and dvc also feature pipeline capabilities.
Interesting, thanks for sharing. I'll definitely take a look, although at this point I am so comfortable with Snakemake, it is a bit hard to imagine what would convince me to move to another tool. But I like the idea of composable pipelines: I am building a tool (too early to share) that would allow to lay Snakemake pipelines on top of each other using semi-automatic data annotations similar to how it is done in kedro (https://github.com/kedro-org/kedro).
Project mention: Python Day 9: Building Interactive Web Apps without HTML/CSS and JavaScript | dev.to | 2024-04-26Taipy is an open-source Python library that enables data scientists and developers to build robust end-to-end data pipelines.
Project mention: A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev | dev.to | 2024-02-05Weights & Biases — The developer-first MLOps platform. Build better models faster with experiment tracking, dataset versioning, and model management. Free tier for personal projects only, with 100 GB of storage included.
Link to GitHub -->
Project mention: If you are looking for free courses about AI, LLMs, CV, or NLP, I created the repository with links to resources that I found super high quality and helpful. The link is in the comment. | /r/ChatGPT | 2023-07-02I found it: https://github.com/SkalskiP/courses
Project mention: [Experiment] The future of AI is open-source, and here is the plan | /r/samkoesnadi | 2023-06-05FedML https://github.com/FedML-AI/FedML might already provide a lot of tools to do the job
Project mention: User-friendly PyTorch Lightning and Hydra template for ML experimentation | news.ycombinator.com | 2024-02-05
Lots of good projects on https://github.com/kelvins/awesome-mlops too
Python Mlops related posts
-
AI leaderboards are no longer useful. It's time to switch to Pareto curves
-
Building an Email Assistant Application with Burr
-
Show HN: Evaluate LLM-based RAG Applications with automated test set generation
-
VLLM Sacrifices Accuracy for Speed
-
Detect, Defend, Prevail: Payments Fraud Detection using ML & Deepchecks
-
Easy, fast, and cheap LLM serving for everyone
-
Introduction to NannyML: Model Evaluation without labels
-
A note from our sponsor - SaaSHub
www.saashub.com | 1 Jun 2024
Index
What are some of the best open-source Mlops projects in Python? This list will help you:
Project | Stars | |
---|---|---|
1 | Airflow | 34,877 |
2 | jina | 20,235 |
3 | vllm | 20,017 |
4 | nni | 13,819 |
5 | dagster | 10,468 |
6 | great_expectations | 9,567 |
7 | Kedro | 9,409 |
8 | Taipy | 9,282 |
9 | wandb | 8,354 |
10 | deeplake | 7,799 |
11 | metaflow | 7,688 |
12 | BentoML | 6,650 |
13 | feast | 5,312 |
14 | clearml | 5,324 |
15 | aim | 4,865 |
16 | courses | 4,618 |
17 | superduperdb | 4,433 |
18 | FedML | 4,082 |
19 | lightning-hydra-template | 3,742 |
20 | zenml | 3,703 |
21 | awesome-mlops | 3,654 |
22 | polyaxon | 3,497 |
23 | pipelines | 3,464 |
Sponsored