Python MLOps

Open-source Python projects categorized as MLOps

Top 23 Python MLOps Projects

  • jina

    🔮 Build multimodal AI services via cloud native technologies · Neural Search · Generative AI · Cloud Native

    Project mention: Image matching within database? [P] | reddit.com/r/MachineLearning | 2023-01-05

    You should check out https://github.com/jina-ai/jina and https://github.com/jina-ai/finetuner

  • nni

    An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

  • label-studio

    Label Studio is a multi-type data labeling and annotation tool with standardized output format

    Project mention: Survey: what open source python tool do you use for manual data labeling? | reddit.com/r/computervision | 2023-01-04

    - labelstudio for drawing and saving the labels (nice UI/UX): https://github.com/heartexlabs/label-studio

  • Kedro

    A Python framework for creating reproducible, maintainable and modular data science code.

    Project mention: Data Science/Analyst certificates for the job market? | reddit.com/r/de_EDV | 2022-12-13

  • great_expectations

    Always know what to expect from your data.

    Project mention: Soda Core (OSS) is now GA! So, why should you add checks to your data pipelines? | reddit.com/r/dataengineering | 2022-06-28

    GE is arguably the most well known OSS alternative to Soda Core. The third option is deequ, originally developed and released in OSS by AWS. Our community has told us that Soda Core is different because it’s easy to get going and embed into data pipelines. And it also allows some of the check authoring work to be moved to other members of the data team. I'm sure there are also scenarios where Soda Core is not the best option. For example, when you only use Pandas dataframes or develop in Scala.

  • dagster

    An orchestration platform for the development, production, and observation of data assets.

    Project mention: dbt Cloud Alternatives? | reddit.com/r/dataengineering | 2023-01-23

    Dagster? https://dagster.io

  • metaflow

    🚀 Build and manage real-life data science projects with ease!

    Project mention: [OC] Gender diversity in Tech companies | reddit.com/r/dataisbeautiful | 2023-01-16

    They had to figure out video compression that worked at the volume that they wanted to deliver. They had to build and maintain their own CDN to be able to have an always available and consistent viewing experience. Don’t even get me started on the resiliency tools like hystrix that they were kind enough to open source. I mean, they have their own fucking data science framework and they’re looking into using neural networks to downscale video. Sound familiar? That’s cause that’s practically the same thing as Nvidia’s DLSS (which upscales instead of downscales).

  • wandb

    🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.

    Project mention: Efficient way to tune a network by changing hyperparameters? | reddit.com/r/deeplearning | 2023-01-25

    Wandb is the best! https://wandb.ai/

  • deeplake

    Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai

    Project mention: Launch HN: Activeloop (YC S18) – Data lake for deep learning | news.ycombinator.com | 2022-11-15

    Re: HF - we know them and admire their work (primarily, until very recently, focused on NLP, while we focus mostly on CV). As mentioned in the post, a large part of Deep Lake, including the Python-based dataloader and dataset format, is open source as well - https://github.com/activeloopai/deeplake.

    Likewise, we curate a list of large open source datasets here -> https://datasets.activeloop.ai/docs/ml/, but our main thing isn't aggregating datasets (focus for HF datasets), but rather providing people with a way to manage their data efficiently. That being said, all of the 125+ public datasets we have are available in seconds with one line of code. :)

    We haven't benchmarked against HF datasets in a while, but Deep Lake's dataloader is much, much faster in third-party benchmarks (see https://arxiv.org/pdf/2209.13705; for an older version that was much slower than what we have now, see https://pasteboard.co/la3DmCUR2iFb.png). HF under the hood uses Git-LFS (to the best of my knowledge) and is not opinionated on formats, so LAION just dumps Parquet files on their storage.

    While your setup would work for a few TBs, scaling to PB would be tricky, including maintaining your own infrastructure. And yep, as you said, NAS/NFS would not be able to handle the scale either (especially writes with 1k workers). I am also slightly curious about your use of mmap files with image/video compressed data (as zero-copy won’t happen) unless you decompress inside the GPU ;), but would love to learn more from you! Re: pricing, thanks for the feedback; storage is one component and is custom-priced for PB-scale workloads.

  • BentoML

    Unified Model Serving Framework 🍱

    Project mention: Ask HN: Who is hiring? (November 2022) | news.ycombinator.com | 2022-11-01

  • clearml

    ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

    Project mention: Is there any workflow orchestrator that is Hydra friendly ? | reddit.com/r/mlops | 2022-06-16

  • feast

    Feature Store for Machine Learning

    Project mention: [D] Your 🫵 Preferred Feature Stores? | reddit.com/r/datascience | 2022-07-03

  • polyaxon

    MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle

    Project mention: [D] Kubernetes for ML - how are y'all doing it? | reddit.com/r/MachineLearning | 2022-04-14

    We use Polyaxon and it’s pretty good

  • evidently

    Evaluate and monitor ML models from validation to production. Join our Discord: https://discord.com/invite/xZjKRaNp8b

    Project mention: evidently: Evaluate and monitor ML models from validation to production | reddit.com/r/coolgithubprojects | 2022-12-08

  • pipelines

    Machine Learning Pipelines for Kubeflow

  • flyte

    Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale. It has been battle-tested at Lyft, Spotify, Freenome, and others and is truly open-source.

    Project mention: Github alternative for ML? | reddit.com/r/mlops | 2023-01-26

    Have you looked at flyte.org? It aims to bring "versioning", "compute" and "reproducibility" together in one package.

  • ploomber

    The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

    Project mention: Rant: Jupyter notebooks are trash. | reddit.com/r/datascience | 2023-01-24

    Develop notebook-based pipelines

  • zenml

    ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.

    Project mention: [P] I reviewed 50+ open-source MLOps tools. Here’s the result | reddit.com/r/MachineLearning | 2022-05-29

    Currently, you can see the integrations we support here and it includes a lot of tools in your list. I also feel I agree with your categorization (it is exactly the categorization we use in our docs pretty much). Perhaps one thing missing might be feature stores but that is a minor thing in the bigger picture.

  • FedML

    FedML - The federated learning and analytics library enabling secure and collaborative machine learning on decentralized data anywhere at any scale. Supporting large-scale cross-silo federated learning, cross-device federated learning on smartphones/IoTs, and research simulation. MLOps and App Marketplace are also enabled (https://open.fedml.ai).

    Project mention: FedML AI platform releases the world’s federated learning open platform on the public cloud with an in-depth introduction of products and technologies! | reddit.com/r/u_FedML | 2022-09-28

  • deepchecks

    Tests for Continuous Validation of ML Models & Data. Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort.

    Project mention: [D] DL Practitioners, Do You Use Layer Visualization Tools s.a GradCam in Your Process? | reddit.com/r/MachineLearning | 2022-10-28

  • lightning-hydra-template

    PyTorch Lightning + Hydra. A very user-friendly template for rapid and reproducible ML experimentation with best practices. ⚡🔥⚡

    Project mention: How research scientists structure their code ? | reddit.com/r/pytorch | 2022-07-19

    lightning-hydra-template

  • awesome-mlops

    😎 A curated list of awesome MLOps tools (by kelvins)

  • determined

    Determined: Deep Learning Training Platform

    Project mention: Queueing/Resource Management Solutions for Self Hosted Workstation? | reddit.com/r/mlops | 2023-01-23

    I looked it up and found [Determined Platform](determined.ai), though it looks like a very young project and I don't know if it's reliable enough.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-01-26.

Index

What are some of the best open-source MLOps projects in Python? This list will help you:

Rank  Project                   Stars
1     jina                      17,191
2     nni                       12,428
3     label-studio              11,829
4     Kedro                     8,040
5     great_expectations        7,922
6     dagster                   6,426
7     metaflow                  6,352
8     wandb                     5,368
9     deeplake                  5,189
10    BentoML                   4,490
11    clearml                   4,028
12    feast                     3,928
13    polyaxon                  3,239
14    evidently                 3,121
15    pipelines                 3,091
16    flyte                     3,058
17    ploomber                  2,944
18    zenml                     2,641
19    FedML                     2,381
20    deepchecks                2,362
21    lightning-hydra-template  2,190
22    awesome-mlops             2,033
23    determined                2,021