Python ML

Open-source Python projects categorized as ML

Top 23 Python ML Projects

  • yolov5

    YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

    Project mention: Real-Time Object Detection with YOLO: A Step-by-Step Guide with Realtime Fire Detection Example. | dev.to | 2023-01-03

    In this tutorial, we'll explore how the YOLO model works and how it can be used for real-time fire detection, using the implementation from Ultralytics [https://github.com/ultralytics/yolov5]. We will apply transfer learning to P5 models (models supported by Ultralytics that differ in architecture and parameter size) to train our own model, evaluate its performance, and use it for inference.

  • MLflow

    Open source platform for the machine learning lifecycle

    Project mention: ML experiment tracking with DagsHub, MLFlow, and DVC | dev.to | 2023-01-12

    Here, we’ll implement the experimentation workflow using DagsHub, Google Colab, MLflow, and data version control (DVC). We’ll focus on how to do this without diving deep into the technicalities of building or designing a workbench from scratch. Going that route might increase the complexity involved, especially if you are in the early stages of understanding ML workflows, just working on a small project, or trying to implement a proof of concept.

  • best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: Ask HN: How to get back into AI? | news.ycombinator.com | 2022-12-10

    For Python, here's a nice compilation: https://github.com/ml-tooling/best-of-ml-python/blob/main/RE...

  • MindsDB

    In-Database Machine Learning

    Project mention: Making Something Waspy: A Review Of Wasp | dev.to | 2023-01-10

    I picked a few of them on the hacktoberswag.com website. Precisely three: pusher.js, refine.dev, and MindsDB. You are probably asking, "Oh, you did not pick Wasp?" The thing was, Wasp wasn’t listed on that web page and I didn’t know if any tool by that name existed.

  • yolov3

    YOLOv3 in PyTorch > ONNX > CoreML > TFLite

    Project mention: [Tutorial] "Fine Tuning" Stable Diffusion using only 5 Images Using Textual Inversion. | reddit.com/r/StableDiffusion | 2022-08-23

    Hey. I only have experience using the official repository, and only use Linux. Could you try the solutions here and see if it helps? https://github.com/ultralytics/yolov3/issues/1643

  • ludwig

    Data-centric declarative deep learning framework

  • metaflow

    🚀 Build and manage real-life data science projects with ease!

    Project mention: [OC] Gender diversity in Tech companies | reddit.com/r/dataisbeautiful | 2023-01-16

    They had to figure out video compression that worked at the volume that they wanted to deliver. They had to build and maintain their own CDN to be able to have an always-available and consistent viewing experience. Don’t even get me started on the resiliency tools like hystrix that they were kind enough to open source. I mean, they have their own fucking data science framework and they’re looking into using neural networks to downscale video. Sound familiar? That’s cause that’s practically the same thing as Nvidia’s DLSS (which upscales instead of downscales).

  • CoreML-Models

    Largest list of models for Core ML (for iOS 11+)

    Project mention: I made an app completely on SwiftUI dedicated to browsing vehicles for sale on eBay. It got rejected for being too basic, should I justify any more time on this? | reddit.com/r/swift | 2023-01-24

    Super far-fetched idea on passing 4.2 with iOS-specific functionality: there are CoreML models specifically capable of identifying car makes and models (linked in this repo), which could allow you to take/select a photo, and automatically identify/search the car based on the prediction. That being said, it will not take into account nearly as many details as you can manually specify in your app. Nice work!

  • deeplake

    Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai

    Project mention: Launch HN: Activeloop (YC S18) – Data lake for deep learning | news.ycombinator.com | 2022-11-15

    Re: HF - we know them and admire their work (primarily, until very recently, focused on NLP, while we focus mostly on CV). As mentioned in the post, a large part of Deep Lake, including the Python-based dataloader and dataset format, is open source as well - https://github.com/activeloopai/deeplake.

    Likewise, we curate a list of large open source datasets here -> https://datasets.activeloop.ai/docs/ml/, but our main thing isn't aggregating datasets (focus for HF datasets), but rather providing people with a way to manage their data efficiently. That being said, all of the 125+ public datasets we have are available in seconds with one line of code. :)

    We haven't benchmarked against HF datasets in a while, but Deep Lake's dataloader is much, much faster in third-party benchmarks (see https://arxiv.org/pdf/2209.13705; for an older version, which was much slower than what we have now, see https://pasteboard.co/la3DmCUR2iFb.png). HF under the hood uses Git-LFS (to the best of my knowledge) and is not opinionated on formats, so LAION just dumps Parquet files on their storage.

    While your setup would work for a few TBs, scaling to PB would be tricky, including maintaining your own infrastructure. And yep, as you said, NAS/NFS would not be able to handle the scale either (especially writes with 1k workers). I am also slightly curious about your use of mmap files with image/video compressed data (as zero-copy won’t happen unless you decompress inside the GPU ;)), but would love to learn more from you! Re: pricing, thanks for the feedback; storage is one component and is custom-priced for PB-scale workloads.

  • BentoML

    Unified Model Serving Framework 🍱

    Project mention: Ask HN: Who is hiring? (November 2022) | news.ycombinator.com | 2022-11-01

  • feast

    Feature Store for Machine Learning

    Project mention: [D] Your 🫵 Preferred Feature Stores? | reddit.com/r/datascience | 2022-07-03

  • hub

    A library for transfer learning by reusing parts of TensorFlow models. (by tensorflow)

    Project mention: Tensorflow Custom TFLite java.lang.NullPointerException: Cannot allocate memory for the interpreter | reddit.com/r/codehunter | 2022-05-14

    I have created a custom tensorflow lite model using retrain.py from https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py using the following command

  • polyaxon

    MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle

    Project mention: [D] Kubernetes for ML - how are y'all doing it? | reddit.com/r/MachineLearning | 2022-04-14

    We use Polyaxon and it’s pretty good

  • zenml

    ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.

    Project mention: [P] I reviewed 50+ open-source MLOps tools. Here’s the result | reddit.com/r/MachineLearning | 2022-05-29

    Currently, you can see the integrations we support here and it includes a lot of tools in your list. I also feel I agree with your categorization (it is exactly the categorization we use in our docs pretty much). Perhaps one thing missing might be feature stores but that is a minor thing in the bigger picture.

  • deepchecks

    Tests for Continuous Validation of ML Models & Data. Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort.

    Project mention: [D] DL Practitioners, Do You Use Layer Visualization Tools s.a GradCam in Your Process? | reddit.com/r/MachineLearning | 2022-10-28

  • zvt

    modular quant framework.

    Project mention: Algo trading in matlab / C++ | reddit.com/r/algotrading | 2022-06-09

  • awesome-mlops

    😎 A curated list of awesome MLOps tools (by kelvins)

  • ScaledYOLOv4

    Scaled-YOLOv4: Scaling Cross Stage Partial Network

    Project mention: DeepSort with PyTorch(support yolo series) | reddit.com/r/u_No_Experience9104 | 2022-09-20

    WongKinYiu/ScaledYOLOv4

  • GPflow

    Gaussian processes in TensorFlow

    Project mention: [D] Open Source ML Organisations to contribute to? | reddit.com/r/MachineLearning | 2022-09-09

  • Photonix

    A modern, web-based photo management server. Run it on your home server and it will let you find the right photo from your collection on any device. Smart filtering is made possible by object recognition, face recognition, location awareness, color analysis and other ML algorithms.

    Project mention: Open Source React or Vue projects | reddit.com/r/opensource | 2022-03-12

    Google Photos-like Photonix (https://github.com/photonixapp/photonix, https://photonix.org/) has a React frontend and (maybe) mobile clients too?

  • nannyml

    Detecting silent model failure. NannyML estimates performance for regression and classification models using tabular data. It alerts you when and why it changed. It is the only open-source library capable of fully capturing the impact of data drift on performance.

    Project mention: [HIRING][Full Time, Part Time, Temporary, Internship, Freelance] Data Science Intern (Remote) | reddit.com/r/jobbit | 2022-05-20

    Description NannyML - creators of an Open Source Python library, are looking for multiple Data Science interns to help across research, prototyping, and product. Github: https://github.com/NannyML/nannyml About Us NannyML is an Open Source Python lib …

  • model-optimization

    A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

  • pycm

    Multi-class confusion matrix library in Python

    Project mention: PyCM 3.8 Released: Distance/Similarity Support | news.ycombinator.com | 2023-02-02

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-02-02.

Index

What are some of the best open-source ML projects in Python? This list will help you:

#   Project             Stars
1   yolov5              34,913
2   MLflow              13,574
3   best-of-ml-python   12,580
4   MindsDB             12,526
5   yolov3               9,286
6   ludwig               8,728
7   metaflow             6,352
8   CoreML-Models        5,601
9   deeplake             5,197
10  BentoML              4,490
11  feast                3,928
12  hub                  3,265
13  polyaxon             3,239
14  zenml                2,641
15  deepchecks           2,362
16  zvt                  2,321
17  awesome-mlops        2,033
18  ScaledYOLOv4         1,986
19  GPflow               1,699
20  Photonix             1,520
21  nannyml              1,373
22  model-optimization   1,362
23  pycm                 1,347