Top 23 Python Machine Learning Projects

transformers

175 125,021 10.0 Python

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Project mention: Maxtext: A simple, performant and scalable Jax LLM | news.ycombinator.com | 2024-04-23

Is t5x an encoder/decoder architecture?
Some more general options.
The Flax ecosystem (https://github.com/google/flax?tab=readme-ov-file) or dm-haiku (https://github.com/google-deepmind/dm-haiku) were some of the best-developed communities in the JAX AI field.
Perhaps the “trax” repo? https://github.com/google/trax
Some HF examples https://github.com/huggingface/transformers/tree/main/exampl...
Sadly it seems much of the work is proprietary these days, but one example could be Grok-1, if you customize the details. https://github.com/xai-org/grok-1/blob/main/run.py

Pytorch

336 77,783 10.0 Python

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Project mention: My Favorite DevTools to Build AI/ML Applications! | dev.to | 2024-04-23

TensorFlow, developed by Google, and PyTorch, developed by Facebook, are two of the most popular frameworks for building and training complex machine learning models. TensorFlow is known for its flexibility and robust scalability, making it suitable for both research prototypes and production deployments. PyTorch is praised for its ease of use, simplicity, and dynamic computational graph that allows for more intuitive coding of complex AI models. Both frameworks support a wide range of AI models, from simple linear regression to complex deep neural networks.
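The "dynamic computational graph" mentioned above means the graph is recorded while ordinary Python code runs, so loops and branches decide its shape. The following is a toy sketch of that idea in plain Python. The `Value` class and the `power` function are hypothetical illustrations, not PyTorch's API or implementation.

```python
class Value:
    """A scalar that records how it was computed (illustrative only)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Build a topological order of the recorded graph, then apply
        # the chain rule from the output back to the inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node._parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def power(x, n):
    """Compute x**n (n >= 1) with a plain Python loop.

    The number of multiplications, and therefore the graph, depends on
    the run-time value of n.
    """
    y = x
    for _ in range(n - 1):
        y = y * x
    return y


x = Value(3.0)
out = power(x, 3)   # graph for x * x * x is built as the loop runs
out.backward()
print(out.data, x.grad)  # 27.0 27.0  (x**3 and its derivative 3 * x**2 at x = 3)
```

Calling `power(x, 2)` instead would record a different, shorter graph, with no separate compilation step in between.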

Keras

77 60,937 9.9 Python

Deep Learning for humans

Project mention: My Favorite DevTools to Build AI/ML Applications! | dev.to | 2024-04-23

As a beginner, I was looking for something simple and flexible for developing deep learning models and that is when I found Keras. Many AI/ML professionals appreciate Keras for its simplicity and efficiency in prototyping and developing deep learning models, making it a preferred choice, especially for beginners and for projects requiring rapid development.

scikit-learn

81 58,046 9.9 Python

scikit-learn: machine learning in Python

Project mention: AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite | news.ycombinator.com | 2024-04-09

Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark which are resolved by AutoCodeRover:
- From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...
- Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.
There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).

Face Recognition

34 51,755 0.0 Python

The world's simplest facial recognition api for Python and the command line

Project mention: Security Image Recognition | /r/computervision | 2023-12-10

Camera connected to a PI? Something like this could run locally: https://github.com/ageitgey/face_recognition

faceswap

10 49,178 8.0 Python

Deepfakes Software For All

Project mention: faceswap VS facefusion - a user suggested alternative | libhunt.com/r/faceswap | 2024-01-30

yolov5

129 46,921 8.8 Python

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Project mention: Easily classify dog and cat breeds with YOLOv5 | dev.to | 2024-04-15

Ref:
https://www.youtube.com/watch?v=0GwnxFNfZhM
https://github.com/ultralytics/yolov5
https://dev.to/gfstealer666/kaaraich-yolo-alkrithuemainkaartrwcchcchabwatthu-object-detection-3lef
https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset/data

Open-Assistant

329 36,622 9.1 Python

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

Project mention: Best open source AI chatbot alternative? | /r/opensource | 2023-12-08

For open assistant, the code: https://github.com/LAION-AI/Open-Assistant/tree/main/inference

Airflow

169 34,485 10.0 Python

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12

Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.

gym

96 33,873 0.0 Python

A toolkit for developing and comparing reinforcement learning algorithms.

Project mention: OpenAI Acquires Global Illumination | news.ycombinator.com | 2023-08-16

A co-founder announced they disbanded their robots team a couple years ago: https://venturebeat.com/business/openai-disbands-its-robotic...
That was the same time they deprecated OpenAI Gym: https://github.com/openai/gym

DeepSpeed

51 32,550 9.8 Python

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06

DeepSpeed can handle parallelism concerns, and even offload data/model to RAM, or even NVMe (!?). I'm surprised I don't see this project used more.
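The offloading this comment refers to is set up in the ZeRO section of a DeepSpeed JSON config. Below is a minimal sketch, assuming ZeRO stage 3; the batch size, the output file name, and the NVMe path `/local_nvme` are placeholder values chosen for illustration, not taken from the source.

```python
import json

# Hypothetical values: adjust the batch size and NVMe path for a real setup.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        # Keep optimizer state in CPU RAM instead of GPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Spill model parameters to an NVMe drive.
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}

# Serialize the config so it can be saved as a JSON file.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

A file with this content would typically be handed to DeepSpeed at start-up, for example through the `config` argument of `deepspeed.initialize`. Whether offloading helps depends on the model size and the hardware.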

streamlit

255 31,506 9.8 Python

Streamlit — A faster way to build and share data apps.

Project mention: Building an Email Assistant Application with Burr | dev.to | 2024-04-26

Note that there are many tools that make this easier/simpler to prototype, including chainlit, streamlit, etc… The backend API we built is amenable to interacting with them as well.

Ray

42 31,101 10.0 Python

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Project mention: Open Source Advent Fun Wraps Up! | dev.to | 2024-01-05

22. Ray | Github | tutorial

gradio

115 28,730 9.9 Python

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Project mention: Show HN: Dropbase – Build internal web apps with just Python | news.ycombinator.com | 2023-12-05

There's also that library all the AI models started using that gives you a public URL to share. After researching it: https://www.gradio.app/ is the link.
It's used specifically for making simple UIs for machine learning apps. But I guess technically you could use it for anything.

spaCy

106 28,704 9.2 Python

💫 Industrial-strength Natural Language Processing (NLP) in Python

Project mention: Step by step guide to create customized chatbot by using spaCy (Python NLP library) | dev.to | 2024-03-23

Hi Community, in this article I will demonstrate the steps below to create your own chatbot using spaCy (an open-source software library for advanced natural language processing, written in Python and Cython):

pytorch-lightning

8 26,883 9.9 Python

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Project mention: Lightning AI Studios – A persistent GPU cloud environment | news.ycombinator.com | 2023-12-14

data-science-ipython-notebooks

1 26,459 0.0 Python

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
OpenBBTerminal

211 26,055 9.8 Python

Investment Research for Everyone, Everywhere.

Project mention: Open-Sourcing High-Frequency Trading and Market-Making Backtesting Tool | /r/Python | 2023-12-06

You might want to suggest this as an extension to the OpenBB project - I imagine that could be of interest to them if there isn’t something like it built in already :-)

ML-From-Scratch

3 23,164 0.0 Python

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
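
To give a feel for what a bare-bones, from-scratch model looks like, here is a minimal sketch of ordinary least-squares linear regression with a single feature. It is not code from the ML-From-Scratch repository. It uses only the Python standard library rather than NumPy so that it runs without dependencies, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```
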
ultralytics

27 22,624 9.8 Python

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

Project mention: The CEO of Ultralytics (yolov8) using LLMs to engage with commenters on GitHub | news.ycombinator.com | 2024-02-12

Yep, I noticed this a while ago. It posts easily identifiable ChatGPT responses. It also posts garbage wrong answers which makes it worse than useless. Totally disrespectful to the userbase.
https://github.com/ultralytics/ultralytics/issues/5748#issue...

NLP-progress

17 22,296 3.2 Python

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
EasyOCR

38 21,882 4.6 Python

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Project mention: Leveraging GPT-4 for PDF Data Extraction: A Comprehensive Guide | dev.to | 2023-12-27

PyTesseract Module [ Github ] EasyOCR Module [ Github ] PaddlePaddle OCR [ Github ]

d2l-en

6 21,628 8.7 Python

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Machine Learning related posts

Building an Email Assistant Application with Burr
6 projects | dev.to | 26 Apr 2024
Voxel51 Is Hiring AI Researchers and Scientists — What the New Open Science Positions Mean
1 project | dev.to | 26 Apr 2024
Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations
1 project | dev.to | 26 Apr 2024
Show HN: I made a ROS package for realtime semantic segmentation
1 project | news.ycombinator.com | 26 Apr 2024
How to Estimate Depth from a Single Image
8 projects | dev.to | 25 Apr 2024
Maxtext: A simple, performant and scalable Jax LLM
10 projects | news.ycombinator.com | 23 Apr 2024
My Favorite DevTools to Build AI/ML Applications!
9 projects | dev.to | 23 Apr 2024

Index

What are some of the best open-source Machine Learning projects in Python? This list will help you:

#	Project	Stars
1	transformers	125,021
2	Pytorch	77,783
3	Keras	60,937
4	scikit-learn	58,046
5	Face Recognition	51,755
6	faceswap	49,178
7	yolov5	46,921
8	Open-Assistant	36,622
9	Airflow	34,485
10	gym	33,873
11	DeepSpeed	32,550
12	streamlit	31,506
13	Ray	31,101
14	gradio	28,730
15	spaCy	28,704
16	pytorch-lightning	26,883
17	data-science-ipython-notebooks	26,459
18	OpenBBTerminal	26,055
19	ML-From-Scratch	23,164
20	ultralytics	22,624
21	NLP-progress	22,296
22	EasyOCR	21,882
23	d2l-en	21,628