Top 23 Python Machine Learning Projects
-
There were actually multiple bugs which impacted long context benchmarks and general inference - I helped fix some of them.
1. RMS norm epsilon was 1e-6, but should be 1e-5 - see https://github.com/huggingface/transformers/pull/37418
2. Llama 4 Scout changed RoPE settings after release - conversion script for llama.cpp had to be fixed. See https://github.com/ggml-org/llama.cpp/pull/12889
3. vLLM and the Llama 4 team found QK Norm was normalizing across the entire Q & K, which was wrong - fixing it increased accuracy by 2%. See https://github.com/vllm-project/vllm/pull/16311
If you see https://x.com/WolframRvnwlf/status/1909735579564331016 - the GGUFs I uploaded for Scout actually did better than inference providers by +~5% on MMLU Pro. https://docs.unsloth.ai/basics/tutorial-how-to-run-and-fine-... has more details
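To make bugs 1 and 3 concrete, here is a minimal pure-Python sketch of RMS normalization. It is not the code from any of the linked PRs. The function names `rms_norm` and `qk_norm_per_head` are hypothetical, the learned weight is omitted, the input values are toy numbers, and it assumes the intended QK Norm behavior is to normalize each attention head separately, which the comment implies but does not state outright.

```python
import math


def rms_norm(x, eps=1e-5):
    """Divide a vector by its root-mean-square; eps keeps the division stable."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def qk_norm_per_head(x, num_heads, eps=1e-5):
    """Assumed fix: apply rms_norm to each head's slice instead of the whole vector."""
    head_dim = len(x) // num_heads
    out = []
    for h in range(num_heads):
        out.extend(rms_norm(x[h * head_dim:(h + 1) * head_dim], eps))
    return out


# Bug 1: for small activations the choice of eps visibly changes the output.
small = [1e-3, 1e-3]
with_old_eps = rms_norm(small, eps=1e-6)
with_new_eps = rms_norm(small, eps=1e-5)

# Bug 3: two heads with very different magnitudes (toy values).
q = [0.01, 0.02, 10.0, 20.0]
whole = rms_norm(q)                          # one RMS over all heads together
per_head = qk_norm_per_head(q, num_heads=2)  # one RMS per head

print(with_old_eps[0], with_new_eps[0])
print(whole)
print(per_head)
```

Normalizing over the whole vector lets the large-magnitude head dominate the RMS, so the small head is scaled down to almost nothing. Per-head normalization gives each head its own scale.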
-
Project mention: How to Get Started with Scikit-Learn: A Beginner-Friendly Guide to Machine Learning in Python | dev.to | 2025-04-24
PyTorch
-
>Chollet, a French computer scientist and one of the industry’s sharpest skeptics
I feel like this description really buries the lede on Chollet's expertise. (For those who don't know, he's the creator of and lead contributor[0] to Keras)
[0] https://github.com/keras-team/keras/graphs/contributors
-
7. Scikit-learn - Machine Learning
-
nn
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
-
Syncthing, python face_recognition [1], a static gallery (sigal [2]), and a few lines of bash and it's fully automatic. I can even share links with family.
[1] https://github.com/ageitgey/face_recognition
[2] https://github.com/saimn/sigal
-
-
There are several implementations of the YOLO algorithm available, but for ease-of-use, we will use the Ultralytics implementation in this guide. We will implement and test the code locally and then deploy to Koyeb's GPUs for higher inference speed.
-
Project mention: OpenBB – Investment Research for Everyone, Everywhere | news.ycombinator.com | 2025-03-22
-
Project mention: Show HN: Using YOLO to Detect Office Chairs in 40M Hotel Photos | news.ycombinator.com | 2025-01-25
They did it on their own computer. https://github.com/ultralytics/ultralytics
-
Hi HN,
We've built an SDK for building DAGs / data pipelines with LLMs in Apache Airflow [1] using Pydantic AI [2] under the hood. I've seen success across the board with Airflow users building simple LLM workflows before moving on to "AI agents". In my experience, the noise around building agents means that people forget that there are other ways to get more immediate value out of LLMs.
Coupling Airflow for orchestration and Pydantic AI for LLM interactions has turned out to be a very pragmatic approach to building these workflows (and agents). Neither tool "gets in the way" of what you're trying to do. Airflow's been around for 10+ years and has a very well-built orchestration engine rich with everything you need to write production grade data pipelines, and Pydantic AI's been a refreshing take on working with LLMs.
Would love some feedback from this community!
[1] https://github.com/apache/airflow
-
Streamlit.io: Great documentation and reusable components to integrate with your AI application for rapid Python front-end AI development.
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Project mention: DeepSpeed-Domino: Communication-Free LLM Training Engine | news.ycombinator.com | 2024-11-26
-
Project mention: How I Used Amazon Nova Reel and Gradio to Auto-Generate Stunning GIF Banners | dev.to | 2025-04-17
To make the tool easy to use, I built a UI with Gradio:
-
Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
-
Ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
I'm guessing this comment is some kind of "if you know, you know." Likely starting from https://docs.ray.io/en/latest/cluster/vms/user-guides/launch... and then trawling through one of these I guess https://github.com/ray-project/ray/issues?q=is%3Aissue+prem+...
-
Project mention: Something weird is happening with LLMs and chess | news.ycombinator.com | 2024-11-14
> OpenAI has never done anything except conversational agents.
Tell me you haven't been following this field without telling me you haven't been following this field[0][1][2]?
[0]: https://github.com/openai/gym
-
Project mention: 15,000 lines of verified cryptography now in Python | news.ycombinator.com | 2025-04-18
Geez honestly
This seems to be the issue https://github.com/explosion/spaCy/issues/13658#issuecomment...
And you depend on opinionated libraries that break with newer versions. Why? Well because f you that's why! Because our library is not just a tool, it's a lifestyle
Though it seems that Pydantic 1.x does support Python 3.13 https://docs.pydantic.dev/1.10/changelog/#v11020-2025-01-07
-
pytorch-lightning
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
Project mention: SB-1047 will stifle open-source AI and decrease safety | news.ycombinator.com | 2024-04-29
It's very easy to get started, right in your Terminal, no fees! No credit card at all.
And there are cloud providers like https://replicate.com/ and https://lightning.ai/ that will let you use your LLM via an API key just like you did with OpenAI if you need that.
You don't need OpenAI - nobody does.
-
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
-
MindsDB
AI's query engine - Platform for building AI that can learn and answer questions over large scale federated data.
Project mention: Unlocking the Power of Data with MindsDB's Federated Query Engine | dev.to | 2025-04-10
Access open source MindsDB’s Federated Query Engine on GitHub here.
-
paperless-ngx
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
Project mention: Paperless-ngx: scan, index and archive all your physical documents | news.ycombinator.com | 2024-09-30
Python Machine Learning related posts
-
How to Get Started with Scikit-Learn: A Beginner-Friendly Guide to Machine Learning in Python
-
Create a Smart Java Chatbot Using Python’s ChatterBot – No APIs Needed
-
How I Used Amazon Nova Reel and Gradio to Auto-Generate Stunning GIF Banners
-
Docker Model Runner
-
A beginner's guide to the Grounding-Dino model by Adirik on Replicate
-
This Bench Does Not Exist
-
Show HN: Open-source, cross platform document data extraction with no OCR
-
Index
What are some of the best open-source Machine Learning projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 143,133 |
2 | PyTorch | 89,253 |
3 | Keras | 62,884 |
4 | scikit-learn | 61,793 |
5 | nn | 60,225 |
6 | Face Recognition | 54,636 |
7 | faceswap | 53,719 |
8 | yolov5 | 53,449 |
9 | OpenBB | 40,929 |
10 | ultralytics | 39,737 |
11 | Airflow | 39,656 |
12 | streamlit | 38,898 |
13 | DeepSpeed | 38,004 |
14 | gradio | 37,625 |
15 | Open-Assistant | 37,309 |
16 | Ray | 36,619 |
17 | gym | 35,851 |
18 | spaCy | 31,423 |
19 | pytorch-lightning | 29,356 |
20 | data-science-ipython-notebooks | 27,993 |
21 | MindsDB | 27,762 |
22 | paperless-ngx | 26,717 |
23 | supervision | 26,491 |