Top 23 ML Open-Source Projects

tensorflow

223 182,693 10.0 C++

An Open Source Machine Learning Framework for Everyone

Project mention: Side Quest Devblog #1: These Fakes are getting Deep | dev.to | 2024-04-29

# L2-normalize the encoding tensors
image_encoding = tf.math.l2_normalize(image_encoding, axis=1)
audio_encoding = tf.math.l2_normalize(audio_encoding, axis=1)
# Find euclidean distance between image_encoding and audio_encoding
# Essentially trying to detect if the face is saying the audio
# Will return nan without the 1e-12 offset due to https://github.com/tensorflow/tensorflow/issues/12071
d = tf.norm((image_encoding - audio_encoding) + 1e-12, ord='euclidean', axis=1, keepdims=True)
discriminator = keras.Model(inputs=[image_input, audio_input], outputs=[d], name="discriminator")
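The idea in the snippet above (normalize two encodings, then take their Euclidean distance) can be sketched without TensorFlow. The following is a minimal pure-Python illustration, not the post author's code: the helper names `l2_normalize`, `euclidean_distance`, and `norm_gradient` are hypothetical, and the explanation of the NaN is an assumption based on the general fact that the gradient of a Euclidean norm, `x / ||x||`, is an undefined 0/0 when `x` is the zero vector.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit length; eps guards against dividing by a zero norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def norm_gradient(diff, offset=0.0):
    """Gradient of ||diff + offset|| with respect to diff.

    Analytically this is (diff + offset) / ||diff + offset||. When the
    shifted vector is all zeros the expression is 0/0; Python would raise
    ZeroDivisionError, so NaN is returned explicitly here to mimic what a
    floating-point tensor framework would produce (an assumption).
    """
    shifted = [x + offset for x in diff]
    norm = math.sqrt(sum(x * x for x in shifted))
    if norm == 0.0:
        return [math.nan] * len(diff)
    return [x / norm for x in shifted]


image_encoding = l2_normalize([3.0, 4.0])
audio_encoding = l2_normalize([3.0, 4.0])
diff = [i - a for i, a in zip(image_encoding, audio_encoding)]

print(euclidean_distance(image_encoding, audio_encoding))  # identical encodings
print(norm_gradient(diff))                # undefined gradient at zero
print(norm_gradient(diff, offset=1e-12))  # finite once the offset is added
```

In this sketch the distance itself is well defined for identical encodings; only the gradient is affected, which is why a small offset is enough to keep training numerically stable.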

ML-For-Beginners

28 67,111 7.6 HTML

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13

- https://github.com/microsoft/ML-For-Beginners
Also check out this list Pitt puts out every year:

yolov5

129 47,202 8.8 Python

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Project mention: Easily Classify Dog and Cat Breeds with YoLoV5 | dev.to | 2024-04-15

Ref
https://www.youtube.com/watch?v=0GwnxFNfZhM
https://github.com/ultralytics/yolov5
https://dev.to/gfstealer666/kaaraich-yolo-alkrithuemainkaartrwcchcchabwatthu-object-detection-3lef
https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset/data

netron

34 26,174 9.9 JavaScript

Visualizer for neural network, deep learning and machine learning models

Project mention: Your 14-Day Free Trial Ain't Gonna Cut It | news.ycombinator.com | 2024-05-06

They're data-dependence graphs for a neural-network scheduling problem. Like this but way bigger to start with and then lowered to more detailed representations several times: https://netron.app/?url=https://github.com/onnx/models/raw/m... My home-grown layout engine can handle the 12k nodes for llama2 in its highest-level form in 20s or so, but it's not the most featureful, and they only get bigger from there. So I always have an eye out for potential tools.

handson-ml

1 25,094 0.0 Jupyter Notebook

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

MindsDB

78 21,354 10.0 Python

The platform for customizing AI from enterprise data

Project mention: What’s the Difference Between Fine-tuning, Retraining, and RAG? | dev.to | 2024-04-08

Check us out on GitHub.

MLflow

56 17,335 9.9 Python

Open source platform for the machine learning lifecycle

Project mention: Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations | dev.to | 2024-04-26

How can this be? The current state of practice in AI/ML work requires adaptivity, which is uncommon in classical computational fields. There are myriad tools that capture the work across the many instances of the AI/ML lifecycle. The idea that any one tool could sufficiently capture the dynamic work is unrealistic. Take, for example, an experiment tracking tool like W&B or MLFlow; some form of experiment tracking is necessary in typical model training lifecycles. Such a tool requires some notion of a dataset. However, a tool focusing on experiment tracking is orthogonal to the needs of analyzing model performance at the data sample level, which is critical to understanding the failure modes of models. The way one does this depends on the type of data and the AI/ML task at hand. In other words, MLOps is inherently an intricate mosaic, as the capabilities and best practices of AI/ML work evolve.

StableLM

43 15,851 5.0 Jupyter Notebook

StableLM: Stability AI Language Models

Project mention: The Era of 1-bit LLMs: ternary parameters for cost-effective computing | news.ycombinator.com | 2024-02-28

https://github.com/Stability-AI/StableLM?tab=readme-ov-file#...

best-of-ml-python

16 15,633 7.8 Python

🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

kubeflow

3 13,700 8.3 TypeScript

Machine Learning Toolkit for Kubernetes

awesome-mlops

24 11,769 5.2

A curated list of references for MLOps

ludwig

3 10,845 9.5 Python

Low-code framework for building custom LLMs, neural networks, and other AI models

Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07

This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.
Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?
Would love to see more progress in this area!

dopamine

3 10,378 5.7 Jupyter Notebook

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

ML.NET

17 8,855 8.9 C#

ML.NET is an open source and cross-platform machine learning framework for .NET.

pycaret

5 8,450 9.4 Jupyter Notebook

An open-source, low-code machine learning library in Python

MNN

3 8,325 8.0 C++

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

Project mention: [D][R] Deploying deep models on memory constrained devices | /r/MachineLearning | 2023-10-03

However, I am looking at this subject through the problem of training/fine-tuning deep models on edge devices, which is becoming increasingly feasible. I am looking at tflite, Alibaba's MNN, mit-han-lab's tinyengine, etc.

deeplake

13 7,729 9.8 Python

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Project mention: FLaNK AI Weekly 25 March 2025 | dev.to | 2024-03-25

metaflow

24 7,630 9.2 Python

:rocket: Build and manage real-life ML, AI, and data science projects with ease!

Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05

unstructured

12 6,515 9.8 HTML

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Project mention: LlamaCloud and LlamaParse | news.ycombinator.com | 2024-02-20

Be careful with unstructured:
https://github.com/Unstructured-IO/unstructured/blob/d11c70c...
from: https://github.com/open-webui/open-webui/issues/687

CoreML-Models

2 6,241 2.3 Python

Largest list of models for Core ML (for iOS 11+)

serving

12 6,085 9.8 C++

A flexible, high-performance serving system for machine learning models

Project mention: Llama.cpp: Full CUDA GPU Acceleration | news.ycombinator.com | 2023-06-12

Yet another TEDIOUS BATTLE: Python vs. C++/C stack.
This project gained popularity due to the HIGH DEMAND for running large models with 1B+ parameters, like `llama`. Python dominates the interface and training ecosystem, but prior to llama.cpp, non-ML professionals showed little interest in a fast C++ interface library. While existing solutions like tensorflow-serving [1] in C++ were sufficiently fast with GPU support, llama.cpp took the initiative to optimize for CPU and trim unnecessary code, essentially code-golfing and sacrificing some algorithm correctness for improved performance, which isn't favored by "ML research".
NOTE: In my opinion, a true pioneer was DarkNet, which implemented the YOLO model series and significantly outperformed others [2]. Same trick basically like llama.cpp
[1] https://github.com/tensorflow/serving

llm

41 5,931 9.4 Rust

An ecosystem of Rust libraries for working with large language models

Project mention: Open-sourcing a simple automation/agent workflow builder | /r/ChatGPTPro | 2023-10-07

We're open-sourcing a project that lets you build simple automations/agent workflows that use LLMs for different tasks. Kinda like Zapier or IFTTT but focused on using natural language to accomplish your tasks. It's super early but we'd love to start getting feedback to steer it in the right direction. It currently supports OpenAI and local models through llm.

oneflow

32 5,731 8.4 C++

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

ML related posts

Show HN: LLM-powered NPCs running on your hardware

4 projects | news.ycombinator.com | 30 Apr 2024
Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations

1 project | dev.to | 26 Apr 2024
Machine Learning with PHP

3 projects | dev.to | 22 Apr 2024
Show HN: Open-source Google Docs for audio transcriptions (Whisper)

2 projects | news.ycombinator.com | 17 Apr 2024
What’s the Difference Between Fine-tuning, Retraining, and RAG?

1 project | dev.to | 8 Apr 2024
W3C discussions of impact of ML models on the web

2 projects | news.ycombinator.com | 4 Apr 2024
Why do tree-based models still outperform deep learning on tabular data? (2022)

3 projects | news.ycombinator.com | 5 Mar 2024

Index

What are some of the best open-source ML projects? This list will help you:

	Project	Stars
1	tensorflow	182,693
2	ML-For-Beginners	67,111
3	yolov5	47,202
4	netron	26,174
5	handson-ml	25,094
6	MindsDB	21,354
7	MLflow	17,335
8	StableLM	15,851
9	best-of-ml-python	15,633
10	kubeflow	13,700
11	awesome-mlops	11,769
12	ludwig	10,845
13	dopamine	10,378
14	ML.NET	8,855
15	pycaret	8,450
16	MNN	8,325
17	deeplake	7,729
18	metaflow	7,630
19	unstructured	6,515
20	CoreML-Models	6,241
21	serving	6,085
22	llm	5,931
23	oneflow	5,731
