Top 23 Jupyter Notebook Machine Learning Projects

nn

26 48,004 7.7 Jupyter Notebook

🧑‍🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
TensorFlow-Examples

2 43,200 0.0 Jupyter Notebook

TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
Made-With-ML

51 35,656 6.8 Jupyter Notebook

Learn how to design, develop, deploy and iterate on production-grade ML applications.

Project mention: [D] How do you keep up to date on Machine Learning? | /r/learnmachinelearning | 2023-08-13

Made With ML

google-research

98 32,804 9.6 Jupyter Notebook

Google Research

Project mention: Show HN: Next-token prediction in JavaScript – build fast LLMs from scratch | news.ycombinator.com | 2024-04-10

People on here will be happy to say that I do a similar thing, however my sequence length is dynamic because I also use a 2nd data structure - I'll use pretentious academic speak: I use a simple bigram LM (2-gram) for single next-word likeliness and separately a trie that models all words and phrases (so, n-gram). Not sure how many total nodes because sentence lengths vary in training data, but there are about 200,000 entry points (keys) so probably about 2-10 million total nodes in the default setup.
"Constructing 7-gram LM": They likely started with bigrams (what I use) which only tells you the next word based on 1 word given, and thought to increase accuracy by modeling out more words in a sequence, and eventually let the user (developer) pass in any amount they want to model (https://github.com/google-research/google-research/blob/5c87...). I thought of this too at first, but I actually got more accuracy (and speed) out of just keeping them as bigrams and making a totally separate structure that models out an n-gram of all phrases (e.g. could be a 24-token long sequence or 100+ tokens etc. I model it all) and if that phrase is found, then I just get the bigram assumption of the last token of the phrase. This works better when the training data is more diverse (for a very generic model), but theirs would probably outperform mine on accuracy when the training data has a lot of nearly identical sentences that only change wildly toward the end - I don't find this pattern in typical data though, maybe for certain coding and other tasks there are those patterns though. But because it's not dynamic and they make you provide that number, even a low number (any phrase longer than 2 words) - theirs will always have to do more lookup work than with simple bigrams and they're also limited by that fixed number as far as accuracy. I wonder how scalable that is - if I need to train on occasional ~100-word long sentences but also (and mostly) just ~3-word long sentences, I guess I set this to 100 and have a mostly "undefined" trie.
I also thought of the name "LMJS", theirs is "jslm" :) but I went with simply "next-token-prediction" because that's what it ultimately does as a library. I don't know what theirs is really designed for other than proving a concept. Most of their code files are actually comments and hypothetical scenarios.
I recently added a browser example showing simple autocomplete using my library: https://github.com/bennyschmidt/next-token-prediction/tree/m... (video)
Next I'm implementing 8-dimensional embeddings that are converted to normalized vectors between 0 and 1, to see if doing math on them does anything useful beyond similarity. Right now they look like this:
  [nextFrequency, prevalence, specificity, length, firstLetter, lastLetter, firstVowel, lastVowel]
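
The bigram-plus-trie idea described in the comment above can be sketched roughly as follows. This is a minimal illustration in Python, not the commenter's JavaScript library: the function names (`train_bigrams`, `build_trie`, `predict_next`) are hypothetical, whitespace tokenization is an assumption, and returning `None` for an unknown phrase is a choice made here because the comment does not say what happens in that case.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """For every token, count how often each other token directly follows it."""
    following = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: whitespace tokenization
        for current, nxt in zip(tokens, tokens[1:]):
            following[current][nxt] += 1
    return following


def build_trie(sentences):
    """Nested-dict trie of every sentence suffix, so any seen phrase is a path from the root."""
    root = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for start in range(len(tokens)):
            node = root
            for token in tokens[start:]:
                node = node.setdefault(token, {})
    return root


def phrase_in_trie(trie, tokens):
    """Return True if the token sequence was seen contiguously in the training data."""
    node = trie
    for token in tokens:
        if token not in node:
            return False
        node = node[token]
    return True


def predict_next(following, trie, prompt):
    """If the prompt is a known phrase, return the most frequent follower of its last token."""
    tokens = prompt.lower().split()
    if not tokens or not phrase_in_trie(trie, tokens):
        return None  # assumption: no fallback for unseen phrases
    candidates = following.get(tokens[-1])
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat ate",
]
bigrams = train_bigrams(corpus)
trie = build_trie(corpus)
print(predict_next(bigrams, trie, "the cat"))
```

Because the trie stores whole phrases of any length, there is no fixed n to choose up front; the prediction itself still costs only a single bigram lookup on the last token.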

AI-For-Beginners

8 31,046 6.7 Jupyter Notebook

12 Weeks, 24 Lessons, AI for All!

Project mention: FREE AI Course By Microsoft: ZERO to HERO! 🔥 | dev.to | 2024-03-18

🔗 https://github.com/microsoft/AI-For-Beginners 🔗 https://microsoft.github.io/AI-For-Beginners/

llm-course

6 28,809 8.1 Jupyter Notebook

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Project mention: Ask HN: People who switched from GPT to their own models. How was it? | news.ycombinator.com | 2024-02-26

This is a very nice resource: https://github.com/mlabonne/llm-course

fastai

9 25,610 8.0 Jupyter Notebook

The fastai deep learning library
handson-ml

1 25,097 0.0 Jupyter Notebook

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
homemade-machine-learning

7 22,531 3.8 Jupyter Notebook

🤖 Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained
CLIP

103 22,051 1.2 Jupyter Notebook

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Project mention: How to Cluster Images | dev.to | 2024-04-09

We will also need two more libraries: OpenAI’s CLIP GitHub repo, enabling us to generate image features with the CLIP model, and the umap-learn library, which will let us apply a dimensionality reduction technique called Uniform Manifold Approximation and Projection (UMAP) to those features to visualize them in 2D:

shap

38 21,632 9.3 Jupyter Notebook

A game theoretic approach to explain the output of any machine learning model.

Project mention: Shap v0.45.0 | news.ycombinator.com | 2024-03-08

fastbook

23 20,711 2.6 Jupyter Notebook

The fastai book, published as Jupyter Notebooks

Project mention: The fastai book, published as Jupyter Notebooks | news.ycombinator.com | 2024-01-17

learnopencv

6 20,363 8.6 Jupyter Notebook

Learn OpenCV : C++ and Python Examples

Project mention: YOLO-NAS Pose | /r/pytorch | 2023-11-16

Deci's YOLO-NAS Pose: Redefining Pose Estimation! Elevating healthcare, sports, tech, and robotics with precision and speed. Github link and blog link down below! Repo: https://github.com/spmallick/learnopencv/tree/master/YOLO-NAS-Pose

python-machine-learning-book

2 12,076 0.0 Jupyter Notebook

The "Python Machine Learning (1st edition)" book code repository and info resource
machine-learning-for-trading

224 11,797 1.1 Jupyter Notebook

Code for Machine Learning for Algorithmic Trading, 2nd edition.

Project mention: Machine Learning for Trading: Notebooks, resources and references accompanying the book Machine Learning for Algorithmic Trading. Courses - star count:10678.0 | /r/algoprojects | 2023-11-20

FinGPT

11 11,419 9.6 Jupyter Notebook

FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

Project mention: GPT-4, without specialized training, beat a GPT-3.5 class model that cost $10B | news.ycombinator.com | 2024-03-24

There is also the open-source FinGPT, which is claimed to beat GPT-4 in some benchmarks at a fine-tuning cost of $17.25.
https://github.com/AI4Finance-Foundation/FinGPT

numerical-linear-algebra

6 10,003 0.0 Jupyter Notebook

Free online textbook of Jupyter notebooks for fast.ai Computational Linear Algebra course

Project mention: I'm a 42-years-old librarian whithout any math background and I'm willing to learn | /r/learnmachinelearning | 2023-04-27

If you really like to dig into math, I liked the Udacity course Intro to Deep Learning with PyTorch. Also, the Stanford course CS231n Convolutional Neural Networks for Visual Recognition is a good place to understand some basics. Two other courses to get you jump-started are Practical Deep Learning for Coders and the Linear Algebra course by fast.ai.

amazon-sagemaker-examples

17 9,504 9.1 Jupyter Notebook

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

Project mention: Thesis Project Help Using SageMaker Free Tier | /r/aws | 2023-09-23

I need to use AWS Sagemaker (required, can't use easier services) and my adviser gave me this document to start with: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.ipynb

TensorFlow-Tutorials

2 9,250 0.0 Jupyter Notebook

TensorFlow Tutorials with YouTube Videos
mlops-zoomcamp

23 8,778 7.5 Jupyter Notebook

Free MLOps course from DataTalks.Club

Project mention: Where do I start to learn MLOPS? | /r/mlops | 2023-07-01

There is the MLOps Zoomcamp course, which shows the end-to-end MLOps process with open-source MLOps tools: https://github.com/DataTalksClub/mlops-zoomcamp

ML-Papers-of-the-Week

2 8,692 8.5 Jupyter Notebook

🔥Highlighting the top ML papers every week.

Project mention: [D] Where can I find a list of the foundational academic papers in RL/ML/DL and what are your go-to places to find new academic papers in RL/ML/DL? | /r/MachineLearning | 2023-07-07

Labml.ai stopped working in May. I like https://github.com/dair-ai/ML-Papers-of-the-Week

pycaret

5 8,406 9.4 Jupyter Notebook

An open-source, low-code machine learning library in Python
H2O

10 6,730 9.7 Jupyter Notebook

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Project mention: Really struggling with open source models | /r/LocalLLaMA | 2023-07-12

I would use H2O if I were you. You can try out LLMs with a nice GUI. Unless you have some familiarity with the tools needed to run these projects, it can be frustrating. https://h2o.ai/


NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Jupyter Notebook Machine Learning related posts

Why Vector Compression Matters
3 projects | dev.to | 24 Apr 2024
The Illustrated Word2Vec
3 projects | news.ycombinator.com | 19 Apr 2024
Machine Learning and AI Beyond the Basics Book
1 project | news.ycombinator.com | 16 Apr 2024
Google Research website is down
1 project | news.ycombinator.com | 5 Apr 2024
When Will the GenAI Bubble Burst?
1 project | news.ycombinator.com | 4 Apr 2024
FREE AI Course By Microsoft: ZERO to HERO! 🔥
1 project | dev.to | 18 Mar 2024
Shap v0.45.0
1 project | news.ycombinator.com | 8 Mar 2024

Index

What are some of the best open-source Machine Learning projects in Jupyter Notebook? This list will help you:

#	Project	Stars
1	nn	48,004
2	TensorFlow-Examples	43,200
3	Made-With-ML	35,656
4	google-research	32,804
5	AI-For-Beginners	31,046
6	llm-course	28,809
7	fastai	25,610
8	handson-ml	25,097
9	homemade-machine-learning	22,531
10	CLIP	22,051
11	shap	21,632
12	fastbook	20,711
13	learnopencv	20,363
14	python-machine-learning-book	12,076
15	machine-learning-for-trading	11,797
16	FinGPT	11,419
17	numerical-linear-algebra	10,003
18	amazon-sagemaker-examples	9,504
19	TensorFlow-Tutorials	9,250
20	mlops-zoomcamp	8,778
21	ML-Papers-of-the-Week	8,692
22	pycaret	8,406
23	H2O	6,730