Jupyter Notebook Machine Learning

Open-source Jupyter Notebook projects categorized as Machine Learning

Top 23 Jupyter Notebook Machine Learning Projects

  • nn

    ๐Ÿง‘โ€๐Ÿซ 60 Implementations/tutorials of deep learning papers with side-by-side notes ๐Ÿ“; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), ๐ŸŽฎ reinforcement learning (ppo, dqn), capsnet, distillation, ... ๐Ÿง 

  • TensorFlow-Examples

    TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)

  • Made-With-ML

    Learn how to design, develop, deploy and iterate on production-grade ML applications.

  • Project mention: [D] How do you keep up to date on Machine Learning? | /r/learnmachinelearning | 2023-08-13

    Made With ML

  • google-research

    Google Research

  • Project mention: Show HN: Next-token prediction in JavaScript – build fast LLMs from scratch | news.ycombinator.com | 2024-04-10

    People on here will be happy to say that I do a similar thing, however my sequence length is dynamic because I also use a 2nd data structure - I'll use pretentious academic speak: I use a simple bigram LM (2-gram) for single next-word likeliness and separately a trie that models all words and phrases (so, n-gram). Not sure how many total nodes because sentence lengths vary in training data, but there are about 200,000 entry points (keys) so probably about 2-10 million total nodes in the default setup.

    "Constructing 7-gram LM": They likely started with bigrams (what I use) which only tells you the next word based on 1 word given, and thought to increase accuracy by modeling out more words in a sequence, and eventually let the user (developer) pass in any amount they want to model (https://github.com/google-research/google-research/blob/5c87...). I thought of this too at first, but I actually got more accuracy (and speed) out of just keeping them as bigrams and making a totally separate structure that models out an n-gram of all phrases (e.g. could be a 24-token long sequence or 100+ tokens etc. I model it all) and if that phrase is found, then I just get the bigram assumption of the last token of the phrase. This works better when the training data is more diverse (for a very generic model), but theirs would probably outperform mine on accuracy when the training data has a lot of nearly identical sentences that only change wildly toward the end - I don't find this pattern in typical data though, maybe for certain coding and other tasks there are those patterns though. But because it's not dynamic and they make you provide that number, even a low number (any phrase longer than 2 words) - theirs will always have to do more lookup work than with simple bigrams and they're also limited by that fixed number as far as accuracy. I wonder how scalable that is - if I need to train on occasional ~100-word long sentences but also (and mostly) just ~3-word long sentences, I guess I set this to 100 and have a mostly "undefined" trie.

    I also thought of the name "LMJS", theirs is "jslm" :) but I went with simply "next-token-prediction" because that's what it ultimately does as a library. I don't know what theirs is really designed for other than proving a concept. Most of their code files are actually comments and hypothetical scenarios.

    I recently added a browser example showing simple autocomplete using my library: https://github.com/bennyschmidt/next-token-prediction/tree/m... (video)

    And next I'm implementing 8-dimensional embeddings that are converted to normalized vectors between 0-1 to see if doing math on them does anything useful beyond similarity, right now they look like this:

      [nextFrequency, prevalence, specificity, length, firstLetter, lastLetter, firstVowel, lastVowel]
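
The two-structure idea described in the comment above (a bigram table for next-word likelihood plus a separate trie that records whole phrases) can be sketched roughly as follows. This is a minimal Python illustration, not the commenter's JavaScript library: the class name `BigramTrieModel`, its methods, and the training sentences are all hypothetical, and the behavior for phrases missing from the trie (still returning the bigram guess, flagged as unknown) is an assumption.

```python
from collections import Counter, defaultdict


class BigramTrieModel:
    """Toy next-word predictor: bigram counts plus a phrase trie (hypothetical)."""

    def __init__(self):
        self.bigrams = defaultdict(Counter)  # word -> Counter of following words
        self.trie = {}                       # nested dicts, one level per token

    def train(self, sentences):
        for sentence in sentences:
            tokens = sentence.lower().split()
            # Bigram table: count every (word, next word) pair.
            for current, following in zip(tokens, tokens[1:]):
                self.bigrams[current][following] += 1
            # Trie: insert every suffix so any contiguous phrase can be looked up.
            # Quadratic in sentence length, which is fine for a toy example.
            for start in range(len(tokens)):
                node = self.trie
                for token in tokens[start:]:
                    node = node.setdefault(token, {})

    def has_phrase(self, tokens):
        """Return True if the token sequence appeared contiguously in training."""
        node = self.trie
        for token in tokens:
            if token not in node:
                return False
            node = node[token]
        return True

    def predict(self, text):
        """Return (next_word, phrase_known); next_word is None if nothing is known."""
        tokens = text.lower().split()
        if not tokens:
            return None, False
        phrase_known = self.has_phrase(tokens)
        # As in the comment: the prediction is the bigram guess for the last token.
        candidates = self.bigrams.get(tokens[-1])
        if not candidates:
            return None, phrase_known
        return candidates.most_common(1)[0][0], phrase_known


model = BigramTrieModel()
model.train([
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
])
print(model.predict("the cat sat"))  # phrase seen in training
print(model.predict("a cat sat"))    # phrase not seen; bigram guess only
```

Because the trie depth follows the training data, no fixed n has to be chosen up front, which is the contrast the comment draws with a fixed 7-gram setup.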

  • AI-For-Beginners

    12 Weeks, 24 Lessons, AI for All!

  • Project mention: FREE AI Course By Microsoft: ZERO to HERO! 🔥 | dev.to | 2024-03-18

    🔗 https://github.com/microsoft/AI-For-Beginners 🔗 https://microsoft.github.io/AI-For-Beginners/

  • llm-course

    Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

  • Project mention: Ask HN: People who switched from GPT to their own models. How was it? | news.ycombinator.com | 2024-02-26

    This is a very nice resource: https://github.com/mlabonne/llm-course

  • fastai

    The fastai deep learning library

  • handson-ml

    โ›”๏ธ DEPRECATED โ€“ See https://github.com/ageron/handson-ml3 instead.

  • homemade-machine-learning

    🤖 Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained

  • CLIP

    CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

  • Project mention: How to Cluster Images | dev.to | 2024-04-09

    We will also need two more libraries: OpenAI’s CLIP GitHub repo, enabling us to generate image features with the CLIP model, and the umap-learn library, which will let us apply a dimensionality reduction technique called Uniform Manifold Approximation and Projection (UMAP) to those features to visualize them in 2D:

  • shap

    A game theoretic approach to explain the output of any machine learning model.

  • Project mention: Shap v0.45.0 | news.ycombinator.com | 2024-03-08
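
The "game theoretic approach" in the shap description refers to Shapley values, which credit each feature with its average marginal contribution across all orderings of the features. The sketch below computes them exactly by brute force for a tiny made-up model. It does not use the shap library's API: the function `shapley_values`, the `toy_model` scoring rule, and the feature names are all hypothetical, and enumerating every ordering is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(value_fn, players):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            # Marginal contribution of this player when joining the coalition.
            totals[player] += value_fn(with_player) - value_fn(coalition)
            coalition = with_player
    return {player: total / len(orderings) for player, total in totals.items()}


def toy_model(features):
    """Hypothetical model output given the set of features that are 'present'."""
    score = 0.0
    if "age" in features:
        score += 10
    if "income" in features:
        score += 20
    if "age" in features and "income" in features:
        score += 6  # interaction term, split evenly between the two features
    return score


values = shapley_values(toy_model, ["age", "income", "city"])
print(values)
```

The attributions sum to the difference between the model output with all features and with none, and a feature the model ignores ("city" here) receives zero.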
  • fastbook

    The fastai book, published as Jupyter Notebooks

  • Project mention: The fastai book, published as Jupyter Notebooks | news.ycombinator.com | 2024-01-17
  • learnopencv

    Learn OpenCV : C++ and Python Examples

  • Project mention: YOLO-NAS Pose | /r/pytorch | 2023-11-16

    Deci's YOLO-NAS Pose: Redefining Pose Estimation! Elevating healthcare, sports, tech, and robotics with precision and speed. Github link and blog link down below! Repo: https://github.com/spmallick/learnopencv/tree/master/YOLO-NAS-Pose

  • python-machine-learning-book

    The "Python Machine Learning (1st edition)" book code repository and info resource

  • machine-learning-for-trading

    Code for Machine Learning for Algorithmic Trading, 2nd edition.

  • Project mention: Machine Learning for Trading: Notebooks, resources and references accompanying the book Machine Learning for Algorithmic Trading. Courses - star count:10678.0 | /r/algoprojects | 2023-11-20
  • FinGPT

    FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

  • Project mention: GPT-4, without specialized training, beat a GPT-3.5 class model that cost $10B | news.ycombinator.com | 2024-03-24

    There is also the open source FinGPT, that is claimed to beat GPT4 in some benchmarks at a fine tuning cost of $17.25.

    https://github.com/AI4Finance-Foundation/FinGPT

  • numerical-linear-algebra

    Free online textbook of Jupyter notebooks for fast.ai Computational Linear Algebra course

  • Project mention: I'm a 42-years-old librarian whithout any math background and I'm willing to learn | /r/learnmachinelearning | 2023-04-27

    If you really like to dig into math, I liked the Udacity course on Intro to Deeplearning with Pytorch. Also, the Stanford course CS231n Convolutional Neural Networks for Visual Recognition is a good place to understand some basics. Other two courses to get you jumpstarted are Practical Deep Learning for Coders and Linear Algebra Course by FastAI

  • amazon-sagemaker-examples

    Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

  • Project mention: Thesis Project Help Using SageMaker Free Tier | /r/aws | 2023-09-23

    I need to use AWS Sagemaker (required, can't use easier services) and my adviser gave me this document to start with: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.ipynb

  • TensorFlow-Tutorials

    TensorFlow Tutorials with YouTube Videos

  • mlops-zoomcamp

    Free MLOps course from DataTalks.Club

  • Project mention: Where do I start to learn MLOPS? | /r/mlops | 2023-07-01

    There is MLOps Zoomcamp course (which shows end-to-end MLOps process with open-source MLOps tools) https://github.com/DataTalksClub/mlops-zoomcamp.

  • ML-Papers-of-the-Week

    🔥Highlighting the top ML papers every week.

  • Project mention: [D] Where can I find a list of the foundational academic papers in RL/ML/DL and what are your go-to places to find new academic papers in RL/ML/DL? | /r/MachineLearning | 2023-07-07

    Labml.ai stopped working in May. I like https://github.com/dair-ai/ML-Papers-of-the-Week

  • pycaret

    An open-source, low-code machine learning library in Python

  • H2O

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

  • Project mention: Really struggling with open source models | /r/LocalLLaMA | 2023-07-12

    I would use H2O if I were you. You can try out LLMs with a nice GUI. Unless you have some familiarity with the tools needed to run these projects, it can be frustrating. https://h2o.ai/

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Machine Learning projects in Jupyter Notebook? This list will help you:

Rank  Project                         Stars
1     nn                              48,004
2     TensorFlow-Examples             43,200
3     Made-With-ML                    35,656
4     google-research                 32,804
5     AI-For-Beginners                31,046
6     llm-course                      28,809
7     fastai                          25,610
8     handson-ml                      25,097
9     homemade-machine-learning       22,531
10    CLIP                            22,051
11    shap                            21,632
12    fastbook                        20,711
13    learnopencv                     20,363
14    python-machine-learning-book    12,076
15    machine-learning-for-trading    11,797
16    FinGPT                          11,419
17    numerical-linear-algebra        10,003
18    amazon-sagemaker-examples       9,504
19    TensorFlow-Tutorials            9,250
20    mlops-zoomcamp                  8,778
21    ML-Papers-of-the-Week           8,692
22    pycaret                         8,406
23    H2O                             6,730
