Jupyter Notebook Data Science

Open-source Jupyter Notebook projects categorized as Data Science

Top 23 Jupyter Notebook Data Science Projects

  • ML-For-Beginners

    12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

    Project mention: I want to learn more about AI and Machine Learning | reddit.com/r/ArtificialInteligence | 2023-01-12
  • Made-With-ML

    Learn how to responsibly develop, deploy and maintain production machine learning applications.

    Project mention: Made with ML: how much time would it take to follow the whole course? | reddit.com/r/learnmachinelearning | 2023-01-31

    Hello fellow learners; the title says it all. Do you have experience completing the course? How much time do you think I should reserve to finish it? Thanks to anyone that will help :)

  • Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

    Project mention: Bayes examples and study help | reddit.com/r/datascience | 2022-10-15

    +1 for Statistical Rethinking. I’m also partial to Bayesian Methods for Hackers.

  • Data-Science-For-Beginners

    10 Weeks, 20 Lessons, Data Science for All!

    Project mention: How do I reset my career after already getting my masters? | reddit.com/r/AskUK | 2022-08-20
  • fastbook

    The fastai book, published as Jupyter Notebooks

    Project mention: Fastai Chapter 4 - The important parts, Part 2: Building a regression model | dev.to | 2023-01-25

    The book is available online here. The course is accessible here.

  • python-machine-learning-book

    The "Python Machine Learning (1st edition)" book code repository and info resource

    Project mention: Can you recommend a Python textbook to replace "An Introduction to Statistical Learning with Applications in R", Witten, J. et. al. [E] | reddit.com/r/statistics | 2022-12-12
  • computervision-recipes

    Best Practices, code samples, and documentation for Computer Vision.

  • amazon-sagemaker-examples

    Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

    Project mention: Study Plan to pass exam AWS Machine Learning Specialty exam with tips and advice | dev.to | 2022-11-03

    It's time to get your hands dirty by solving some ML Use Cases of your own from AWS SageMaker Use Cases repo.

  • tsfresh

    Automatic extraction of relevant features from time series.

    Project mention: [R] Approach to identify clusters on a time series | reddit.com/r/MachineLearning | 2022-11-26

    Rather than the exact clustering algorithm, I think the main issue here is the feature extraction for the clustering. https://github.com/blue-yonder/tsfresh might be useful for that.

  • pycaret

    An open-source, low-code machine learning library in Python

    Project mention: pycaret: An open-source, low-code machine learning library in Python | reddit.com/r/coolgithubprojects | 2022-09-13
  • machine-learning-for-trading

    Code for Machine Learning for Algorithmic Trading, 2nd edition.

    Project mention: How to become quant from cs | reddit.com/r/csMajors | 2022-10-29
  • H2O

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

    Project mention: Best machine learning framework(s) for production | reddit.com/r/learnmachinelearning | 2022-12-05

    Thanks for the input. To clarify, I am more focused on choosing the modeling framework(s) that makes the most sense to use for future production. For example, is h2o.ai a good framework for training models for later deployment (through something like elastic beanstalk, Flask API's etc.)? I came across a number of mentions of Tensorflow, however it is focused on neural nets while I also want to use classic models such as random forests, etc.

  • probability

    Probabilistic reasoning and statistical analysis in TensorFlow

    Project mention: [P] Any good resources which can help me with Multivariate Time Series Forecasting using Probabilistic Machine Learning? | reddit.com/r/MachineLearning | 2022-08-14
  • nlpaug

    Data augmentation for NLP

    Project mention: Use WordNet to collect homonyms | reddit.com/r/LanguageTechnology | 2022-09-23

    You'd want to use an NLP method for this as in order to determine optimal homonyms there would have to be some method of deriving context from the words ahead of and behind the substitution. Take a look at nlpaug.

  • MachineLearningNotebooks

    Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft

  • Data-science

    Collection of useful data science topics along with articles, videos, and code (by khuyentran1401)

    Project mention: Half of women will get a false positive 3D mammogram, study finds | news.ycombinator.com | 2022-03-25

    And if you can't to positive tests and were each time randomly selected for the test, the probability to be really positive is 99%


  • course-nlp

    A Code-First Introduction to NLP course

    Project mention: Need help finding good NLP course | reddit.com/r/learnmachinelearning | 2022-06-22

    fast.ai also has an NLP course.

  • ML-Workspace

    🛠 All-in-one web-based IDE specialized for machine learning and data science.

    Project mention: [D] I recently quit my job to start a ML company. Would really appreciate feedback on what we're working on. | reddit.com/r/MachineLearning | 2023-01-06

    Also check out https://github.com/ml-tooling/ml-workspace; it's a nice open-source project with lots of packages ready to use.

  • eli5

    A library for debugging/inspecting machine learning classifiers and explaining their predictions

  • dtreeviz

    A python library for decision tree visualization and model interpretation.


  • FLAML

    A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.

    Project mention: Show HN: AutoML Python Package for Tabular Data with Automatic Documentation | news.ycombinator.com | 2022-09-05
  • whylogs

    The open standard for data logging

    Project mention: The hand-picked selection of the best Python libraries and tools of 2022 | reddit.com/r/Python | 2022-12-26

    whylogs — model monitoring

  • CodeSearchNet

    Datasets, tools, and benchmarks for representation learning of code.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-01-31.

What are some of the best open-source Data Science projects in Jupyter Notebook? This list will help you:

# Project Stars
1 ML-For-Beginners 43,860
2 Made-With-ML 32,164
3 Probabilistic-Programming-and-Bayesian-Methods-for-Hackers 25,207
4 Data-Science-For-Beginners 17,169
5 fastbook 17,136
6 python-machine-learning-book 11,770
7 computervision-recipes 8,821
8 amazon-sagemaker-examples 7,806
9 tsfresh 7,033
10 pycaret 6,833
11 machine-learning-for-trading 6,650
12 H2O 6,119
13 probability 3,845
14 nlpaug 3,765
15 MachineLearningNotebooks 3,606
16 Data-science 3,420
17 course-nlp 3,241
18 ML-Workspace 2,908
19 eli5 2,634
20 dtreeviz 2,394
21 FLAML 2,229
22 whylogs 2,072
23 CodeSearchNet 1,789