Jupyter Notebook Data Science

Open-source Jupyter Notebook projects categorized as Data Science

Top 23 Jupyter Notebook Data Science Projects

  • GitHub repo MadeWithML

    Learn how to responsibly deliver value with ML.

    Project mention: New to mlops, where do I need to start | reddit.com/r/mlops | 2021-11-01

    Standing recommendation for beginners (we should eventually make a wiki) is https://madewithml.com/

  • GitHub repo ML-For-Beginners

    12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

    Project mention: Top Github repo trends in 2021 | dev.to | 2022-01-12

    Three educational courses: Web Dev, ML, and IoT for beginners. Note re using educational resources as a strategy for marketing: at least the ML course links to various Azure services. Google does this a bunch as well, with Colab notebooks often being used to demo educational materials.

  • GitHub repo Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

    Project mention: How do I study "Bayesian Models" ? | reddit.com/r/learnmachinelearning | 2022-01-18

    I also really like Bayesian Methods for Hackers

  • GitHub repo fastbook

    The fastai book, published as Jupyter Notebooks

    Project mention: Starting a career as a Python developer | reddit.com/r/learnpython | 2021-12-20

    I’m a fan of fastbook by fastai.

  • GitHub repo python-machine-learning-book

    The "Python Machine Learning (1st edition)" book code repository and info resource

    Project mention: What is the purpose of meshgrid in Python / NumPy? | reddit.com/r/codehunter | 2022-01-06

    I am studying "Python Machine Learning" from Sebastian Raschka, and he is using it for plotting the decision borders. See input 11 here.

  • GitHub repo pandas-profiling

    Create HTML profiling reports from pandas DataFrame objects

    Project mention: Day 2: Fancy packages to work with Dataframe | dev.to | 2021-12-28

    Two packages I want to mention are pandas-profiling and Mito.

  • GitHub repo Data-Science-For-Beginners

    10 Weeks, 20 Lessons, Data Science for All!

    Project mention: Rank my fake work experience | reddit.com/r/brasil | 2021-12-27

  • GitHub repo amazon-sagemaker-examples

    Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

    Project mention: AWS - NLP newsletter November 2021 | dev.to | 2021-11-24

    Amazon SageMaker Asynchronous Inference with Hugging Face Model: Amazon SageMaker Asynchronous Inference is a new capability in SageMaker that queues incoming requests and processes them asynchronously. SageMaker currently offers two inference options for customers to deploy machine learning models: 1) a real-time option for low-latency workloads, and 2) batch transform, an offline option to process inference requests on batches of data available upfront. Real-time inference is suited for workloads with payload sizes of less than 6 MB that require inference requests to be processed within 60 seconds. Batch transform is suitable for offline inference on batches of data. This notebook provides an introduction to using the SageMaker Asynchronous Inference capability with Hugging Face models. It covers the steps required to create an asynchronous inference endpoint and test it with some sample requests.

  • GitHub repo tsfresh

    Automatic extraction of relevant features from time series.

    Project mention: Automatic time series feature extraction based on scalable hypothesis tests | news.ycombinator.com | 2021-02-01

  • GitHub repo H2O

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

    Project mention: [PAID] Looking for Phaser.js game developer | reddit.com/r/INAT | 2021-12-09

    Built and founded various web3 projects over the last 2 years, such as OpenArt and 8RealmDojo, as well as being a high-performing student at CTU in Prague and SeoulTech. Was offered internships at Amazon and H2O.ai. Created robot assistants using robots from SoftBank.

  • GitHub repo machine-learning-for-trading

    Code for Machine Learning for Algorithmic Trading, 2nd edition.

    Project mention: Machine Learning for Trading: Notebooks, resources and references accompanying the book Machine Learning for Algorithmic Trading. Courses - star count:5136.0 | reddit.com/r/algoprojects | 2022-01-21

  • GitHub repo pycaret

    An open-source, low-code machine learning library in Python

    Project mention: Pycaret | news.ycombinator.com | 2021-12-28

  • GitHub repo probability

    Probabilistic reasoning and statistical analysis in TensorFlow

    Project mention: What is Probabilistic Programming? | reddit.com/r/learnmachinelearning | 2021-09-06

    This tutorial explains what probabilistic programming is and reviews 5 frameworks (PPLs) using an example taken from Chapter 4 of Statistical Rethinking by Dr. Richard McElreath. The frameworks (PPLs) reviewed are: Stan (https://mc-stan.org/), PyMC3 (https://docs.pymc.io/), TensorFlow Probability (https://www.tensorflow.org/probability), Pyro/NumPyro (https://pyro.ai/), and Turing.jl (https://turing.ml/stable/). I also provide a basic review of a great library called arviz (https://arviz-devs.github.io/arviz/), which can be used with all the above-mentioned PPLs to do Exploratory Data Analysis of Bayesian models. Here is the link to the notebook in which I have implemented the example model using the above frameworks/PPLs: https://colab.research.google.com/drive/1zgR2b0j2waGi1ppnIe1rw7emkbBXtMqF?usp=sharing

  • GitHub repo course-nlp

    A Code-First Introduction to NLP course

    Project mention: A simple and effective way to go from beginner to intermediate level of ML knowledge | reddit.com/r/datascience | 2021-12-29

    fastai already has a course that covers traditional nlp as well

  • GitHub repo fastpages

    An easy to use blogging platform, with enhanced support for Jupyter Notebooks.

    Project mention: Utterances – a lightweight comments widget built on GitHub issues | news.ycombinator.com | 2021-11-11

  • GitHub repo nlpaug

    Data augmentation for NLP

    Project mention: Text Data Augmentation using GPT-2 Language Model | reddit.com/r/LanguageTechnology | 2021-12-15

    A cool library I recently came across for text augmentation is nlpaug, it does a different thing to your approach, but I think both are useful :)

  • GitHub repo MachineLearningNotebooks

    Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft

    Project mention: I Took The Azure DP-100 exam today and passed it | reddit.com/r/AzureCertification | 2021-04-05

  • GitHub repo eli5

    A library for debugging/inspecting machine learning classifiers and explaining their predictions

    Project mention: How to extract keywords important to a text classification problem? | reddit.com/r/LanguageTechnology | 2021-03-01

    https://github.com/TeamHG-Memex/eli5 can help you.

  • GitHub repo ML-Workspace

    🛠 All-in-one web-based IDE specialized for machine learning and data science.

    Project mention: All-in-One Docker Based IDE for Data Science and ML | news.ycombinator.com | 2021-09-24

  • GitHub repo CodeSearchNet

    Datasets, tools, and benchmarks for representation learning of code.

  • GitHub repo Knet.jl

    Koç University deep learning framework.

    Project mention: Should you learn Julia or Python for Machine Learning? | reddit.com/r/learnmachinelearning | 2021-08-15

    We used to use the popular Flux, Knet, MLBase, and Plots packages for Machine Learning in Julia.

  • GitHub repo resources

    PyMC3 educational resources

    Project mention: Statistical Rethinking (2022 Edition) | news.ycombinator.com | 2022-01-16

    Prof. McElreath has been adding two new videos every week.

    Also, for anyone who prefers to use the pythons for the coding, I recommend the PyMC3 notebooks https://github.com/pymc-devs/resources/tree/master/Rethinkin... There is also a discussion forum related to this repo here https://gitter.im/Statistical-Rethinking-with-Python-and-PyM...

  • GitHub repo python-training

    Python training for business analysts and traders

    Project mention: JP Morgan & Chase free python course (r/finance) | reddit.com/r/algoprojects | 2021-09-14

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-21.
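
The mention under python-machine-learning-book asks what NumPy's meshgrid is for and notes that the book uses it to plot decision boundaries. The following is a minimal sketch of that idea, not the book's code: the `predict` function is a hypothetical stand-in for a trained classifier, and the axis ranges are arbitrary.

```python
import numpy as np


def predict(points):
    """Hypothetical classifier: class 1 where x1 + x2 > 1, else class 0."""
    return (points[:, 0] + points[:, 1] > 1.0).astype(int)


# Coordinates along each feature axis (ranges chosen only for illustration).
x1 = np.linspace(0.0, 2.0, 5)
x2 = np.linspace(0.0, 1.0, 3)

# meshgrid expands the two 1-D axes into two 2-D arrays that together
# hold the coordinates of every point on the rectangular grid.
xx1, xx2 = np.meshgrid(x1, x2)

# Flatten the grid into an (n_points, 2) array, classify every point,
# then reshape the predictions back to the grid's shape.
grid_points = np.column_stack([xx1.ravel(), xx2.ravel()])
labels = predict(grid_points).reshape(xx1.shape)

print(xx1.shape)  # (3, 5): one row per x2 value, one column per x1 value
print(labels)
```

Arrays shaped like `xx1`, `xx2`, and `labels` are what a filled contour plot (for example matplotlib's `contourf`) expects, which is how the class regions end up shaded on either side of the boundary.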

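The amazon-sagemaker-examples mention describes when each SageMaker inference option applies. As a rough sketch, those rules can be written as a small decision helper. This is not an AWS API: the function and its parameters are hypothetical, the 6 MB and 60 second thresholds come from the quoted post, and routing everything else to asynchronous inference is an assumption.

```python
# Thresholds taken from the quoted post; they may not match current AWS limits.
REALTIME_MAX_PAYLOAD_MB = 6
REALTIME_MAX_SECONDS = 60


def choose_inference_option(payload_mb, processing_seconds, data_available_upfront):
    """Pick an inference option name using the rules of thumb in the post.

    Hypothetical helper for illustration only.
    """
    # Batch transform: offline inference on data that is available upfront.
    if data_available_upfront:
        return "batch-transform"
    # Real-time: payloads under 6 MB that must be processed within 60 seconds.
    if payload_mb < REALTIME_MAX_PAYLOAD_MB and processing_seconds <= REALTIME_MAX_SECONDS:
        return "real-time"
    # Assumption: requests that fit neither option are queued and
    # processed asynchronously.
    return "asynchronous"


print(choose_inference_option(1, 5, False))
print(choose_inference_option(50, 300, False))
print(choose_inference_option(50, 300, True))
```
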
What are some of the best open-source Data Science projects in Jupyter Notebook? This list will help you:

Rank  Project                                                      Stars
   1  MadeWithML                                                  29,409
   2  ML-For-Beginners                                            28,735
   3  Probabilistic-Programming-and-Bayesian-Methods-for-Hackers  24,031
   4  fastbook                                                    14,148
   5  python-machine-learning-book                                11,460
   6  pandas-profiling                                             8,415
   7  Data-Science-For-Beginners                                   8,027
   8  amazon-sagemaker-examples                                    6,433
   9  tsfresh                                                      6,129
  10  H2O                                                          5,691
  11  machine-learning-for-trading                                 5,147
  12  pycaret                                                      4,943
  13  probability                                                  3,571
  14  course-nlp                                                   3,042
  15  fastpages                                                    2,968
  16  nlpaug                                                       2,869
  17  MachineLearningNotebooks                                     2,819
  18  eli5                                                         2,491
  19  ML-Workspace                                                 2,379
  20  CodeSearchNet                                                1,536
  21  Knet.jl                                                      1,325
  22  resources                                                    1,309
  23  python-training                                              1,194