Top 23 Jupyter Notebook Data Science Projects

Project mention: [D] How do you keep up to date on Machine Learning? | /r/learnmachinelearning | 2023-08-13
Made With ML

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
Project mention: Probabilistic Programming and Bayesian Methods for Hackers (2013) | news.ycombinator.com | 2024-02-10
Get started with Data Science in the Data Science for Beginners curricula.

Project mention: The fastai book, published as Jupyter Notebooks | news.ycombinator.com | 2024-01-17

python-machine-learning-book
The "Python Machine Learning (1st edition)" book code repository and info resource

Project mention: Machine Learning for Trading: Notebooks, resources and references accompanying the book Machine Learning for Algorithmic Trading. Courses  star count:10678.0 | /r/algoprojects | 2023-11-20

numerical-linear-algebra
Free online textbook of Jupyter notebooks for fast.ai Computational Linear Algebra course
Project mention: I'm a 42-years-old librarian without any math background and I'm willing to learn | /r/learnmachinelearning | 2023-04-27
If you really like to dig into math, I liked the Udacity course Intro to Deep Learning with PyTorch. Also, the Stanford course CS231n: Convolutional Neural Networks for Visual Recognition is a good place to understand some basics. Two other courses to get you jump-started are Practical Deep Learning for Coders and the Linear Algebra course by fast.ai.

amazon-sagemaker-examples
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
I need to use AWS Sagemaker (required, can't use easier services) and my adviser gave me this document to start with: https://github.com/aws/amazonsagemakerexamples/blob/main/introduction_to_amazon_algorithms/jumpstartfoundationmodels/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.ipynb

Project mention: [D] Where can I find a list of the foundational academic papers in RL/ML/DL and what are your go-to places to find new academic papers in RL/ML/DL? | /r/MachineLearning | 2023-07-07
Labml.ai stopped working in May. I like https://github.com/dair-ai/ML-Papers-of-the-Week


Project mention: For deep learning practitioners in industry, is the workflow always this annoying? [D] | /r/MachineLearning | 2023-07-10
This is definitely a good thing to try for time series; you can automate your feature extraction too (e.g. using https://github.com/blue-yonder/tsfresh ).
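To illustrate what "automating feature extraction" for time series means, here is a minimal sketch in plain Python. It is not tsfresh's API: the function name `summarize_series` and the long-format input `(series_id, timestamp, value)` are assumptions made for this example only. It turns each raw series into one row of summary features that a standard model could consume; tsfresh applies the same idea with a far larger feature catalogue.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_series(rows):
    """Hypothetical helper: build one feature dict per series.

    rows: iterable of (series_id, timestamp, value) tuples in any order.
    """
    # Group observations by series id.
    grouped = defaultdict(list)
    for series_id, timestamp, value in rows:
        grouped[series_id].append((timestamp, value))

    features = {}
    for series_id, points in grouped.items():
        # Order each series by timestamp before computing features.
        values = [value for _, value in sorted(points)]
        features[series_id] = {
            "mean": mean(values),
            "std": pstdev(values),  # population standard deviation
            "min": min(values),
            "max": max(values),
            # Total absolute movement between consecutive observations.
            "abs_sum_of_changes": sum(
                abs(b - a) for a, b in zip(values, values[1:])
            ),
        }
    return features


if __name__ == "__main__":
    demo = [("a", 0, 1.0), ("a", 1, 3.0), ("a", 2, 2.0),
            ("b", 0, 5.0), ("b", 1, 5.0)]
    for series_id, feats in summarize_series(demo).items():
        print(series_id, feats)
```

Each returned dict is one row of a feature table keyed by series id, which is the shape a downstream classifier or regressor expects.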

H2O
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
I would use H2O if I were you. You can try out LLMs with a nice GUI. Unless you have some familiarity with the tools needed to run these projects, it can be frustrating. https://h2o.ai/

evidently
Evaluate and monitor ML models from validation to production. Join our Discord: https://discord.com/invite/xZjKRaNp8b
Project mention: [P] Free open-source ML observability course: starts October 16 🚀 | /r/MachineLearning | 2023-10-15
Hi everyone, I'm one of the creators of Evidently, an open-source (Apache 2.0) tool for production ML monitoring. We've just launched a free open course on ML observability that I wanted to share with the community.

machine_learning_complete
A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.


Project mention: How often do you see Bayesian Statistics or Stan in the DS world? Essential skill or a nice to have? | /r/datascience | 2023-06-17
TensorFlowProbability

Data-science
Collection of useful data science topics along with articles, videos, and code (by khuyentran1401)

MachineLearningNotebooks
Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
I found an example using the python SDK v2:

I really like the simplicity of this framework, and they hit on a lot of common problems found in other agent-based frameworks. Most intrigued by the RAG improvements.
Seems like Microsoft was frustrated with the pace of movement in this space and the shitty results of agents (which admittedly kept my interest turned away from agents for the last few months). I'm interested again because it makes practical sense and, from looking at the example notebooks, it seems fairly easy to integrate into existing applications.
Maybe this is the 'low code' approach that might actually work and bridge engineering and non-engineering resources.
This example was what caught my eye: https://github.com/microsoft/FLAML/blob/main/notebook/autoge...




cracking-the-data-science-interview
A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep
Project mention: Can someone recommend some website for data science interview preparation | /r/datascience | 2023-06-02
Index
What are some of the best open-source Data Science projects in Jupyter Notebook? This list will help you:
#  Project  Stars
1  Made-With-ML  35,567
2  Probabilistic-Programming-and-Bayesian-Methods-for-Hackers  26,321
3  Data-Science-For-Beginners  26,230
4  fastbook  20,607
5  python-machine-learning-book  12,076
6  machine-learning-for-trading  11,714
7  numerical-linear-algebra  9,988
8  amazon-sagemaker-examples  9,477
9  ML-Papers-of-the-Week  8,609
10  pycaret  8,364
11  tsfresh  8,064
12  H2O  6,705
13  evidently  4,591
14  machine_learning_complete  4,476
15  nlpaug  4,252
16  probability  4,126
17  Data-science  3,946
18  MachineLearningNotebooks  3,939
19  FLAML  3,663
20  python-training  3,418
21  course-nlp  3,390
22  MLWorkspace  3,315
23  cracking-the-data-science-interview  3,158