## Jupyter Notebook Data Analysis

Open-source Jupyter Notebook projects categorized as Data Analysis

# Top 23 Jupyter Notebook Data Analysis Projects

• ### Data-Science-For-Beginners

10 Weeks, 20 Lessons, Data Science for All!

Project mention: Data Science for Beginners - A Curriculum | /r/programming | 2023-09-08

• ### machine_learning_complete

A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

• ### Data-science

Collection of useful data science topics along with articles, videos, and code (by khuyentran1401)

• ### ML-Workspace

🛠 All-in-one web-based IDE specialized for machine learning and data science.

• ### Linear-Algebra-With-Python

Lecture Notes for Linear Algebra Featuring Python. This series of lecture notes walks you through the must-know concepts that form the foundation of data science and advanced quantitative skill sets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, etc. who want to quickly refresh their linear algebra with the assistance of Python computation and visualization.

Project mention: Python for Econometrics for Practitioners [Free Online Courses] | /r/CompSocial | 2023-08-24

Linear Algebra with Python: This training will walk you through all the must-know concepts that set the foundation of data science or advanced quantitative skill sets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, etc. to quickly refresh linear algebra with the assistance of Python computation and visualization. Core concepts covered are: linear combination, vector space, linear transformation, eigenvalues and eigenvectors, diagonalization, singular value decomposition, etc.
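Two of the concepts named above, diagonalization and singular value decomposition, can be sketched with NumPy. This is only an illustration and is not taken from the course notebooks; the matrix `A` is an arbitrary example chosen here.

```python
import numpy as np

# An arbitrary small symmetric matrix, used only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Diagonalization of a symmetric matrix: A = V diag(w) V^T.
# eigh returns the eigenvalues in ascending order.
w, V = np.linalg.eigh(A)
A_from_eig = V @ np.diag(w) @ V.T

# Singular value decomposition: A = U diag(s) Vt.
# svd returns the singular values in descending order.
U, s, Vt = np.linalg.svd(A)
A_from_svd = U @ np.diag(s) @ Vt

# Both factorizations should reconstruct A up to floating-point error.
print(np.allclose(A, A_from_eig))
print(np.allclose(A, A_from_svd))
```

For a symmetric positive definite matrix such as this one, the singular values coincide with the eigenvalues, which is why the two decompositions carry the same information here.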

• ### 100-pandas-puzzles

100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)

• ### pymc-resources

PyMC educational resources

Project mention: Bayesian Analysis with Python | news.ycombinator.com | 2024-02-10

As it happens, there's a PyMC implementation of the 1st and 2nd editions of Statistical Rethinking here:

https://github.com/pymc-devs/pymc-resources

(I think the author of the book discussed above, Osvaldo Martin, is the primary or sole contributor for the Rethinking implementations, in fact -- he had a full implementation in his own repo [here](https://github.com/aloctavodia/Statistical-Rethinking-with-P...) before deprecating it in favor of the above-linked one.)

• ### hyperlearn

2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.

Project mention: 80% faster, 50% less memory, 0% loss of accuracy Llama finetuning | news.ycombinator.com | 2023-12-01

Good point - the main issue is we encountered this exact issue with our old package Hyperlearn (https://github.com/danielhanchen/hyperlearn).

I OSSed all the code to the community - I'm actually an extremely open person and I love contributing to the OSS community.

The issue was the package got gobbled up by other startups and big tech companies with no credit - I didn't want any cash from it, but it stung and hurt really bad hearing other startups and companies claim it was them who made it faster, whilst it was actually my work. It hurt really bad - as an OSS person, I don't want money, but just some recognition for the work.

I also used to accept and help everyone with writing their startup's software, but I never got paid or even any thanks - sadly I didn't expect the world to be such a hostile place.

So after a sad awakening, I decided with my brother instead of OSSing everything, we would first OSS something which is still very good - 5X faster training is already very reasonable.

I'm all open to other suggestions on how we should approach this though! There are no evil intentions - in fact I insisted we OSS EVERYTHING even the 30x faster algos, but after a level headed discussion with my brother - we still have to pay life expenses no?

If you have other ways we can go about this - I'm all ears!! We're literally making stuff up as we go along!

• ### hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.

Project mention: Show HN: On Garbage Collection and Memory Optimization in Hamilton | news.ycombinator.com | 2023-10-24

• ### kangas

🦘 Explore multimedia datasets at scale

Project mention: Kangas: Pandas for Multimedia Datasets | news.ycombinator.com | 2023-05-03

• ### qs_ledger

Quantified Self Personal Data Aggregator and Data Analysis

• ### machine-learning

Practical Full-Stack Machine Learning

• ### datacamp

🍧 DataCamp data-science and machine learning courses

• ### tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation (by databrickslabs)
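Three of the operations listed above (AS OF joins, lagged values, and rolling statistics) can be illustrated on a single machine with pandas. This sketch does not use tempo's API or Spark; the `trades` and `quotes` frames and their column names are made up for the example.

```python
import pandas as pd

# Hypothetical trade and quote data, invented for this example.
trades = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 09:00:01",
        "2024-01-01 09:00:05",
        "2024-01-01 09:00:09",
    ]),
    "price": [100.0, 101.0, 102.5],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 09:00:00",
        "2024-01-01 09:00:04",
    ]),
    "bid": [99.5, 100.5],
})

# AS OF join: attach to each trade the latest quote at or before its time.
# Both frames must already be sorted by the join key.
joined = pd.merge_asof(trades, quotes, on="time")

# Lagged value: the previous row's price.
joined["prev_price"] = joined["price"].shift(1)

# Rolling statistic: mean price over the current and previous row.
joined["rolling_mean"] = joined["price"].rolling(window=2).mean()

print(joined)
```

The first row has no predecessor, so its lagged price and rolling mean are missing values.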

• ### RasgoQL

Write python locally, execute SQL in your data warehouse

• ### rust-data-analysis

Rust for data analysis encyclopedia (WIP).

Project mention: Ask HN: Rust Viable for Data Analytics? | news.ycombinator.com | 2024-02-01

Rust still has some key pieces missing, but looks promising, see: https://github.com/wiseaidev/rust-data-analysis

F# has a very decent data community: https://datascienceinfsharp.com

And obviously Julia is also something to consider.

• ### covid19-severity-prediction

Extensive and accessible COVID-19 data + forecasting for counties and hospitals. 📈

• ### Econometrics-With-Python

Econometrics tutorials featuring Python programming. This is a crash course for reviewing the most important concepts and techniques of basic econometrics. The theory is presented lightly, without the hassle of derivations, and the Python code is straightforward.

Project mention: Python for Econometrics for Practitioners [Free Online Courses] | /r/CompSocial | 2023-08-24

Econometrics with Python: This is a crash course for reviewing the most important concepts and techniques of econometrics. The theory is presented lightly, without the hassle of mathematical derivation, and the Python code is mostly procedural and straightforward. Core concepts covered: multiple linear regression, logistic model, dummy variable, simultaneous equations model, panel data model and time series.

• ### DataScienceWithPython

Learn Data Science with focus on adding value with the most efficient tech stack.

• ### Data-Visualization

Collection of interactive Jupyter Notebook widgets and graphs. (by pierpaolo28)

Project mention: Plotly Dash for Financial Data Analysis | dev.to | 2024-01-23

All the code used as part of this article (and more!) is available on my Github profile.

• ### PANDAS-TUTORIAL

Jupyter Notebooks and Data Sets for Pandas Library (by TirendazAcademy)

• ### daru-view

daru-view is for easy and interactive plotting in web applications and IRuby notebooks. daru-view is a plugin gem for the existing daru gem.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-02-10.

## Jupyter Notebook Data Analysis related posts

### Index

What are some of the best open-source Data Analysis projects in Jupyter Notebook? This list will help you:

| # | Project | Stars |
|---|---|---|
| 1 | Data-Science-For-Beginners | 25,683 |
| 2 | pandas_exercises | 9,811 |
| 3 | machine_learning_complete | 4,437 |
| 4 | Data-science | 3,912 |
| 5 | ML-Workspace | 3,288 |
| 6 | Linear-Algebra-With-Python | 2,098 |
| 7 | 100-pandas-puzzles | 2,076 |
| 8 | pymc-resources | 1,853 |
| 9 | hyperlearn | 1,510 |
| 10 | hamilton | 1,185 |
| 11 | kangas | 1,020 |
| 12 | qs_ledger | 948 |
| 13 | machine-learning | 651 |
| 14 | datacamp | 295 |
| 15 | tempo | 294 |
| 16 | RasgoQL | 266 |
| 17 | rust-data-analysis | 232 |
| 18 | covid19-severity-prediction | 226 |
| 19 | Econometrics-With-Python | 213 |
| 20 | DataScienceWithPython | 170 |
| 21 | Data-Visualization | 150 |
| 22 | PANDAS-TUTORIAL | 148 |
| 23 | daru-view | 89 |