Top 23 Jupyter Notebook Data Analysis Projects
-
Project mention: RisingWave Turns Four: Our Journey Beyond Democratizing Stream Processing | dev.to | 2025-04-18
By making RisingWave compatible with PostgreSQL, we ensured that any developer familiar with SQL could immediately start writing streaming queries. This wasn't just about syntax; it meant RisingWave could plug seamlessly into existing data workflows and connect easily with a vast ecosystem of familiar tools like DBeaver, Grafana, Apache Superset, dbt, and countless others.
-
machine_learning_complete
A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.
-
Data-science
Collection of useful data science topics along with articles, videos, and code (by khuyentran1401)
-
100-pandas-puzzles
100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)
-
mito
Jupyter extensions that help you write code faster: Context aware AI Chat, Autocomplete, and Spreadsheet
3. Tables that translate as Pandas dataframes. We support at most one table per sheet, and the tables must be contiguous. If the formulas in a column are consistent, then we will try to translate them as a single pandas statement.
We do not support: pivot tables or complex formulas. When we fail to translate these, we generate TODO statements. We also don’t support graphs or macros - and you won’t see these reflected in the output at all currently.
*Why we built this:*
We did YC S20 and built an open source tool called [Mito](https://trymito.io). It’s been a good journey since then: we’ve scaled revenue and grown to over [2k GitHub stars](https://github.com/mito-ds/mito). But fundamentally, Mito is a tool that’s useful for Excel users who want to start writing Python code more effectively.
We wanted to take another stab at the Excel -> Python pain point, this time more developer focused: something that helps developers who have to translate Excel files into Python do so much more quickly. Hence, Pyoneer!
I’ll be in the comments today if you’ve got feedback, criticism, questions, or comments.
-
Linear-Algebra-With-Python
Lecture Notes for Linear Algebra Featuring Python. This series of lecture notes will walk you through all the must-know concepts that set the foundation of data science or advanced quantitative skillsets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, etc. who want to quickly refresh their linear algebra with the assistance of Python computation and visualization.
-
hamilton
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Project mention: Show HN: I built an open-source data pipeline tool in Go | news.ycombinator.com | 2024-12-17
I always thought Hamilton [1] does a good job of giving enough visual hooks that draw you in.
I also noticed this pattern where library authors sometimes do a bit extra in terms of discussing and even promoting their competitors, and it makes me trust them more. A “here’s why ours is better and everyone else sucks …” section always comes across as the infomercial character who is having quite a hard time peeling an apple, to the point you wonder if this is the first time they’ve used hands.
One thing I wish for is a tool that’s essentially just Celery that doesn’t require a message broker (and can just use a database), and which is supported on Windows. There’s always a handful of edge cases where we’re pulling data from an old 32-bit system on Windows. And basically every system has some not-quite-ergonomic workaround that’s as much work as if you’d just built it yourself.
It seems like it’s just sending a JSON message over a queue or HTTP API and the worker receives it and runs the task. Maybe it’s way harder than I’m envisioning (but I don’t think so because I’ve already written most of it).
I guess that’s one thing I’m not clear on with Bruin: can I run workers in different physical locations and have them carry out the tasks in the right order? Or is this more of a centralized thing (meaning even if it’s K8s or Dask or Ray, those are all run in a cluster which happens to be distributed, but they’re all machines sitting in the same subnet, which isn’t the definition of a “distributed task” I’m going for)?
[1] https://github.com/DAGWorks-Inc/hamilton
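The broker-free idea in the comment above (a JSON message stored in a database, picked up by a worker that runs the task) could be sketched roughly as follows. This is only an illustration using Python's standard `sqlite3` and `json` modules; the table layout and the names `create_queue`, `enqueue`, and `run_next` are hypothetical and do not come from Celery, Hamilton, or Bruin. Retries, error handling, and ordering between dependent tasks are left out.

```python
import json
import sqlite3


def create_queue(conn):
    """Create the (hypothetical) tasks table that stands in for a message broker."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending', "
        "result TEXT)"
    )
    conn.commit()


def enqueue(conn, name, **kwargs):
    """Store a task as a JSON message and return its row id."""
    payload = json.dumps({"task": name, "kwargs": kwargs})
    cur = conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def run_next(conn, handlers):
    """Claim the oldest pending task, run it, and store the result.

    Returns (task_id, result), or None if nothing could be claimed.
    """
    row = conn.execute(
        "SELECT id, payload FROM tasks "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    task_id, payload = row

    # Claim the row only if it is still pending, so two workers polling
    # the same table do not both run the same task.
    claimed = conn.execute(
        "UPDATE tasks SET status = 'running' "
        "WHERE id = ? AND status = 'pending'",
        (task_id,),
    ).rowcount
    conn.commit()
    if claimed == 0:
        return None

    # Decode the JSON message and dispatch to the matching handler.
    message = json.loads(payload)
    result = handlers[message["task"]](**message["kwargs"])

    conn.execute(
        "UPDATE tasks SET status = 'done', result = ? WHERE id = ?",
        (json.dumps(result), task_id),
    )
    conn.commit()
    return task_id, result
```

A worker would call `run_next(conn, handlers)` in a polling loop, where `handlers` maps task names to plain Python functions. Workers in different physical locations would need a database they can all reach, which SQLite on a local file does not provide; a networked database would be the likely substitute.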
-
https://github.com/pymc-devs/pymc-resources/tree/main/Rethin...
-
Econometrics-With-Python
Tutorials of econometrics featuring Python programming. This is a crash course for reviewing the most important concepts and techniques of basic econometrics. The theory is presented lightly, without the hassle of derivations, and the Python code is straightforward.
-
tempo
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation (by databrickslabs)
-
covid19-severity-prediction
Extensive and accessible COVID-19 data + forecasting for counties and hospitals. 📈
Jupyter Notebook Data Analysis discussion
Jupyter Notebook Data Analysis related posts
- How AI is Transforming Front-End Development in 2025!
- Ask HN: Why all these GitHub fake accounts starring my project
- Welcome to 14 days of Data Science!
- Data Science for Beginners - A Curriculum
- Assessing the Quality of Synthetic Data with Data-centric AI
- Is anyone willing to work with us on a Synthetic Data Project?
- Where can I find data science projects to gain more experience.
Index
What are some of the best open-source Data Analysis projects in Jupyter Notebook? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | superset | 66,182 |
| 2 | Data-Science-For-Beginners | 29,429 |
| 3 | pandas_exercises | 11,284 |
| 4 | machine_learning_complete | 4,789 |
| 5 | Data-science | 4,095 |
| 6 | ML-Workspace | 3,490 |
| 7 | 100-pandas-puzzles | 2,694 |
| 8 | mito | 2,457 |
| 9 | Linear-Algebra-With-Python | 2,415 |
| 10 | hyperlearn | 2,140 |
| 11 | hamilton | 2,128 |
| 12 | pymc-resources | 2,005 |
| 13 | kangas | 1,057 |
| 14 | qs_ledger | 1,010 |
| 15 | machine-learning | 694 |
| 16 | rust-data-analysis | 410 |
| 17 | Econometrics-With-Python | 406 |
| 18 | datacamp | 380 |
| 19 | tempo | 324 |
| 20 | RasgoQL | 270 |
| 21 | covid19-severity-prediction | 228 |
| 22 | PANDAS-TUTORIAL | 216 |
| 23 | Data-Visualization | 158 |