Jupyter Notebook Data Analysis

Open-source Jupyter Notebook projects categorized as Data Analysis

Top 23 Jupyter Notebook Data Analysis Projects

Data Analysis
  1. superset

    Apache Superset is a Data Visualization and Data Exploration Platform

    Project mention: RisingWave Turns Four: Our Journey Beyond Democratizing Stream Processing | dev.to | 2025-04-18

    By making RisingWave compatible with PostgreSQL, we ensured that any developer familiar with SQL could immediately start writing streaming queries. This wasn't just about syntax; it meant RisingWave could plug seamlessly into existing data workflows and connect easily with a vast ecosystem of familiar tools like DBeaver, Grafana, Apache Superset, dbt, and countless others.

  2. InfluxDB

    InfluxDB – Built for High-Performance Time Series Workloads. InfluxDB 3 OSS is now GA. Transform, enrich, and act on time series data directly in the database. Automate critical tasks and eliminate the need to move data externally. Download now.

  3. Data-Science-For-Beginners

    10 Weeks, 20 Lessons, Data Science for All!

  4. pandas_exercises

    Practice your pandas skills!

  5. machine_learning_complete

    A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

  6. Data-science

    Collection of useful data science topics along with articles, videos, and code (by khuyentran1401)

  7. ML-Workspace

    🛠 All-in-one web-based IDE specialized for machine learning and data science.

  8. 100-pandas-puzzles

    100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)

  9. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  10. mito

    Jupyter extensions that help you write code faster: Context aware AI Chat, Autocomplete, and Spreadsheet

    Project mention: Show HN: Excel to Python Compiler | news.ycombinator.com | 2024-05-23

    3. Tables that translate as pandas DataFrames. We support at most one table per sheet, and the tables must be contiguous. If the formulas in a column are consistent, then we will try to translate this as a single pandas statement.

    We do not support: pivot tables or complex formulas. When we fail to translate these, we generate TODO statements. We also don’t support graphs or macros - and you won’t see these reflected in the output at all currently.

    *Why we built this:*

    We did YC S20 and built an open source tool called [Mito](https://trymito.io). It’s been a good journey since then - we’ve scaled revenue and grown to over [2k GitHub stars](https://github.com/mito-ds/mito). But fundamentally, Mito is a tool that’s useful for Excel users who want to start writing Python code more effectively.

    We wanted to take another, more developer-focused stab at the Excel -> Python pain point: one that helps developers who have to translate Excel files into Python do so much more quickly. Hence, Pyoneer!

    I’ll be in the comments today if you’ve got feedback, criticism, questions, or comments.

  11. Linear-Algebra-With-Python

    Lecture Notes for Linear Algebra Featuring Python. This series of lecture notes will walk you through all the must-know concepts that set the foundation of data science or advanced quantitative skillsets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, and others who want to quickly refresh their linear algebra with the assistance of Python computation and visualization.

  12. hyperlearn

    2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.

  13. hamilton

    Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

    Project mention: Show HN: I built an open-source data pipeline tool in Go | news.ycombinator.com | 2024-12-17

    I always thought Hamilton [1] does a good job of giving enough visual hooks that draw you in.

    I also noticed this pattern where library authors sometimes do a bit extra in terms of discussing and even promoting their competitors, and it makes me trust them more. A “here’s why ours is better and everyone else sucks …” section always comes across as the infomercial character who is having quite a hard time peeling an apple, to the point you wonder if this is the first time they’ve used hands.

    One thing I wish for is a tool that’s essentially just Celery that doesn’t require a message broker (and can just use a database), and which is supported on Windows. There’s always a handful of edge cases where we’re pulling data from an old 32-bit system on Windows. And basically every system has some not-quite-ergonomic workaround that’s as much work as if you’d just built it yourself.

    It seems like it’s just sending a JSON message over a queue or HTTP API and the worker receives it and runs the task. Maybe it’s way harder than I’m envisioning (but I don’t think so because I’ve already written most of it).

    I guess that’s one thing I’m not clear on with Bruin: can I run workers in different physical locations and have them carry out the tasks in the right order? Or is this more of a centralized thing (meaning even if it’s K8s or Dask or Ray, those are all run in a cluster which happens to be distributed, but they’re all machines sitting in the same subnet, which isn’t the definition of a “distributed task” I’m going for)?

    [1] https://github.com/DAGWorks-Inc/hamilton

  14. pymc-resources

    PyMC educational resources

    Project mention: Statistical Rethinking (2024 Edition) | news.ycombinator.com | 2024-11-16

    https://github.com/pymc-devs/pymc-resources/tree/main/Rethin...

  15. kangas

    🦘 Explore multimedia datasets at scale

  16. qs_ledger

    Quantified Self Personal Data Aggregator and Data Analysis

  17. machine-learning

    Practical Full-Stack Machine Learning

  18. rust-data-analysis

    Rust for data analysis encyclopedia (WIP).

  19. Econometrics-With-Python

    Tutorials on econometrics featuring Python programming. This is a crash course for reviewing the most important concepts and techniques of basic econometrics. The theory is presented lightly, without the hassle of derivations, and the Python code is straightforward.

  20. datacamp

    🍧 DataCamp data-science and machine learning courses

  21. tempo

    API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation (by databrickslabs)

  22. RasgoQL

    Write Python locally, execute SQL in your data warehouse

  23. covid19-severity-prediction

    Extensive and accessible COVID-19 data + forecasting for counties and hospitals. 📈

  24. PANDAS-TUTORIAL

    Jupyter Notebooks and Data Sets for Pandas Library (by TirendazAcademy)

  25. Data-Visualization

    Collection of interactive Jupyter Notebook widgets and graphs. (by pierpaolo28)

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
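
The mito project mention above says that when the formulas in a spreadsheet column are consistent, the column is translated into a single pandas statement, and that a TODO is generated when translation fails. The sketch below illustrates that idea only; it is not the tool's actual implementation. The function names (`to_template`, `translate_column`), the input shapes (a row-to-formula mapping and a column-letter-to-header mapping), and the restriction to same-row cell references are all assumptions made for illustration.

```python
import re
from typing import Dict, Optional

# A cell reference such as A2 or BC17: column letters followed by a row number.
CELL_REF = re.compile(r"([A-Z]+)([0-9]+)")


def to_template(formula: str, row: int) -> Optional[str]:
    """Replace same-row references with {COLUMN} placeholders.

    Returns None when the formula refers to a different row, which this
    hypothetical sketch treats as untranslatable.
    """
    def swap(match: "re.Match[str]") -> str:
        if int(match.group(2)) != row:
            raise ValueError("cross-row reference")
        return "{" + match.group(1) + "}"

    try:
        return CELL_REF.sub(swap, formula.lstrip("="))
    except ValueError:
        return None


def translate_column(target: str, formulas: Dict[int, str],
                     headers: Dict[str, str]) -> str:
    """Emit one pandas statement if every row shares the same formula shape.

    formulas maps row number -> formula text; headers maps a spreadsheet
    column letter -> the DataFrame column name (both assumed inputs).
    """
    templates = {to_template(text, row) for row, text in formulas.items()}
    if len(templates) != 1 or None in templates:
        # Mirrors the TODO fallback described in the post.
        return f"# TODO: could not translate column {target!r}"
    template = templates.pop()
    expression = re.sub(
        r"\{([A-Z]+)\}",
        lambda m: f'df["{headers[m.group(1)]}"]',
        template,
    )
    return f'df["{target}"] = {expression}'


if __name__ == "__main__":
    column_formulas = {2: "=A2*B2", 3: "=A3*B3", 4: "=A4*B4"}
    print(translate_column("total", column_formulas, {"A": "price", "B": "qty"}))
```

Comparing normalized templates rather than raw formula text is what lets rows 2, 3, and 4 count as "the same formula" even though their cell references differ. A real translator would also need to handle ranges, functions, and absolute references, which this sketch deliberately leaves out.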

Jupyter Notebook Data Analysis related posts

  • How AI is Transforming Front-End Development in 2025!

    4 projects | dev.to | 23 Apr 2025
  • Ask HN: Why all these GitHub fake accounts starring my project

    1 project | news.ycombinator.com | 9 May 2024
  • Welcome to 14 days of Data Science!

    1 project | dev.to | 7 Mar 2024
  • Data Science for Beginners - A Curriculum

    1 project | /r/programming | 8 Sep 2023
  • Assessing the Quality of Synthetic Data with Data-centric AI

    1 project | /r/ArtificialInteligence | 13 Jul 2023
  • Is anyone willing to work with us on a Synthetic Data Project?

    1 project | /r/ArtificialInteligence | 27 Jun 2023
  • Where can I find data science projects to gain more experience.

    2 projects | /r/datascience | 1 Jun 2023

Index

What are some of the best open-source Data Analysis projects in Jupyter Notebook? This list will help you:

#   Project                      Stars
1   superset                     66,182
2   Data-Science-For-Beginners   29,429
3   pandas_exercises             11,284
4   machine_learning_complete    4,789
5   Data-science                 4,095
6   ML-Workspace                 3,490
7   100-pandas-puzzles           2,694
8   mito                         2,457
9   Linear-Algebra-With-Python   2,415
10  hyperlearn                   2,140
11  hamilton                     2,128
12  pymc-resources               2,005
13  kangas                       1,057
14  qs_ledger                    1,010
15  machine-learning             694
16  rust-data-analysis           410
17  Econometrics-With-Python     406
18  datacamp                     380
19  tempo                        324
20  RasgoQL                      270
21  covid19-severity-prediction  228
22  PANDAS-TUTORIAL              216
23  Data-Visualization           158

Did you know that Jupyter Notebook is the 13th most popular programming language based on number of references?