Jupyter Notebook data-engineering

Open-source Jupyter Notebook projects categorized as data-engineering

Top 11 Jupyter Notebook data-engineering Projects

  1. Made-With-ML

    Learn how to design, develop, deploy and iterate on production-grade ML applications.

  3. data-engineering-zoomcamp

    Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.

    Project mention: Study Notes 2.2.7: Managing Schedules and Backfills with BigQuery in Kestra | dev.to | 2025-02-04

    DE Zoomcamp Resources: Data Engineering Zoomcamp

  4. mlops-course

    Learn how to design, develop, deploy and iterate on production-grade ML applications.

  5. hamilton

    Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

    Project mention: Show HN: I built an open-source data pipeline tool in Go | news.ycombinator.com | 2024-12-17

    I always thought Hamilton [1] does a good job of giving enough visual hooks that draw you in.

    I also noticed this pattern where library authors sometimes do a bit extra in terms of discussing and even promoting their competitors, and it makes me trust them more. A “here’s why ours is better and everyone else sucks …” section always comes across as the infomercial character who is having quite a hard time peeling an apple, to the point you wonder if this is the first time they’ve used hands.

    One thing I wish for is a tool that’s essentially just Celery that doesn’t require a message broker (and can just use a database), and which is supported on Windows. There’s always a handful of edge cases where we’re pulling data from an old 32-bit system on Windows. And basically every system has some not-quite-ergonomic workaround that’s as much work as if you’d just built it yourself.

    It seems like it’s just sending a JSON message over a queue or HTTP API and the worker receives it and runs the task. Maybe it’s way harder than I’m envisioning (but I don’t think so because I’ve already written most of it).

    I guess that’s one thing I’m not clear on with Bruin: can I run workers in different physical locations and have them carry out the tasks in the right order? Or is this more of a centralized thing (meaning even if it’s K8s or Dask or Ray, those are all run in a cluster which happens to be distributed, but they’re all machines sitting in the same subnet, which isn’t the definition of a “distributed task” I’m going for)?

    [1] https://github.com/DAGWorks-Inc/hamilton

  6. Data-Engineering-Projects

    Personal Data Engineering Projects

  7. practical-data-engineering

    Practical Data Engineering: A Hands-On Real-Estate Project Guide

    Project mention: Show HN: Hands-On Data Engineering with a Real-Estate Project Guide | news.ycombinator.com | 2024-03-20
  8. uber-expenses-tracking

    The goal of this project is to track the expenses of Uber Rides and Uber Eats through data engineering processes using technologies such as Apache Airflow, AWS Redshift and Power BI.

  10. pyspark-tutorial

    PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformations and Actions, Spark DataFrame, Spark SQL, and more. It is completely free on YouTube and is beginner-friendly without any prerequisites. (by coder2j)

  11. 60-Days-of-Data-Science-and-ML

    60 Days of Data Science and ML

  12. Data-Engineering-Portfolio

    I'm learning how to build data pipelines to work with large datasets. (:

  13. data-engineering-nd

    Projects of the Udacity Data Engineering Nanodegree Program.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Jupyter Notebook data-engineering related posts

  • Study Notes 2.2.7: Managing Schedules and Backfills with BigQuery in Kestra

    3 projects | dev.to | 4 Feb 2025
  • Study Note DE Zoomcamp 1.2.4 - Dockerizing the Ingestion Script

    1 project | dev.to | 4 Feb 2025
  • Data Engineering Zoomcamp 2025 Cohort: Introduction - Self-Study Notes

    1 project | dev.to | 25 Jan 2025
  • Show HN: Hamilton's UI – observability, lineage, and catalog for data pipelines

    1 project | news.ycombinator.com | 2 May 2024
  • Data Engineering Zoomcamp Week 6 - using redpanda 1

    1 project | dev.to | 9 Apr 2024
  • Final project part 5

    1 project | dev.to | 3 Apr 2024
  • Show HN: Hands-On Data Engineering with a Real-Estate Project Guide

    1 project | news.ycombinator.com | 20 Mar 2024

Index

What are some of the best open-source data-engineering projects in Jupyter Notebook? This list will help you:

#    Project                          Stars
1    Made-With-ML                     38,126
2    data-engineering-zoomcamp        28,555
3    mlops-course                     3,029
4    hamilton                         2,007
5    Data-Engineering-Projects        896
6    practical-data-engineering       600
7    uber-expenses-tracking           116
8    pyspark-tutorial                 88
9    60-Days-of-Data-Science-and-ML   25
10   Data-Engineering-Portfolio       14
11   data-engineering-nd              9

Did you know that Jupyter Notebook is the 13th most popular programming language based on number of references?