Top 11 Jupyter Notebook data-engineering Projects
data-engineering-zoomcamp
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Project mention: Study Notes 2.2.7: Managing Schedules and Backfills with BigQuery in Kestra | dev.to | 2025-02-04
DE Zoomcamp Resources: Data Engineering Zoomcamp
-
hamilton
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. It runs and scales everywhere Python does.
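The dataflow idea can be illustrated with a small pure-Python sketch in which each function is a node and its parameter names declare which upstream nodes (or external inputs) it depends on. This is not Hamilton's API. The helpers `resolve`, `execute`, and `upstream` and the example nodes are hypothetical, written only to show how naming dependencies through parameters gives you both execution order and lineage.

```python
import inspect
from typing import Any, Callable, Dict, List, Set

# Hypothetical dataflow: each function is a node, and each parameter name
# refers to another node (or to an external input) by name.
def total_spend(fares: List[float]) -> float:
    """Sum of all fares."""
    return float(sum(fares))

def ride_count(fares: List[float]) -> int:
    """Number of rides."""
    return len(fares)

def spend_per_ride(total_spend: float, ride_count: int) -> float:
    """Average spend per ride, built from two upstream nodes."""
    return total_spend / ride_count

Nodes = Dict[str, Callable[..., Any]]

def resolve(name: str, nodes: Nodes, inputs: Dict[str, Any], cache: Dict[str, Any]) -> Any:
    """Compute `name`, first computing every node its parameters name."""
    if name in inputs:
        return inputs[name]
    if name not in cache:
        params = inspect.signature(nodes[name]).parameters
        kwargs = {p: resolve(p, nodes, inputs, cache) for p in params}
        cache[name] = nodes[name](**kwargs)  # each node runs at most once
    return cache[name]

def execute(outputs: List[str], nodes: Nodes, inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Compute only the requested outputs and whatever they depend on."""
    cache: Dict[str, Any] = {}
    return {name: resolve(name, nodes, inputs, cache) for name in outputs}

def upstream(name: str, nodes: Nodes) -> Set[str]:
    """Return the lineage of `name`: every node or input it depends on."""
    if name not in nodes:
        return set()  # an external input has no further dependencies
    deps: Set[str] = set()
    for p in inspect.signature(nodes[name]).parameters:
        deps.add(p)
        deps |= upstream(p, nodes)
    return deps

if __name__ == "__main__":
    nodes = {f.__name__: f for f in (total_spend, ride_count, spend_per_ride)}
    print(execute(["spend_per_ride"], nodes, {"fares": [10.0, 20.0, 30.0]}))  # {'spend_per_ride': 20.0}
    print(sorted(upstream("spend_per_ride", nodes)))  # ['fares', 'ride_count', 'total_spend']
```

Because each node is a plain function, it can be unit-tested on its own, and the lineage falls out of the function signatures rather than being maintained separately.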
Project mention: Show HN: I built an open-source data pipeline tool in Go | news.ycombinator.com | 2024-12-17
I always thought Hamilton [1] does a good job of giving enough visual hooks that draw you in.
I also noticed this pattern where library authors sometimes do a bit extra in terms of discussing and even promoting their competitors, and it makes me trust them more. A “here’s why ours is better and everyone else sucks …” section always comes across as the infomercial character who is having such a hard time peeling an apple that you wonder if this is the first time they’ve used their hands.
One thing I wish for is a tool that’s essentially just Celery but doesn’t require a message broker (it can just use a database) and is supported on Windows. There’s always a handful of edge cases where we’re pulling data from an old 32-bit system on Windows, and basically every system has some not-quite-ergonomic workaround that’s as much work as if you’d just built it yourself.
It seems like it’s just sending a JSON message over a queue or HTTP API, and the worker receives it and runs the task. Maybe it’s way harder than I’m envisioning (but I don’t think so, because I’ve already written most of it).
I guess that’s one thing I’m not clear on with Bruin: can I run workers in different physical locations and have them carry out the tasks in the right order? Or is this more of a centralized thing? (Even if it’s K8s or Dask or Ray, those all run in a cluster that happens to be distributed, but they’re all machines sitting in the same subnet, which isn’t the definition of a “distributed task” I’m going for.)
[1] https://github.com/DAGWorks-Inc/hamilton
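The broker-less design described in the comment above (a JSON task message stored in a database, picked up by a worker that runs it) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not Celery's or Bruin's API: the `tasks` table and the functions `init_db`, `enqueue`, `claim_next`, and `run_worker_once` are hypothetical. It only gives FIFO ordering and a lock on claiming; workers on separate machines would need a shared server database with row locking (for example PostgreSQL's `SELECT ... FOR UPDATE SKIP LOCKED`), which this sketch does not attempt.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, Optional, Tuple

def init_db(conn: sqlite3.Connection) -> None:
    """Create the hypothetical `tasks` table that replaces the message broker."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               name TEXT NOT NULL,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               result TEXT
           )"""
    )

def enqueue(conn: sqlite3.Connection, name: str, args: Dict[str, Any]) -> int:
    """Store a task as a JSON message; returns the new task id."""
    cur = conn.execute("INSERT INTO tasks (name, payload) VALUES (?, ?)", (name, json.dumps(args)))
    return cur.lastrowid

def claim_next(conn: sqlite3.Connection) -> Optional[Tuple[int, str, str]]:
    """Atomically mark the oldest pending task as running and return it."""
    # BEGIN IMMEDIATE takes SQLite's write lock, so two workers sharing the
    # same database file cannot claim the same row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, name, payload FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row

def run_worker_once(conn: sqlite3.Connection, registry: Dict[str, Callable[..., Any]]) -> bool:
    """Claim one task, run it, and record the outcome. Returns False when idle."""
    row = claim_next(conn)
    if row is None:
        return False
    task_id, name, payload = row
    try:
        result = registry[name](**json.loads(payload))
        conn.execute("UPDATE tasks SET status = 'done', result = ? WHERE id = ?", (json.dumps(result), task_id))
    except Exception as exc:
        conn.execute("UPDATE tasks SET status = 'failed', result = ? WHERE id = ?", (json.dumps(str(exc)), task_id))
    return True

if __name__ == "__main__":
    # isolation_level=None: autocommit mode, with transactions managed explicitly above.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    init_db(conn)
    enqueue(conn, "add", {"a": 2, "b": 3})
    while run_worker_once(conn, {"add": lambda a, b: a + b}):
        pass
    print(conn.execute("SELECT status, result FROM tasks").fetchall())  # [('done', '5')]
```

Keeping the task status in the same table as the payload also gives a simple audit trail of what ran and what failed, which a plain message broker does not provide by default.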
-
Project mention: Show HN: Hands-On Data Engineering with a Real-Estate Project Guide | news.ycombinator.com | 2024-03-20
-
uber-expenses-tracking
The goal of this project is to track the expenses of Uber Rides and Uber Eats through data engineering processes using technologies such as Apache Airflow, AWS Redshift, and Power BI.
-
pyspark-tutorial
PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformations and Actions, Spark DataFrame, Spark SQL, and more. It is completely free on YouTube and is beginner-friendly without any prerequisites. (by coder2j)
Jupyter Notebook data-engineering discussion
Jupyter Notebook data-engineering related posts
- Study Notes 2.2.7: Managing Schedules and Backfills with BigQuery in Kestra
- Study Note DE Zoomcamp 1.2.4 - Dockerizing the Ingestion Script
- Data Engineering Zoomcamp 2025 Cohort: Introduction - Self-Study Notes
- Show HN: Hamilton's UI – observability, lineage, and catalog for data pipelines
- Data Engineering Zoomcamp Week 6 - using redpanda 1
- Final project part 5
- Show HN: Hands-On Data Engineering with a Real-Estate Project Guide
Index
What are some of the best open-source data-engineering projects in Jupyter Notebook? This list will help you:
# | Project | Stars |
---|---|---|
1 | Made-With-ML | 38,126 |
2 | data-engineering-zoomcamp | 28,555 |
3 | mlops-course | 3,029 |
4 | hamilton | 2,007 |
5 | Data-Engineering-Projects | 896 |
6 | practical-data-engineering | 600 |
7 | uber-expenses-tracking | 116 |
8 | pyspark-tutorial | 88 |
9 | 60-Days-of-Data-Science-and-ML | 25 |
10 | Data-Engineering-Portfolio | 14 |
11 | data-engineering-nd | 9 |