Top 16 Data Engineering Open-Source Projects
-
OpenMetadata
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
automate-dv
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
-
pyspark-on-aws-emr
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
-
metadata-guardian
Provides an easy way with Python to protect your data sources by searching their metadata. 🛡️
-
ticker_selection_BI_dashboard
Data engineering project: extracts stock data for 4 shares and uploads it to MySQL for use in a BI project.
Project mention: How to Dynamically Adjust the Height of a Textarea in ReactJS | dev.to | 2023-10-25
In this blog post, I have demonstrated how I addressed the challenge of dynamically adjusting the height of a textarea element based on its content, preventing the need for vertical scrolling in the title section of the OpenMetadata Knowledge article page.
If the issue happens a lot, there is also: https://github.com/datafold/data-diff
It is a nice tool that also works across databases.
I think it's based on a checksum method.
Project mention: Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative | news.ycombinator.com | 2023-08-14
There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.
Project mention: Launch HN: Grai (YC S22) – Open-Source Data Observability Platform | news.ycombinator.com | 2023-07-17
Elastic v2 if one is interested in such things: https://github.com/grai-io/grai-core/blob/v0.1.33/LICENSE
Index
What are some of the best open-source data engineering projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | OpenMetadata | 4,227 |
2 | data-diff | 2,862 |
3 | sqlmesh | 1,296 |
4 | zingg | 889 |
5 | automate-dv | 459 |
6 | grai-core | 270 |
7 | snowpark-python-demos | 243 |
8 | Data-Engineering-Roadmap | 117 |
9 | apache-spark-docker | 40 |
10 | data-engineer-challenge | 25 |
11 | pyDag | 24 |
12 | pyspark-on-aws-emr | 24 |
13 | ghcn-d | 21 |
14 | metadata-guardian | 18 |
15 | livyc | 3 |
16 | ticker_selection_BI_dashboard | 2 |