dataengineering

Open-source projects categorized as dataengineering

Top 16 dataengineering Open-Source Projects

  • OpenMetadata

    Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

  • Project mention: How to Dynamically Adjust the Height of a Textarea in ReactJS | dev.to | 2023-10-25

    In this blog post, I have demonstrated how I addressed the challenge of dynamically adjusting the height of a textarea element based on its content, preventing the need for vertical scrolling in the title section of the OpenMetadata Knowledge article page.

  • data-diff

    Compare tables within or across databases

  • Project mention: How to Check 2 SQL Tables Are the Same | news.ycombinator.com | 2023-07-26

    If the issue happens a lot, there is also: https://github.com/datafold/data-diff

    It is a nice tool for doing this across databases as well.

    I think it's based on a checksum method.

  • sqlmesh

    Efficient data transformation and modeling framework that is backwards compatible with dbt.

  • Project mention: Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative | news.ycombinator.com | 2023-08-14

    There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.

  • zingg

    Scalable identity resolution, entity resolution, data mastering and deduplication using ML

  • automate-dv

    A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

  • grai-core

  • Project mention: Launch HN: Grai (YC S22) – Open-Source Data Observability Platform | news.ycombinator.com | 2023-07-17

    Elastic v2 if one is interested in such things: https://github.com/grai-io/grai-core/blob/v0.1.33/LICENSE

  • snowpark-python-demos

    This repository provides various demos/examples of using Snowpark for Python.

  • Data-Engineering-Roadmap

    Roadmap for Data Engineering

  • Project mention: Question about data engineering? | /r/programiranje | 2023-06-30
  • apache-spark-docker

    Dockerizing an Apache Spark Standalone Cluster

  • data-engineer-challenge

    Challenge Data Engineer

  • pyDag

    Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

  • pyspark-on-aws-emr

    The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.

  • ghcn-d

    Data Pipeline from the Global Historical Climatology Network DataSet

  • metadata-guardian

    Provide an easy way with Python to protect your data sources by searching their metadata. 🛡️

  • livyc

    Apache Spark as a Service with Apache Livy Client

  • ticker_selection_BI_dashboard

    Data Engineering Project: stock data for 4 shares is extracted and uploaded to MySQL for use in a BI project

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source dataengineering projects? This list will help you:

 #  Project                         Stars
 1  OpenMetadata                    4,227
 2  data-diff                       2,862
 3  sqlmesh                         1,296
 4  zingg                             889
 5  automate-dv                       459
 6  grai-core                         270
 7  snowpark-python-demos             243
 8  Data-Engineering-Roadmap          117
 9  apache-spark-docker                40
10  data-engineer-challenge            25
11  pyDag                              24
12  pyspark-on-aws-emr                 24
13  ghcn-d                             21
14  metadata-guardian                  18
15  livyc                               3
16  ticker_selection_BI_dashboard       2
