Python dataengineering

Open-source Python projects categorized as dataengineering

Top 9 Python dataengineering Projects

  • data-diff

    Compare tables within or across databases

  • Project mention: How to Check 2 SQL Tables Are the Same | news.ycombinator.com | 2023-07-26

    If the issue happens a lot, there is also: https://github.com/datafold/data-diff

    It's a nice tool for doing this across databases as well.

    I think it's based on a checksum method.

  • sqlmesh

    Efficient data transformation and modeling framework that is backwards compatible with dbt.

  • Project mention: Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative | news.ycombinator.com | 2023-08-14

    There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.

  • grai-core

  • Project mention: Launch HN: Grai (YC S22) – Open-Source Data Observability Platform | news.ycombinator.com | 2023-07-17

    Elastic v2 if one is interested in such things: https://github.com/grai-io/grai-core/blob/v0.1.33/LICENSE

  • data-engineer-challenge

    Challenge Data Engineer

  • pyDag

    Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

  • pyspark-on-aws-emr

    The goal of this project is to offer a ready-to-use AWS EMR template that combines Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.

  • metadata-guardian

    Provides an easy way, in Python, to protect your data sources by searching their metadata. 🛡️

  • livyc

    Apache Spark as a Service with Apache Livy Client

  • ticker_selection_BI_dashboard

    Data engineering project: extracts data for 4 stock shares and uploads it to MySQL for use in a BI project
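The data-diff entry mentions a checksum method for comparing tables. The idea can be illustrated with a minimal sketch, assuming two in-memory dicts (integer primary key to row string) stand in for the two tables: checksum a key range on both sides, skip it if the checksums match, and otherwise bisect it until the segments are small enough to compare row by row. This is a simplified illustration of the general checksum-and-bisect technique, not data-diff's actual code or API; `segment_checksum`, `diff_tables`, and `threshold` are hypothetical names.

```python
import hashlib


def segment_checksum(rows: dict[int, str], lo: int, hi: int) -> str:
    """Hash every (key, row) pair with lo <= key < hi, in key order."""
    h = hashlib.md5()
    for key in sorted(k for k in rows if lo <= k < hi):
        h.update(f"{key}:{rows[key]}\n".encode())
    return h.hexdigest()


def diff_tables(a: dict[int, str], b: dict[int, str],
                lo: int, hi: int, threshold: int = 4) -> list[int]:
    """Return the keys in [lo, hi) whose rows differ between a and b."""
    if segment_checksum(a, lo, hi) == segment_checksum(b, lo, hi):
        return []  # matching checksums: skip the whole segment
    if hi - lo <= threshold:
        # Segment is small enough: compare the rows directly.
        keys = {k for k in a if lo <= k < hi} | {k for k in b if lo <= k < hi}
        return sorted(k for k in keys if a.get(k) != b.get(k))
    mid = (lo + hi) // 2  # bisect the mismatching segment
    return (diff_tables(a, b, lo, mid, threshold)
            + diff_tables(a, b, mid, hi, threshold))


source = {i: f"row-{i}" for i in range(100)}
target = dict(source)
target[42] = "changed"  # modified row
del target[7]           # missing row
print(diff_tables(source, target, 0, 100))  # [7, 42]
```

In a real cross-database setting, each segment checksum would presumably be computed inside the database, so only short hashes travel over the network until a small mismatching segment is found.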

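pyDag is described as scheduling data pipelines. A common way to schedule a pipeline is to treat it as a directed acyclic graph (DAG) of tasks and run them in dependency order. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group tasks into waves whose members could run in parallel. It does not show pyDag's own API; the function `schedule_waves` and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def schedule_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; each task depends only on tasks in earlier waves."""
    sorter = TopologicalSorter(dependencies)  # maps task -> set of prerequisites
    sorter.prepare()  # raises graphlib.CycleError if the pipeline has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose prerequisites are all done
        waves.append(ready)
        # A real scheduler would run these tasks here before marking them done.
        sorter.done(*ready)
    return waves


# Hypothetical pipeline: task -> tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_customers"},
    "load_warehouse": {"join"},
}
print(schedule_waves(pipeline))
# [['extract_customers', 'extract_orders'], ['clean_orders'], ['join'], ['load_warehouse']]
```

Grouping tasks into waves keeps the sketch simple; a production scheduler would usually start each task as soon as its own prerequisites finish rather than waiting for the whole wave.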
NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source dataengineering projects in Python? This list will help you:

Rank  Project                        Stars
1     data-diff                      2,842
2     sqlmesh                        1,249
3     grai-core                        269
4     data-engineer-challenge           25
5     pyDag                             24
6     pyspark-on-aws-emr                24
7     metadata-guardian                 18
8     livyc                              3
9     ticker_selection_BI_dashboard      2
