Can we take a moment to appreciate how much of data engineering is open source?

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/dataengineering

  • versatile-data-kit

    Build, run and manage your data pipelines with Python or SQL on any cloud

    If you wish to contribute, projects usually have good first issues: https://github.com/vmware/versatile-data-kit/labels/good%20first%20issue

    If you wish to learn, check out the examples: https://github.com/vmware/versatile-data-kit/tree/main/examples

  • superset

    Apache Superset is a Data Visualization and Data Exploration Platform

  • dagster

    An orchestration platform for the development, production, and observation of data assets.

  • dbt-core

    dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

  • Rudderstack

    Privacy and Security focused Segment-alternative, in Golang and React

    It takes a village to build an open-source project. We are grateful to the 170+ contributors to RudderStack.

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

    Here is an easy-to-use, free data pipeline tool with a user-friendly UI: https://github.com/mage-ai/mage-ai

  • decile

    Simple, open-source analytics tool for any Postgres database.

    If you'd like to try it, feel free to have a go at our project: https://github.com/decileapp/decile

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
