Data pipeline suggestions

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering.

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Available both self-hosted and cloud-hosted.

  • Ingestion / Extraction: Airbyte, Singer, Jitsu

  • jitsu

    Jitsu is an open-source Segment alternative: a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.

  • dbt-core

    dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

  • Transformation: dbt

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

  • Orchestration: Airflow, Dagster

  • dagster

    An orchestration platform for the development, production, and observation of data assets.

  • great_expectations

    Always know what to expect from your data.

  • Testing: Great Expectations

  • monosi

    Open-source data observability platform.

  • Observability: Monosi

  • grouparoo

    (Discontinued) 🦘 The Grouparoo monorepo: an open-source customer data sync framework.

  • Reverse ETL: Grouparoo, Castled

  • castled

    (Discontinued) Castled is an open-source reverse ETL solution that helps you periodically sync the data in your database or warehouse into sales, marketing, support, or custom apps without help from engineering teams.

  • lightdash

    Self-serve BI to 10x your data team ⚡️

  • Visualization / Analysis: Lightdash, Superset

  • superset

    Apache Superset is a data visualization and data exploration platform.

  • ploomber

    The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

  • Check out Ploomber (disclaimer: I'm the author). It has a simple API, and you can export to Airflow, AWS, and Kubernetes. It supports all databases that work with Python, and you can seamlessly transfer from a SQL step to a Python step. Here's an example.

  • projects

    Sample projects using Ploomber. (by ploomber)

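The Ploomber comment describes moving from a SQL step to a Python step within one pipeline. The sketch below illustrates only that general pattern, using Python's built-in sqlite3 module with an in-memory database; it does not use Ploomber's own API, and the table names, sample rows, and the functions sql_step, python_step, and run_pipeline are hypothetical.

```python
import sqlite3
from typing import Dict


# Hypothetical SQL step: build a summary table inside the database.
# The orders table and its sample rows are illustrative only.
def sql_step(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE orders (day TEXT, amount REAL);
        INSERT INTO orders VALUES
            ('2024-01-01', 10.0),
            ('2024-01-01', 5.0),
            ('2024-01-02', 7.5);
        CREATE TABLE daily_totals AS
            SELECT day, SUM(amount) AS total FROM orders GROUP BY day;
        """
    )


# Hypothetical Python step: read the SQL step's output and continue in Python.
def python_step(conn: sqlite3.Connection) -> Dict[str, float]:
    rows = conn.execute("SELECT day, total FROM daily_totals ORDER BY day").fetchall()
    return {day: total for day, total in rows}


def run_pipeline() -> Dict[str, float]:
    # An in-memory database keeps the sketch self-contained.
    conn = sqlite3.connect(":memory:")
    try:
        sql_step(conn)
        return python_step(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline())
```

The SQL step leaves its result in a table, and the Python step picks it up from there; sharing the database as the hand-off point is one simple way to chain the two kinds of step.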

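Airflow and Dagster, listed under Orchestration, both organize a pipeline as units of work with dependencies between them. As a rough illustration of that idea only, the following runs hypothetical extract, transform, and load functions in dependency order using the standard-library graphlib module (Python 3.9+). It is not the API of either tool, which add scheduling and monitoring on top; TASKS, DEPENDENCIES, and run_dag are illustrative names.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set


# Hypothetical tasks; each receives a dict of its upstream tasks' outputs.
def extract(upstream: Dict[str, Any]) -> list:
    return [3, 1, 2]


def transform(upstream: Dict[str, Any]) -> list:
    return sorted(upstream["extract"])


def load(upstream: Dict[str, Any]) -> str:
    return f"loaded {len(upstream['transform'])} rows"


TASKS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "extract": extract,
    "transform": transform,
    "load": load,
}

# Maps each task to the set of tasks that must finish before it runs.
DEPENDENCIES: Dict[str, Set[str]] = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_dag(tasks, dependencies) -> Dict[str, Any]:
    outputs: Dict[str, Any] = {}
    # static_order() yields every task after all of its dependencies
    # and raises graphlib.CycleError if the graph contains a cycle.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: outputs[dep] for dep in dependencies.get(name, set())}
        outputs[name] = tasks[name](upstream)
    return outputs


if __name__ == "__main__":
    print(run_dag(TASKS, DEPENDENCIES))
```

Because ordering comes from the dependency graph rather than from the order the tasks are declared, adding a new task only requires stating what it depends on.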

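Great Expectations, listed under Testing, centers on declaring what you expect from your data and checking each batch against those expectations. The plain-Python sketch below shows the general idea with two hypothetical checks, check_not_null and check_between; it is not the Great Expectations API, and the "amount" column and its 0 to 100 range are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class CheckResult:
    """Outcome of one data-quality check (hypothetical structure)."""
    name: str
    failing_rows: List[int] = field(default_factory=list)

    @property
    def success(self) -> bool:
        return not self.failing_rows


def check_not_null(rows: List[Row], column: str) -> CheckResult:
    """Fail every row whose column is missing or None."""
    failing = [i for i, row in enumerate(rows) if row.get(column) is None]
    return CheckResult(f"{column} is not null", failing)


def check_between(rows: List[Row], column: str, low: float, high: float) -> CheckResult:
    """Fail rows whose value falls outside [low, high]; nulls are left to check_not_null."""
    failing = [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return CheckResult(f"{column} between {low} and {high}", failing)


def validate(rows: List[Row]) -> List[CheckResult]:
    # Hypothetical suite for a hypothetical "amount" column.
    return [check_not_null(rows, "amount"), check_between(rows, "amount", 0, 100)]


if __name__ == "__main__":
    sample = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}, {"id": 3, "amount": -5}]
    for result in validate(sample):
        status = "PASS" if result.success else "FAIL"
        print(f"{status}: {result.name} (failing rows: {result.failing_rows})")
```

Returning the indices of failing rows, rather than a bare pass or fail, makes it easier to inspect or quarantine the offending records before they reach downstream steps.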