Data engineering projects with template: Airflow, dbt, Docker, Terraform (IAC), GitHub Actions (CI/CD) & more

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering

  • data_engineering_project_template

    A template repository to create a data project with IAC, CI/CD, Data migrations, & testing

  • Docker is used to containerize your application. For example, this Dockerfile is used to create a container, and it specifies the OS, among other things. You can run Docker on any machine, and you can think of it as running a separate OS (not exactly, but close enough) on that machine. Docker lets you replicate an OS and its packages (e.g. Python modules) across machines so that you don't run into "hey, that worked on my computer" issues.

  • black

    The uncompromising Python code formatter

  • Formatting: isort & black

  • terraform

    Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.

  • IAC: Terraform

  • Flake8

    flake8 is a python tool that glues together pycodestyle, pyflakes, mccabe, and third-party plugins to check the style and quality of some python code.

  • Lint check: flake8

  • isort

    A Python utility / library to sort imports.

  • Formatting: isort & black

  • Docker Compose

    Define and run multi-container applications with Docker

  • Local development: Docker & Docker Compose

  • public-api-lists

    A collective list of free APIs for use in software and web development 🚀

  • And then try out the pipeline with a data source of your choosing. I use https://github.com/public-api-lists/public-api-lists to find a data API.

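The Docker item above describes replicating an OS and its packages across machines. As a rough illustration of the drift that causes "worked on my computer" issues, here is a minimal Python sketch that summarizes the current environment (OS, Python version, installed packages) into a single hash. The function names `installed_packages`, `environment_snapshot`, and `fingerprint` are hypothetical and are not part of the template repository. Two machines that print different fingerprints are running different environments; a container image is one way to make them match.

```python
import hashlib
import json
import platform
from importlib import metadata


def installed_packages():
    """Return a sorted list of 'name==version' strings for installed distributions."""
    found = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions with missing or broken metadata
            found.add(f"{name.lower()}=={dist.version}")
    return sorted(found)


def environment_snapshot():
    """Collect the parts of the environment that commonly differ between machines."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "python": platform.python_version(),
        "packages": installed_packages(),
    }


def fingerprint(snapshot):
    """Hash a snapshot so two environments can be compared with one string."""
    payload = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    snap = environment_snapshot()
    print(f"{snap['os']} {snap['os_release']}, Python {snap['python']}")
    print(f"{len(snap['packages'])} packages, fingerprint {fingerprint(snap)}")
```

Running the script on a laptop and again inside the project's container would show whether the two environments actually differ.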
NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts