Datajob: Build and deploy a serverless data pipeline on AWS with no effort.

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering

  • datajob

    Build and deploy a serverless data pipeline on AWS with no effort.

  • I have been working on Datajob, a library that helps me ship my data pipelines to AWS with as little configuration-as-code as possible, and I'm curious whether other people can use it. I have a minimal version that lets you package your code and its dependencies as AWS Glue Python/PySpark jobs and orchestrate them with Step Functions as simply as task1 >> task2 >> task3.

  • Moto

    A library that allows you to easily mock out tests based on AWS infrastructure.

  • One way to test the functionality is to use pytest/unittest/... in combination with moto. I wrote a Medium article more than a year ago that gives an example of how you can test Glue PySpark jobs: https://towardsdatascience.com/testing-glue-pyspark-jobs-4b544d62106e

  • getting-started

    This repository is a getting started guide to Singer. (by singer-io)

  • If I'm not mistaken, singer.io provides scripts that move data around. Datajob can help you deploy these singer.io scripts to AWS Glue and orchestrate them.
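
The `task1 >> task2 >> task3` chaining mentioned in the Datajob comment above relies on Python's right-shift operator. The following is only a minimal sketch of how such chaining can be made to work. The `Task` class and `execution_order` helper are hypothetical names invented for illustration and are not Datajob's actual API.

```python
class Task:
    """Hypothetical stand-in for a pipeline step (not Datajob's API)."""

    def __init__(self, name):
        self.name = name
        self.downstream = []  # tasks that should run after this one

    def __rshift__(self, other):
        # `a >> b` records b as a successor of a and returns b,
        # so `a >> b >> c` is evaluated as `(a >> b) >> c`.
        self.downstream.append(other)
        return other


def execution_order(first):
    """Follow the first downstream link from each task and collect names."""
    order = []
    current = first
    while current is not None:
        order.append(current.name)
        current = current.downstream[0] if current.downstream else None
    return order


task1, task2, task3 = Task("task1"), Task("task2"), Task("task3")
task1 >> task2 >> task3
print(execution_order(task1))
```

Returning the right-hand operand from `__rshift__` is the design choice that lets the chain read left to right. A real orchestrator would translate the recorded links into a workflow definition instead of printing them.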

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
