I have been working on Datajob, a library that helps me ship my data pipelines to AWS with as little configuration as code as possible, and I'm curious whether other people can use it. I have a minimal version that lets you package your code and its dependencies into AWS Glue Python/PySpark jobs and orchestrate them with Step Functions, as simply as task1 >> task2 >> task3.
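To make the `task1 >> task2 >> task3` idea concrete, here is a minimal sketch of how that kind of chaining can work in plain Python. This is not Datajob's API: `Task`, `Pipeline`, and `build_example` are hypothetical names used only for illustration. The sketch shows how the `__rshift__` operator can record an execution order that an orchestrator could later turn into a workflow definition.

```python
class Pipeline:
    """Hypothetical container that records (upstream, downstream) pairs."""

    def __init__(self):
        self.edges = []

    def order(self):
        """Return task names in chain order (assumes a single linear chain)."""
        if not self.edges:
            return []
        names = [self.edges[0][0]]
        for _, downstream in self.edges:
            names.append(downstream)
        return names


class Task:
    """Hypothetical task that registers its successor when chained with >>."""

    def __init__(self, name, pipeline):
        self.name = name
        self.pipeline = pipeline

    def __rshift__(self, other):
        # Record the dependency, then return the right-hand task so that
        # a >> b >> c is evaluated as (a >> b) >> c, i.e. b >> c next.
        self.pipeline.edges.append((self.name, other.name))
        return other


def build_example():
    """Chain three tasks and return the recorded order."""
    pipeline = Pipeline()
    task1 = Task("task1", pipeline)
    task2 = Task("task2", pipeline)
    task3 = Task("task3", pipeline)
    task1 >> task2 >> task3
    return pipeline.order()
```

Calling `build_example()` returns the task names in the order they were chained. A real orchestrator would map each recorded pair to a state transition rather than just listing names.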
One way to test the functionality is to use pytest/unittest/... in combination with moto. I wrote a Medium article more than a year ago that gives an example of how you can test Glue PySpark jobs: https://towardsdatascience.com/testing-glue-pyspark-jobs-4b544d62106e
If I'm not mistaken, singer.io provides scripts that move data around. Datajob can help you deploy these singer.io scripts to AWS Glue and orchestrate them.