s3-sqs-connector
A library for reading data from Amazon S3 with optimised listing via Amazon SQS, using Spark SQL Streaming (also known as Structured Streaming).
Are you planning on uploading and processing many files to S3? If so, I would use something like Structured Streaming with the FileSource, which can detect new files uploaded to S3 and process them on a "standard" Spark cluster. You can then run that cluster on EKS/Kubernetes, where it is very easy to deploy and operate. Once the number of files you upload starts to get really large, I would check out https://github.com/qubole/s3-sqs-connector. AWS Glue could also be used to achieve roughly the same thing without the hassle of managing EKS/K8s clusters.
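To make the listing-versus-SQS trade-off concrete, here is a toy, stdlib-only Python model of two ways to discover new files. It is a sketch under stated assumptions, not Spark's FileSource or the s3-sqs-connector's actual code, and the function names are hypothetical. `discover_by_listing` assumes a caller-supplied listing function and a set of already-processed keys, roughly how a listing-based file source behaves: every trigger re-lists the whole prefix, so the cost grows with the total number of objects. `discover_from_notifications` assumes S3 event notifications are delivered to an SQS queue, with message bodies in the S3 event JSON shape (`Records`, `eventName`, `s3.bucket.name`, `s3.object.key`); there the cost grows only with the number of new objects.

```python
import json
from urllib.parse import unquote_plus


def discover_by_listing(list_keys, seen):
    """Hypothetical listing-based discovery.

    Calls list_keys() to get every key under the prefix (a full listing on
    each trigger) and returns, in sorted order, the keys not processed yet.
    """
    new_keys = [k for k in sorted(set(list_keys())) if k not in seen]
    seen.update(new_keys)  # remember them so the next trigger skips them
    return new_keys


def discover_from_notifications(message_bodies):
    """Hypothetical notification-based discovery.

    Each message body is assumed to be an S3 event notification as JSON.
    Only ObjectCreated:* records are turned into s3:// paths; messages
    without "Records" (such as the s3:TestEvent message) are ignored.
    """
    paths = []
    for body in message_bodies:
        event = json.loads(body)
        for record in event.get("Records", []):
            if not record.get("eventName", "").startswith("ObjectCreated:"):
                continue
            bucket = record["s3"]["bucket"]["name"]
            # Object keys in S3 event notifications are URL-encoded.
            key = unquote_plus(record["s3"]["object"]["key"])
            paths.append(f"s3://{bucket}/{key}")
    return paths


if __name__ == "__main__":
    seen = set()
    bucket_keys = ["in/a.json", "in/b.json"]  # hypothetical bucket contents
    print(discover_by_listing(lambda: bucket_keys, seen))  # ['in/a.json', 'in/b.json']
    bucket_keys.append("in/c.json")
    print(discover_by_listing(lambda: bucket_keys, seen))  # ['in/c.json']

    msg = json.dumps({"Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {"bucket": {"name": "my-bucket"},  # hypothetical bucket name
               "object": {"key": "in/new+file.json"}},
    }]})
    print(discover_from_notifications([msg]))  # ['s3://my-bucket/in/new file.json']
```

Receiving and deleting the SQS messages (for example with an AWS SDK) is left out on purpose; the sketch only shows why reading notifications avoids re-listing an ever-growing prefix.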