Upload to S3 -> AWS Lambda with some Scala Spark code -> Process -> Write back to S3

This page summarizes the projects mentioned and recommended in the original post on /r/scala

  • s3-sqs-connector

    A library for reading data from Amazon S3 with optimised listing via Amazon SQS, using Spark SQL Streaming (Structured Streaming).
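A rough sketch of what using this connector might look like. The format name and option keys (`s3-sqs`, `sqsUrl`, `fileFormat`, `sqsFetchIntervalSeconds`) are taken from my reading of the project's README and may differ by version; the queue URL, schema fields, and app name are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object S3SqsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("s3-sqs-demo").getOrCreate()

    // Streaming sources require a schema up front (placeholder fields).
    val schema = new StructType()
      .add("id", LongType)
      .add("payload", StringType)

    // Instead of repeatedly listing the bucket, the source consumes
    // S3 event notifications from an SQS queue, so discovering new
    // files stays cheap even with millions of objects in the bucket.
    val df = spark.readStream
      .format("s3-sqs")
      .schema(schema)
      .option("sqsUrl", "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue") // placeholder
      .option("fileFormat", "json")
      .option("sqsFetchIntervalSeconds", "2")
      .load()
  }
}
```

For this to work, the S3 bucket must be configured to publish `s3:ObjectCreated:*` event notifications to the SQS queue.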

  • Are you planning to upload and process many files to S3? If so, I would use something like Structured Streaming with the FileSource, which can detect new files uploaded to S3 and process them on a "standard" Spark cluster. You can then build a cluster on EKS/Kubernetes that is easy to deploy and operate. Once the number of files you upload starts to get really large, check out https://github.com/qubole/s3-sqs-connector. Glue could also achieve roughly the same thing, without the hassle of managing the EKS/K8s clusters.
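The FileSource approach above can be sketched as a small Structured Streaming job. This is a minimal example assuming JSON input; bucket names, paths, and schema fields are placeholders, and running it requires a Spark cluster with the `hadoop-aws`/`s3a` connector configured:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object S3FileStreamJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-file-stream")
      .getOrCreate()

    // Streaming file sources need the schema declared up front.
    val schema = new StructType()
      .add("id", LongType)
      .add("payload", StringType)

    // The file source lists the input path on each trigger and picks
    // up newly uploaded objects; this listing is what gets expensive
    // as the number of files grows (hence s3-sqs-connector later).
    val events = spark.readStream
      .schema(schema)
      .option("maxFilesPerTrigger", 100) // cap new files per micro-batch
      .json("s3a://my-input-bucket/uploads/") // placeholder bucket

    // Write results back to S3; the checkpoint records which input
    // files have already been processed, so restarts are safe.
    events.writeStream
      .format("parquet")
      .option("path", "s3a://my-output-bucket/processed/")            // placeholder
      .option("checkpointLocation", "s3a://my-output-bucket/checkpoints/")
      .start()
      .awaitTermination()
  }
}
```

Deployed on EKS via the Spark Kubernetes scheduler (or the Spark Operator), this gives you the always-on, file-detecting pipeline described above without any Lambda glue code.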

