AWS EMR Cost Optimization Guide

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • Apache Orc

    Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

    Data formatting is another place to make gains. When dealing with huge amounts of data, finding the data you need can take up a significant amount of your compute time. Apache Parquet and Apache ORC are columnar data formats optimized for analytics that pre-aggregate metadata about columns. If your EMR queries column intensive data like sum, max, or count, you can see significant speed improvements by reformatting data like CSVs into one of these columnar formats.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts