How does AWS Athena manage to load 10GB/s from s3? I've managed 230 mb/s from a c6gn.16xlarge

This page summarizes the projects mentioned and recommended in the original post on /r/aws

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • Trino

    Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

  • Checkout https://trino.io (formerly Presto) but is what Athena is based off of. Essentially parallelism allows for this so there’s many worker nodes all reading from S3. You can also run Presto on EMR which is sort of fun looking at the admin UI because it will show you how it breaks the query into parts and fans the work out to worker nodes. Pretty cool because if allowed to (from a resource management perspective), Presto will try to saturate the entire cluster CPU resources to compete the query as fast as possible.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts