Optimizing Distributed Joins: The Case of Google Cloud Spanner and DataStax Astra DB

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • distributed-joins

    Optimizing Distributed Joins: The Case of Google Cloud Spanner and DataStax Astra DB

  • This data model and queries can be readily instantiated in any relational database, including Google Cloud Spanner (see this SQL script as an example), but that would not result in the join optimizations we are looking to implement. We show how to do much better in the next two sections.

  • Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

  • Shuffle and broadcast joins are more suitable for batch or near real-time analytics. For example, they are used in Apache Spark as the main join strategies. Co-located and pre-computed joins are faster and can be used for online transaction processing with real-time applications. They frequently rely on organizing data based on unique storage schemes supported by a database.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • Apache Cassandra

    Mirror of Apache Cassandra

  • To try this example in DataStax Astra DB, we share our CQL script. If you are new to CQL, it stands for Cassandra Query Language and is used in both DataStax Astra DB and Apache Cassandra. Astra DB is a serverless and multi-region database service that is based on Apache Cassandra, an open-source NoSQL database. To learn more about CQL and tables with multi-row partitions, the hands-on Cassandra Fundamentals learning series is highly recommended. For more advanced data modeling, there is also the collection of data modeling examples from various domains.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts