Reading from Cassandra in Spark does not return all the data when using JoinWithCassandraTable

This page summarizes the projects mentioned and recommended in the original post on /r/apachespark.

  • spark-cassandra-connector

    DataStax Connector for Apache Spark to Apache Cassandra (by datastax)

  • This works perfectly fine and I get all the data I'm expecting. However, if I lower spark.cassandra.sql.inClauseToJoinConversionThreshold (see https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md) to something like 20, my query hits the threshold (the cross product is 10*10=100) and JoinWithCassandraTable is used. Suddenly I do not get all the data, and on top of that I get duplicated rows for some of it. It looks like some partition keys are missing completely while others return duplicated rows (this quick analysis might be wrong, however).
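
    The threshold arithmetic and the "missing vs. duplicated keys" check described in the post can be sketched in plain Python, without a Spark cluster. This is only an illustration under stated assumptions: the function names, the sample values, and the rule that conversion happens when the cross product exceeds the threshold are assumptions for this sketch, not code from the connector.

    ```python
    from collections import Counter
    from math import prod


    def in_clause_cross_product(*value_sets):
        """Size of the cross product of the distinct values in each IN clause."""
        return prod(len(set(values)) for values in value_sets)


    def would_convert_to_join(cross_product, threshold):
        # Assumption: the IN clause is converted to JoinWithCassandraTable
        # when the cross product is larger than the configured threshold.
        return cross_product > threshold


    def compare_keys(expected_keys, returned_keys):
        """Return (missing, duplicated) keys.

        Keys are hypothetical tuples of the full primary key, so a key that
        appears more than once in returned_keys is a genuinely duplicated row.
        """
        counts = Counter(returned_keys)
        missing = sorted(set(expected_keys) - set(counts))
        duplicated = sorted(key for key, n in counts.items() if n > 1)
        return missing, duplicated


    # Two IN clauses with 10 values each, as in the post: 10 * 10 = 100.
    size = in_clause_cross_product(range(10), range(10))
    print(size, would_convert_to_join(size, 20))
    ```

    Collecting the primary keys from the run that works and from the run with the lowered threshold, then passing both lists to compare_keys, would confirm or refute the quick analysis above.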

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
