This works perfectly fine and I get all the data I'm expecting. However, if I lower spark.cassandra.sql.inClauseToJoinConversionThreshold (see https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md) to something like 20, my cross product (10 * 10 = 100) exceeds the threshold and JoinWithCassandraTable is used instead. Suddenly I no longer get all the data, and on top of that some of the rows are duplicated. It looks like some partition keys are missing entirely while others return duplicated rows (this quick analysis might be wrong, however).
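To make the arithmetic concrete, here is a minimal sketch of the threshold check as I understand it. It is not the connector's code: the function name `uses_join_conversion` is hypothetical, and the strict "greater than" comparison is my assumption about how the threshold is applied.

```python
from math import prod


def uses_join_conversion(in_set_sizes, threshold):
    """Hypothetical helper, not part of the connector.

    Returns True when the cross product of the IN value set sizes
    is larger than the threshold, i.e. when the query would be
    converted to JoinWithCassandraTable (assumed comparison: strict >).
    """
    cross_product = prod(in_set_sizes)
    return cross_product > threshold


# Two IN clauses with 10 values each give a cross product of 100.
sizes = [10, 10]

# With the threshold lowered to 20, the join path is taken.
print(uses_join_conversion(sizes, 20))

# With a threshold well above 100 (2500 here, as an example value),
# the plain IN query is kept.
print(uses_join_conversion(sizes, 2500))
```

So the only thing that differs between the working run and the broken run is which of these two code paths the connector picks.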