spark-cassandra-connector vs Phantom
| | spark-cassandra-connector | Phantom |
|---|---|---|
| | 1 | 1 |
| Stars | 1,930 | 1,047 |
| Growth | -0.1% | -0.1% |
| Activity | 5.1 | 0.0 |
| Last commit | 7 days ago | about 1 year ago |
| Language | Scala | Scala |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spark-cassandra-connector
Reading from cassandra in Spark does not return all the data when using JoinWithCassandraTable
This works perfectly fine and I get all the data I'm expecting. However, if I change spark.cassandra.sql.inClauseToJoinConversionThreshold (see https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md) to something lower, like 20, I hit the threshold (my cross-product is 10*10=100) and JoinWithCassandraTable is used. I suddenly do not get all the data, and on top of that I get duplicated rows for some of it. It looks like some partition keys are missing completely, while others return duplicated rows (this quick analysis might be wrong, however).
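The threshold arithmetic in the excerpt can be sketched in a few lines. This is only an illustration, not the connector's code: the helper names `cross_product_size` and `uses_join` are hypothetical, and the rule itself (the query switches to a join once the cross-product of the IN value sets exceeds the threshold) is an assumption based on how the post describes the setting.

```python
from math import prod


def cross_product_size(in_clause_sizes):
    """Number of key combinations produced by the IN clauses (hypothetical helper)."""
    return prod(in_clause_sizes)


def uses_join(in_clause_sizes, threshold):
    """Assumed rule: the query is converted to a join once the
    cross-product of the IN value sets exceeds the threshold."""
    return cross_product_size(in_clause_sizes) > threshold


# Two IN clauses with 10 values each, as in the post.
sizes = [10, 10]
print(cross_product_size(sizes))       # 10 * 10 combinations
print(uses_join(sizes, threshold=20))  # threshold lowered to 20, as in the post
```

With the threshold lowered to 20, the 100 combinations from the post exceed it, which matches the poster's observation that JoinWithCassandraTable is used in that configuration.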
Phantom
What are some alternatives?
deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Quill - Compile-time Language Integrated Queries for Scala
Slick - Slick (Scala Language Integrated Connection Kit) is a modern database query and access library for Scala
doobie - Functional JDBC layer for Scala.
Tepkin
longevity - A Persistence Framework for Scala and NoSQL
lucene4s - Light-weight convenience wrapper around Lucene to simplify complex tasks and add Scala sugar.
Troy - Type-safe and Schema-safe Scala wrapper for Cassandra driver
scredis - Non-blocking, ultra-fast Scala Redis client built on top of Akka IO, used in production at Livestream
rethink-scala - Scala Driver for RethinkDB
neotypes - Scala lightweight, type-safe, asynchronous driver for neo4j
gremlin-scala - Scala wrapper for Apache TinkerPop 3 Graph DSL