spark-cassandra-connector
DataStax Connector for Apache Spark to Apache Cassandra (by datastax)
kafka-journal
Event sourcing journal implementation using Kafka as main storage (by evolution-gaming)
| | spark-cassandra-connector | kafka-journal |
|---|---|---|
| Mentions | 1 | 2 |
| Stars | 1,930 | 110 |
| Growth | -0.1% | 0.9% |
| Activity | 5.1 | 9.0 |
| Latest commit | 7 days ago | 1 day ago |
| Language | Scala | Scala |
| License | Apache License 2.0 | MIT License |
The number of mentions is the total number of mentions we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spark-cassandra-connector
Posts with mentions or reviews of spark-cassandra-connector.
We have used some of these posts to build our list of alternatives
and similar projects.
- Reading from Cassandra in Spark does not return all the data when using JoinWithCassandraTable

  This works perfectly fine, and I get all the data I'm expecting. However, if I change `spark.cassandra.sql.inClauseToJoinConversionThreshold` (see https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md) to something lower, like 20, I fall below the threshold (my cross-product is 10*10=100) and `joinWithCassandraTable` is used instead. I suddenly do not get all the data, and on top of that I get duplicated rows for some of it. It looks like some of the partition keys are missing entirely, while others return duplicated rows (this quick analysis might, however, be wrong).
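The setup described above can be sketched roughly as follows. This is an illustrative configuration, not the poster's actual code: the catalog name `cass`, keyspace `ks`, table `tab`, and the partition/clustering key names `pk`/`ck` are all assumed for the example, and running it requires a local Spark and Cassandra instance.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  // Register the connector's catalog under an assumed name "cass".
  .config("spark.sql.catalog.cass",
    "com.datastax.spark.connector.datasource.CassandraCatalog")
  // When the cross-product of the IN-clause values stays under this
  // threshold, the connector pushes the query down as a plain CQL IN
  // query. Lowering the threshold below the cross-product (here
  // 10 * 10 = 100) makes the connector execute the query via
  // joinWithCassandraTable instead, which is where the poster
  // observed missing and duplicated rows.
  .config("spark.cassandra.sql.inClauseToJoinConversionThreshold", "20")
  .getOrCreate()

// Assumed schema: table ks.tab with partition key pk and clustering key ck.
val df = spark.sql(
  """SELECT * FROM cass.ks.tab
    |WHERE pk IN (1,2,3,4,5,6,7,8,9,10)
    |  AND ck IN (1,2,3,4,5,6,7,8,9,10)""".stripMargin)
```

With the threshold at its default, the same query returns the expected rows via IN-clause pushdown; the behavioral difference appears only once the join conversion kicks in.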
kafka-journal
Posts with mentions or reviews of kafka-journal.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-10-29.
- James Roper on the future of Lagom

  Other people seem to do it even without Lightbend support; I am sure Lightbend could do it much better: https://github.com/evolution-gaming/kafka-journal
- Streaming journals

  The closest is Evolution Kafka Journal (https://github.com/evolution-gaming/kafka-journal), which is battle-tested and tuned for high load, but it is barely documented and doesn't provide any tools or APIs.
What are some alternatives?
When comparing spark-cassandra-connector and kafka-journal you can also consider the following projects:
deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
akka-persistence-cassandra - A replicated Akka Persistence journal backed by Apache Cassandra
Quill - Compile-time Language Integrated Queries for Scala
Reactive-kafka (Alpakka Kafka connector) - Alpakka is a Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Akka.
GCP Datastore Akka Persistence Plugin - akka-persistence-gcp-datastore is a journal and snapshot store plugin for akka-persistence using google cloud firestore in datastore mode.
opa-kafka-plugin - Open Policy Agent (OPA) plug-in for Kafka authorization