spark-cassandra-connector
DataStax Connector for Apache Spark to Apache Cassandra (by datastax)
kafka-journal
Event sourcing journal implementation using Kafka as main storage (by evolution-gaming)
| | spark-cassandra-connector | kafka-journal |
|---|---|---|
| Mentions | 1 | 2 |
| Stars | 1,930 | 110 |
| Growth | -0.1% | 0.9% |
| Activity | 5.1 | 9.0 |
| Latest commit | 7 days ago | 1 day ago |
| Language | Scala | Scala |
| License | Apache License 2.0 | MIT License |
The number of mentions is the total number of mentions we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spark-cassandra-connector
Posts with mentions or reviews of spark-cassandra-connector.
We have used some of these posts to build our list of alternatives
and similar projects.
- Reading from Cassandra in Spark does not return all the data when using JoinWithCassandraTable

  This works perfectly fine, and I get all the data I'm expecting. However, if I change `spark.cassandra.sql.inClauseToJoinConversionThreshold` (see https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md) to something lower, like 20, I fall below the threshold (my cross-product is 10*10=100) and `joinWithCassandraTable` is used instead. I suddenly do not get all the data, and on top of that I get duplicated rows for some of it. It looks like some of the partition keys are missing entirely, while others return duplicated rows (this quick analysis might, however, be wrong).
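The setup described above can be sketched roughly as follows. This is an illustrative configuration, not the poster's actual code: the catalog name `cass`, keyspace `ks`, table `tab`, and the partition/clustering key names `pk`/`ck` are all assumed for the example, and running it requires a local Spark and Cassandra instance.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  // Register the connector's catalog under an assumed name "cass".
  .config("spark.sql.catalog.cass",
    "com.datastax.spark.connector.datasource.CassandraCatalog")
  // When the cross-product of the IN-clause values stays under this
  // threshold, the connector pushes the query down as a plain CQL IN
  // query. Lowering the threshold below the cross-product (here
  // 10 * 10 = 100) makes the connector execute the query via
  // joinWithCassandraTable instead, which is where the poster
  // observed missing and duplicated rows.
  .config("spark.cassandra.sql.inClauseToJoinConversionThreshold", "20")
  .getOrCreate()

// Assumed schema: table ks.tab with partition key pk and clustering key ck.
val df = spark.sql(
  """SELECT * FROM cass.ks.tab
    |WHERE pk IN (1,2,3,4,5,6,7,8,9,10)
    |  AND ck IN (1,2,3,4,5,6,7,8,9,10)""".stripMargin)
```

With the threshold at its default, the same query returns the expected rows via IN-clause pushdown; the behavioral difference appears only once the join conversion kicks in.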
kafka-journal
Posts with mentions or reviews of kafka-journal.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-10-29.
- James Roper on the future of Lagom

  Other people seem to do it even without Lightbend support; I am sure Lightbend could do it much better: https://github.com/evolution-gaming/kafka-journal
- Streaming journals

  The closest is Evolution Kafka Journal (https://github.com/evolution-gaming/kafka-journal), which is battle-tested and tuned for high load, but it is barely documented and doesn't provide any tools or APIs.
What are some alternatives?
When comparing spark-cassandra-connector and kafka-journal you can also consider the following projects:
deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
akka-persistence-cassandra - A replicated Akka Persistence journal backed by Apache Cassandra
Quill - Compile-time Language Integrated Queries for Scala
Reactive-kafka (Alpakka Kafka connector) - Alpakka is a Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Akka.
GCP Datastore Akka Persistence Plugin - akka-persistence-gcp-datastore is a journal and snapshot store plugin for akka-persistence using google cloud firestore in datastore mode.
opa-kafka-plugin - Open Policy Agent (OPA) plug-in for Kafka authorization