Why we don’t use Spark

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

  • Most people working in big data know Spark (if you don't, check out their website) as the standard tool to Extract, Transform & Load (ETL) their heaps of data. Spark, the successor of Hadoop & MapReduce, works a lot like Pandas, a data science package where you run operators over collections of data. These operators then return new data collections, which allows the chaining of operators in a functional way while keeping scalability in mind.

  • nodejs-pubsub

    Node.js client for Google Cloud Pub/Sub: Ingest event streams from anywhere, at any scale, for simple, reliable, real-time stream analytics.

  • The Kafka-Dataproc combination served us well for some time, until GCP released its own message queue implementation: Google Cloud Pub/Sub. At the time, we investigated the value of switching. There is always an inherent switching cost, but what we had underestimated with Kafka is that there is a substantial overhead in maintaining the system. This is especially true if the ingested data volume increases rapidly. As an example: the Kafka service requires you to manually shard the data streams while a managed service like Pubsub does the (re)sharding behind the scenes. Pubsub on the other hand also had some downsides, e.g. it didn’t allow for longer-term data retention which can easily be worked around by storing the data on Cloud Storage after processing. Persisting the data and keeping logs on the interesting messages made Kafka obsolete for our use case.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts