Apache Spark Alternatives

Similar projects and alternatives to Apache Spark

Visual Studio Code

2,838 158,095 10.0 TypeScript Apache Spark VS Visual Studio Code

Visual Studio Code
kubernetes

657 106,611 10.0 Go Apache Spark VS kubernetes

Production-Grade Container Scheduling and Management
Pandas

393 41,923 10.0 Python Apache Spark VS Pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Redis

318 64,705 9.7 C Apache Spark VS Redis

Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
Stripe

300 3,596 8.9 PHP Apache Spark VS Stripe

PHP library for the Stripe API.
the-algorithm

265 10 10.0 Apache Spark VS the-algorithm
ClickHouse

208 34,054 10.0 C++ Apache Spark VS ClickHouse

ClickHouse® is a free analytics DBMS for big data
Airflow

169 34,397 10.0 Python Apache Spark VS Airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
examples

143 7,742 6.2 Jupyter Notebook Apache Spark VS examples

TensorFlow examples (by tensorflow)
ApacheKafka

104 28 0.0 Apache Spark VS ApacheKafka

A curated list of awesome Apache Kafka resources
elasticsearch-mapper-attachments

102 503 0.0 Java Apache Spark VS elasticsearch-mapper-attachments

Discontinued Mapper Attachments Type plugin for Elasticsearch
Apache Arrow

75 13,480 10.0 C++ Apache Spark VS Apache Arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
redpanda

69 8,784 10.0 C++ Apache Spark VS redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
delta

69 6,874 9.8 Scala Apache Spark VS delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs (by delta-io)
Trino

44 9,552 10.0 Java Apache Spark VS Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache Cassandra

35 8,507 9.9 Java Apache Spark VS Apache Cassandra

Mirror of Apache Cassandra
Apache Hadoop

26 14,301 9.9 Java Apache Spark VS Apache Hadoop

Apache Hadoop
Apache Avro

22 2,756 9.7 Java Apache Spark VS Apache Avro

Apache Avro is a data serialization system.
flink-statefun

18 491 5.1 Java Apache Spark VS flink-statefun

Apache Flink Stateful Functions
Apache Hive

14 5,320 9.6 Java Apache Spark VS Apache Hive

Apache Hive

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better Apache Spark alternative or higher similarity.

Apache Spark reviews and mentions

Posts with mentions or reviews of Apache Spark. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-11.

"xAI will open source Grok"
3 projects | news.ycombinator.com | 11 Mar 2024
Groovy 🎷 Cheat Sheet - 01 Say "Hello" from Groovy
7 projects | dev.to | 7 Mar 2024

Recently I had to revisit the "JVM languages universe" again. Yes, language(s), plural! Java isn't the only language that uses the JVM. I previously used Scala, which is a JVM language, to use Apache Spark for Data Engineering workloads, but this is for another post 😉.
🦿🛴Smarcity garbage reporting automation w/ ollama
6 projects | dev.to | 31 Jan 2024

Consume data into third-party software (such as OpenSearch, Apache Spark, or Apache Pinot) for analysis/data science, GIS systems (so you can put reports on a map), or any ticket management system
Go concurrency simplified. Part 4: Post office as a data pipeline
5 projects | dev.to | 21 Dec 2023

also, this knowledge applies to learning more about data engineering, as this field of software engineering relies heavily on the event-driven approach via tools like Spark, Flink, Kafka, etc.
Five Apache projects you probably didn't know about
8 projects | dev.to | 21 Dec 2023

Apache SeaTunnel is a data integration platform that offers the three pillars of data pipelines: sources, transforms, and sinks. It offers an abstract API over three possible engines: the Zeta engine from SeaTunnel or a wrapper around Apache Spark or Apache Flink. Be careful, as each engine comes with its own set of features.
Apache Spark VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
Integrate Pyspark Structured Streaming with confluent-kafka
2 projects | dev.to | 12 Aug 2023

Apache Spark - https://spark.apache.org/
Spark – A micro framework for creating web applications in Kotlin and Java
1 project | news.ycombinator.com | 16 Jun 2023

A JVM based framework named "Spark", when https://spark.apache.org exists?
Rest in Peas: The Unrecognized Death of Speech Recognition (2010)
4 projects | news.ycombinator.com | 4 May 2023
PySpark SparkSession Builder with Kubernetes Master
1 project | /r/codehunter | 20 Apr 2023

I recently saw a pull request that was merged to the Apache/Spark repository that apparently adds initial Python bindings for PySpark on K8s. I posted a comment to the PR asking a question about how to use spark-on-k8s in a Python Jupyter notebook, and was told to ask my question here.

Stats

Basic Apache Spark repo stats

Mentions: 101
Stars: 38,320
Activity: 10.0
Last Commit: 4 days ago

apache/spark is an open-source project licensed under the Apache License 2.0, which is an OSI-approved license.

The primary programming language of Apache Spark is Scala.
