The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning. Learn more →
Apache Spark Alternatives
Similar projects and alternatives to Apache Spark
-
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
-
Airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
-
-
Pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
-
-
Apache Arrow
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
-
-
-
-
luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
-
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
-
-
Redis
Redis is an in-memory database that persists on disk. The data model is key-value, but many different kind of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
-
-
redpanda
Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
-
-
delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs (by delta-io)
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
Apache Spark reviews and mentions
- "xAI will open source Grok"
-
Groovy 🎷 Cheat Sheet - 01 Say "Hello" from Groovy
Recently I had to revisit the "JVM languages universe" again. Yes, language(s), plural! Java isn't the only language that uses the JVM. I previously used Scala, which is a JVM language, to use Apache Spark for Data Engineering workloads, but this is for another post 😉.
-
🦿🛴Smarcity garbage reporting automation w/ ollama
Consume data into third party software (then let Open Search or Apache Spark or Apache Pinot) for analysis/datascience, GIS systems (so you can put reports on a map) or any ticket management system
-
Go concurrency simplified. Part 4: Post office as a data pipeline
also, this knowledge applies to learning more about data engineering, as this field of software engineering relies heavily on the event-driven approach via tools like Spark, Flink, Kafka, etc.
-
Five Apache projects you probably didn't know about
Apache SeaTunnel is a data integration platform that offers the three pillars of data pipelines: sources, transforms, and sinks. It offers an abstract API over three possible engines: the Zeta engine from SeaTunnel or a wrapper around Apache Spark or Apache Flink. Be careful, as each engine comes with its own set of features.
-
Apache Spark VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
-
Integrate Pyspark Structured Streaming with confluent-kafka
Apache Spark - https://spark.apache.org/
- Rest in Peas: The Unrecognized Death of Speech Recognition (2010)
-
Gotta write this on my resume
So for example contributing to say spark may better for experience(and resume) than Twitter-the algorithm.
-
Query Real Time Data in Kafka Using SQL
Additionally, one of the challenges of working with Kafka is how to efficiently analyze and extract insights from the large volumes of data stored in Kafka topics. Traditional batch processing approaches, such as Hadoop MapReduce or Apache Spark, can be slow and expensive, and may not be suitable for real-time analytics. To address this challenge, you can use SQL queries with Kafka to analyze and extract insights from the data in real time.
-
A note from our sponsor - WorkOS
workos.com | 29 Mar 2024
Stats
apache/spark is an open source project licensed under Apache License 2.0 which is an OSI approved license.
The primary programming language of Apache Spark is Scala.