How Does The Data Lakehouse Enhance The Customer Data Stack?

Trino

Mentions: 44 | Stars: 9,552 | Activity: 10.0 | Language: Java

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Processing has also evolved since Hadoop. First came Spark, which offered a more user-friendly API for MapReduce-style processing, and then distributed query engines such as Trino. The two usually coexist and address different needs: Trino is mainly used for interactive analytical queries where latency matters, while Spark is heavily used for larger workloads (think ETL) where data volumes are much bigger and latency is less important.
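
To make the MapReduce model mentioned above concrete, here is a minimal sketch of its three phases (map, shuffle, reduce) as a word count in plain Python. It uses neither Hadoop nor Spark, and every function name is hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["spark and trino", "Spark for ETL"])
print(counts)
```

Higher-level APIs wrap this pattern in a few chained operations, so users rarely have to write the phases by hand.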

Apache Spark

Mentions: 101 | Stars: 38,320 | Activity: 10.0 | Language: Scala

Apache Spark - A unified analytics engine for large-scale data processing

hudi

Mentions: 20 | Stars: 5,053 | Activity: 9.9 | Language: Java

Upserts, Deletes And Incremental Processing on Big Data.

A lakehouse is an architecture that builds on the data lake concept and enhances it with functionality commonly found in database systems. The limitations of the data lake led to the emergence of a number of technologies, including Apache Iceberg and Apache Hudi. These technologies define a table format on top of file formats like ORC and Parquet, on which additional functionality such as transactions can be built.
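
As a rough illustration of how a table format can layer transactions over plain files, the sketch below keeps a numbered snapshot file listing the data files that make up each table version, and treats the creation of the next snapshot as the commit. This is a toy under stated assumptions, not how Iceberg or Hudi are actually implemented: the `ToyTable` class, the `metadata/vNNNNN.json` layout, and the data file names are all hypothetical, and the data files are represented only by their names.

```python
import json
import os
import tempfile


class ToyTable:
    """Hypothetical table format: immutable data files plus numbered snapshots."""

    def __init__(self, root):
        self.meta_dir = os.path.join(root, "metadata")
        os.makedirs(self.meta_dir, exist_ok=True)

    def _snapshot_path(self, version):
        return os.path.join(self.meta_dir, f"v{version:05d}.json")

    def current_version(self):
        """Return the highest committed version, or 0 for an empty table."""
        versions = [
            int(name[1:-5])
            for name in os.listdir(self.meta_dir)
            if name.startswith("v") and name.endswith(".json")
        ]
        return max(versions, default=0)

    def files(self, version=None):
        """List the data files visible in a given version (latest by default)."""
        if version is None:
            version = self.current_version()
        if version == 0:
            return []
        with open(self._snapshot_path(version)) as f:
            return json.load(f)["files"]

    def commit(self, base_version, add=(), remove=()):
        """Publish base_version + 1, failing if someone else already did."""
        removed = set(remove)
        new_files = [f for f in self.files(base_version) if f not in removed]
        new_files += list(add)
        try:
            # Exclusive create fails if this version was already published.
            # (A real format would also need the write itself to be atomic.)
            with open(self._snapshot_path(base_version + 1), "x") as f:
                json.dump({"files": new_files}, f)
        except FileExistsError:
            raise RuntimeError("commit conflict: table changed") from None
        return base_version + 1


table = ToyTable(tempfile.mkdtemp())
v1 = table.commit(0, add=["part-0.parquet"])
v2 = table.commit(v1, add=["part-1.parquet"], remove=["part-0.parquet"])

# A writer still working from v1 is rejected instead of overwriting v2.
try:
    table.commit(v1, add=["part-2.parquet"])
    conflict = False
except RuntimeError:
    conflict = True
```

Because old snapshots are never modified, a reader can keep using `table.files(v1)` while newer versions are committed, which is the kind of database-like behavior the table format is meant to add.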

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.