Sonar helps you commit clean code every time. With over 600 unique rules to find Java bugs, code smells & vulnerabilities, Sonar finds the issues while you focus on the work. Learn more →
Hudi Alternatives
Similar projects and alternatives to hudi
-
-
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
-
Sonar
Write Clean Java Code. Always.. Sonar helps you commit clean code every time. With over 600 unique rules to find Java bugs, code smells & vulnerabilities, Sonar finds the issues while you focus on the work.
-
-
delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs (by delta-io)
-
debezium
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
-
-
RocksDB
A library that provides an embeddable, persistent key-value store for fast storage.
-
InfluxDB
Build time-series-based applications quickly and at scale.. InfluxDB is the Time Series Platform where developers build real-time applications for analytics, IoT and cloud-native services. Easy to start, it is available in the cloud or on-premises.
-
Apache Orc
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
-
-
Apache Spark
Apache Spark - A unified analytics engine for large-scale data processing
-
-
Airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
-
Apache Arrow
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
-
-
dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
-
javalin
A simple and modern Java and Kotlin web framework [Moved to: https://github.com/javalin/javalin]
-
sqlfluff
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
-
-
-
ploomber
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
hudi reviews and mentions
-
How-to-Guide: Contributing to Open Source
Apache Hudi
-
4 best opensource projects about big data you should try out
1.Hudi
-
How Does The Data Lakehouse Enhance The Customer Data Stack?
A Lakehouse is an architecture that builds on top of the data lake concept and enhances it with functionality commonly found in database systems. The limitations of the data lake led to the emergence of a number of technologies including Apache Iceberg and Apache Hudi. These technologies define a Table Format on top of storage formats like ORC and Parquet on which additional functionality like transactions can be built.
-
SCD type 2 in spark
Use Hudi Or Delta Lake
- Would ParquetWriter from pyarrow automatically flush?
-
Apache Hudi - The Streaming Data Lake Platform
But first, we needed to tackle the basics - transactions and mutability - on the data lake. In many ways, Apache Hudi pioneered the transactional data lake movement as we know it today. Specifically, during a time when more special-purpose systems were being born, Hudi introduced a server-less, transaction layer, which worked over the general-purpose Hadoop FileSystem abstraction on Cloud Stores/HDFS. This model helped Hudi to scale writers/readers to 1000s of cores on day one, compared to warehouses which offer a richer set of transactional guarantees but are often bottlenecked by the 10s of servers that need to handle them. We also experience a lot of joy to see similar systems (Delta Lake for e.g) later adopt the same server-less transaction layer model that we originally shared way back in early '17. We consciously introduced two table types Copy On Write (with simpler operability) and Merge On Read (for greater flexibility) and now these terms are used in projects outside Hudi, to refer to similar ideas being borrowed from Hudi. Through open sourcing and graduating from the Apache Incubator, we have made some great progress elevating these ideas across the industry, as well as bringing them to life with a cohesive software stack. Given the exciting developments in the past year or so that have propelled data lakes further mainstream, we thought some perspective can help users see Hudi with the right lens, appreciate what it stands for, and be a part of where it’s headed. At this time, we also wanted to shine some light on all the great work done by 180+ contributors on the project, working with more than 2000 unique users over slack/github/jira, contributing all the different capabilities Hudi has gained over the past years, from its humble beginnings.
-
A note from our sponsor - Sonar
www.sonarsource.com | 8 Feb 2023
Stats
apache/hudi is an open source project licensed under Apache License 2.0 which is an OSI approved license.