Hudi Alternatives

Similar projects and alternatives to hudi

missing-semester

375 4,694 6.8 CSS hudi VS missing-semester

The Missing Semester of Your CS Education 📚
Airflow

169 34,485 10.0 Python hudi VS Airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
ploomber

121 3,369 7.8 Python hudi VS ploomber

The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Apache Spark

101 38,320 10.0 Scala hudi VS Apache Spark

Apache Spark - A unified analytics engine for large-scale data processing
debezium

80 9,857 9.9 Java hudi VS debezium

Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
dbt-core

86 8,881 9.7 Python hudi VS dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Apache Arrow

75 13,480 10.0 C++ hudi VS Apache Arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
delta

69 6,874 9.8 Scala hudi VS delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs (by delta-io)
versatile-data-kit

52 410 9.7 Python hudi VS versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.
Trino

44 9,552 10.0 Java hudi VS Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
RocksDB

43 27,389 9.8 C++ hudi VS RocksDB

A library that provides an embeddable, persistent key-value store for fast storage.
sqlfluff

35 7,199 9.6 Python hudi VS sqlfluff

A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
nifi

35 4,381 9.9 Java hudi VS nifi

Apache NiFi
Dask

32 11,982 9.7 Python hudi VS Dask

Parallel computing with task scheduling
iceberg

18 5,508 9.9 Java hudi VS iceberg

Apache Iceberg
Apache Avro

22 2,764 9.7 Java hudi VS Apache Avro

Apache Avro is a data serialization system.
javalin

23 5,583 9.1 Kotlin hudi VS javalin

Discontinued. A simple and modern Java and Kotlin web framework [Moved to: https://github.com/javalin/javalin]
pinot

15 5,119 9.9 Java hudi VS pinot

Apache Pinot - A realtime distributed OLAP datastore
kudu

3 1,799 9.2 C++ hudi VS kudu

Mirror of Apache Kudu (by apache)
dbt-expectations

10 939 6.7 Shell hudi VS dbt-expectations

Port(ish) of Great Expectations to dbt test macros

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number indicates a better hudi alternative or higher similarity.

hudi reviews and mentions

Posts with mentions or reviews of hudi. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-18.

Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog
4 projects | dev.to | 18 Dec 2023

Apache Iceberg is one of the three types of lakehouse; the other two are Apache Hudi and Delta Lake.
The "Big Three's" Data Storage Offerings
2 projects | /r/dataengineering | 15 Jun 2023

Structured, semi-structured, and unstructured data can be stored in one single format, a lakehouse storage format like Delta, Iceberg, or Hudi (assuming those don't require low-latency SLAs like subsecond).
Data-eng related highlights from the latest Thoughtworks Tech Radar
3 projects | /r/dataengineering | 26 Apr 2023

Apache Hudi
For those of you with Lakehouse Architectures, how do you handle duplicate records?
1 project | /r/dataengineering | 16 Apr 2023
AWS ACID data lakehouse
1 project | /r/dataengineering | 30 Jan 2023

Try Apache Hudi; it is fully integrated with AWS and offers almost everything that you requested.
Data n00b looking for guidance on how to setup data lake/warehouse
1 project | /r/dataengineering | 29 Oct 2022

The corresponding Kafka topics have 30d retention, and I intend on having an S3 sink connector for long-term storage (open to other ideas here too; I noticed there's a Hudi connector also).
apache/hudi: Upserts, Deletes And Incremental Processing on Big Data.
1 project | /r/devopsish | 20 Oct 2022
Big Data file formats
1 project | /r/apachespark | 13 Jun 2022
How-to-Guide: Contributing to Open Source
19 projects | /r/dataengineering | 11 Jun 2022

Apache Hudi
What do you use for Data versioning?
1 project | /r/mlops | 28 Mar 2022

You could have a look at Apache Hudi - especially if you're running your Data Pipelines using Spark or Flink.

Stats

Basic hudi repo stats

Mentions

Stars: 5,053

Activity: 9.9

Last Commit: 7 days ago

apache/hudi is an open-source project licensed under the Apache License 2.0, which is an OSI-approved license.

The primary programming language of hudi is Java.
