Hudi Alternatives

Similar projects and alternatives to hudi

  • iceberg

    hudi VS iceberg

    Apache Iceberg

  • kudu

    hudi VS kudu

    Mirror of Apache Kudu (by apache)

  • Trino

    hudi VS Trino

    Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

  • debezium

    hudi VS debezium

    Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.

  • RocksDB

    hudi VS RocksDB

    A library that provides an embeddable, persistent key-value store for fast storage.

  • delta

    hudi VS delta

    An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads. (by delta-io)

  • javalin

    hudi VS javalin

    A simple and modern Java and Kotlin web framework

  • Apache Avro

    hudi VS Apache Avro

    Apache Avro is a data serialization system.

  • Dask

    hudi VS Dask

    Parallel computing with task scheduling

  • lambda-arch

    hudi VS lambda-arch

    Applying Lambda Architecture with Spark, Kafka, and Cassandra.

  • Apache Orc

    hudi VS Apache Orc

    Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number indicates a closer hudi alternative or higher similarity.

Reviews and mentions

Posts with mentions or reviews of hudi. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-10-15.
  • Apache Hudi and Glue Catalog
    1 project | reddit.com/r/aws | 1 Nov 2021
    Found this very similar discussion; the only difference is that it involves EMR, which is what Glue is in the background anyway. The logs attached to the discussion show an error thrown by Glue when Hudi tries to perform an ALTER TABLE CASCADE, because the Glue metastore doesn't support CASCADE. The Jira linked in the discussion claims that more recent EMR versions resolve this issue, but there are comments as of May this year claiming it's still unresolved. It's unclear to me which EMR version Glue 2.0 uses. I'll dig into this tomorrow and also try with Glue 3.0, as that may use a more recent version of EMR that has resolved the issue.
  • SCD type 2 in spark
    2 projects | reddit.com/r/dataengineering | 15 Oct 2021
    Use Hudi Or Delta Lake
  • Updating Partition Values With Apache Hudi
    1 project | dev.to | 23 Sep 2021
    If you're not familiar with Apache Hudi, it's a pretty awesome piece of software that brings transactions and record-level updates/deletes to data lakes.
  • Would ParquetWriter from pyarrow automatically flush?
    4 projects | reddit.com/r/learnpython | 11 Sep 2021
  • Reliable ingestion from AWS S3 using Hudi
    1 project | dev.to | 2 Sep 2021
    In this post we will talk about a new deltastreamer source which reliably and efficiently processes new data files as they arrive in AWS S3. As of today, to ingest data from S3 into Hudi, users leverage DFS source whose path selector would identify the source files modified since the last checkpoint based on max modification time.
  • Apache Hudi - The Streaming Data Lake Platform
    8 projects | dev.to | 27 Jul 2021
    But first, we needed to tackle the basics - transactions and mutability - on the data lake. In many ways, Apache Hudi pioneered the transactional data lake movement as we know it today. Specifically, during a time when more special-purpose systems were being born, Hudi introduced a server-less transaction layer, which worked over the general-purpose Hadoop FileSystem abstraction on Cloud Stores/HDFS. This model helped Hudi to scale writers/readers to 1000s of cores on day one, compared to warehouses, which offer a richer set of transactional guarantees but are often bottlenecked by the 10s of servers that need to handle them. It also brings us a lot of joy to see similar systems (e.g., Delta Lake) later adopt the same server-less transaction layer model that we originally shared way back in early '17. We consciously introduced two table types, Copy On Write (with simpler operability) and Merge On Read (for greater flexibility), and these terms are now used in projects outside Hudi to refer to similar ideas borrowed from Hudi. Through open sourcing and graduating from the Apache Incubator, we have made some great progress elevating these ideas across the industry, as well as bringing them to life with a cohesive software stack. Given the exciting developments in the past year or so that have propelled data lakes further into the mainstream, we thought some perspective could help users see Hudi through the right lens, appreciate what it stands for, and be a part of where it’s headed. At this time, we also wanted to shine some light on all the great work done by 180+ contributors on the project, working with more than 2000 unique users over slack/github/jira, contributing all the different capabilities Hudi has gained over the past years, from its humble beginnings.
  • CRUD operations on a large partitioned parquet dataset using Spark
    1 project | reddit.com/r/dataengineering | 17 Jul 2021
    Yes, or Apache Hudi: https://hudi.apache.org/
  • Optimize Data Lake layout using Clustering in Apache Hudi
    1 project | dev.to | 27 Jan 2021
    Finally, the clustering plan is saved to the timeline in an avro metadata format.
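
The "Reliable ingestion from AWS S3 using Hudi" excerpt above describes a path selector that picks up source files modified since the last checkpoint, based on max modification time. A minimal Python sketch of that general idea follows. It is not Hudi's implementation: `SourceFile`, `select_new_files`, and the millisecond timestamps are hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SourceFile:
    """Hypothetical description of one source file (not a Hudi class)."""
    path: str
    modified_ms: int  # last-modified time in epoch milliseconds (assumed unit)


def select_new_files(
    files: Iterable[SourceFile], checkpoint_ms: int
) -> Tuple[List[SourceFile], int]:
    """Return files modified strictly after the checkpoint, plus the next checkpoint.

    The next checkpoint is the max modification time among the selected files,
    or the old checkpoint when nothing new was found.
    """
    new_files = sorted(
        (f for f in files if f.modified_ms > checkpoint_ms),
        key=lambda f: (f.modified_ms, f.path),
    )
    next_checkpoint = new_files[-1].modified_ms if new_files else checkpoint_ms
    return new_files, next_checkpoint


if __name__ == "__main__":
    listing = [
        SourceFile("data/a.parquet", 100),
        SourceFile("data/b.parquet", 200),
        SourceFile("data/c.parquet", 300),
    ]
    batch, checkpoint = select_new_files(listing, checkpoint_ms=100)
    print([f.path for f in batch], checkpoint)
```

Note that a selector like this has to list every candidate file on each run in order to compare modification times.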

Stats

Basic hudi repo stats
8
2,702
9.9
5 days ago

apache/hudi is an open source project licensed under Apache License 2.0 which is an OSI approved license.
