hudi VS RocksDB

Compare hudi and RocksDB to see how they differ.

RocksDB

A library that provides an embeddable, persistent key-value store for fast storage. (by facebook)
              hudi                 RocksDB
Mentions      20                   43
Stars         5,038                27,285
Growth        1.7%                 1.1%
Activity      9.9                  9.8
Last commit   6 days ago           6 days ago
Language      Java                 C++
License       Apache License 2.0   GNU General Public License v3.0 only
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

hudi

Posts with mentions or reviews of hudi. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-18.
  • Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog
    4 projects | dev.to | 18 Dec 2023
    Apache Iceberg is one of the three main lakehouse table formats; the other two are Apache Hudi and Delta Lake.
  • The "Big Three's" Data Storage Offerings
    2 projects | /r/dataengineering | 15 Jun 2023
    Structured, semi-structured and unstructured data can be stored in one single format, a lakehouse storage format like Delta, Iceberg or Hudi (assuming the workloads don't require low-latency SLAs such as subsecond responses).
  • Data-eng related highlights from the latest Thoughtworks Tech Radar
    3 projects | /r/dataengineering | 26 Apr 2023
    Apache Hudi
  • How-to-Guide: Contributing to Open Source
    19 projects | /r/dataengineering | 11 Jun 2022
    Apache Hudi
  • 4 best opensource projects about big data you should try out
    4 projects | dev.to | 24 Mar 2022
    1. Hudi
  • How Does The Data Lakehouse Enhance The Customer Data Stack?
    3 projects | dev.to | 31 Jan 2022
    A Lakehouse is an architecture that builds on top of the data lake concept and enhances it with functionality commonly found in database systems. The limitations of the data lake led to the emergence of a number of technologies including Apache Iceberg and Apache Hudi. These technologies define a Table Format on top of storage formats like ORC and Parquet on which additional functionality like transactions can be built.
  • SCD type 2 in spark
    2 projects | /r/dataengineering | 15 Oct 2021
    Use Hudi or Delta Lake
  • Would ParquetWriter from pyarrow automatically flush?
    4 projects | /r/learnpython | 11 Sep 2021
  • Apache Hudi - The Streaming Data Lake Platform
    8 projects | dev.to | 27 Jul 2021
    But first, we needed to tackle the basics - transactions and mutability - on the data lake. In many ways, Apache Hudi pioneered the transactional data lake movement as we know it today. Specifically, during a time when more special-purpose systems were being born, Hudi introduced a server-less transaction layer, which worked over the general-purpose Hadoop FileSystem abstraction on Cloud Stores/HDFS. This model helped Hudi to scale writers/readers to 1000s of cores on day one, compared to warehouses which offer a richer set of transactional guarantees but are often bottlenecked by the 10s of servers that need to handle them. It also brings us a lot of joy to see similar systems (Delta Lake, for example) later adopt the same server-less transaction layer model that we originally shared way back in early '17. We consciously introduced two table types, Copy On Write (with simpler operability) and Merge On Read (for greater flexibility), and now these terms are used in projects outside Hudi to refer to similar ideas borrowed from Hudi. Through open sourcing and graduating from the Apache Incubator, we have made some great progress elevating these ideas across the industry, as well as bringing them to life with a cohesive software stack. Given the exciting developments in the past year or so that have propelled data lakes further mainstream, we thought some perspective can help users see Hudi with the right lens, appreciate what it stands for, and be a part of where it’s headed. At this time, we also wanted to shine some light on all the great work done by 180+ contributors on the project, working with more than 2000 unique users over Slack/GitHub/Jira, contributing all the different capabilities Hudi has gained over the past years, from its humble beginnings.
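
The two table types named in the last excerpt, Copy On Write and Merge On Read, can be illustrated with a toy model. This is only a sketch of the general idea and does not use Hudi's API: the class names, the in-memory dicts standing in for base files, and the `files_written` counter are all assumptions made for illustration.

```python
class CopyOnWriteTable:
    """Toy model: every upsert rewrites the base file as a new version."""

    def __init__(self):
        self.base = {}           # stand-in for the current base file: key -> record
        self.files_written = 0   # counts full base-file rewrites

    def upsert(self, records):
        new_base = dict(self.base)   # copy the existing file...
        new_base.update(records)     # ...apply the changes...
        self.base = new_base         # ...and swap in the new version
        self.files_written += 1

    def read(self):
        return dict(self.base)       # readers see the base file directly


class MergeOnReadTable:
    """Toy model: upserts append to a log; readers merge the log over the base."""

    def __init__(self):
        self.base = {}
        self.log = []                # appended delta batches, oldest first
        self.files_written = 0

    def upsert(self, records):
        self.log.append(dict(records))   # cheap append, base file untouched

    def read(self):
        merged = dict(self.base)
        for batch in self.log:           # later batches overwrite earlier ones
            merged.update(batch)
        return merged

    def compact(self):
        self.base = self.read()          # fold the log into a new base file
        self.log = []
        self.files_written += 1


cow, mor = CopyOnWriteTable(), MergeOnReadTable()
for table in (cow, mor):
    table.upsert({"a": 1, "b": 2})
    table.upsert({"a": 3})

print(cow.read() == mor.read())                # both tables expose the same data
print(cow.files_written, mor.files_written)    # rewrites so far: 2 versus 0
```

In this model the trade-off is visible in where the work happens: Copy On Write pays on every write and keeps reads simple, while Merge On Read keeps writes cheap and defers the merge to readers or to a later compaction.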

RocksDB

Posts with mentions or reviews of RocksDB. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-28.
  • How to choose the right type of database
    15 projects | dev.to | 28 Feb 2024
    RocksDB: A high-performance embedded database optimized for multi-core CPUs and fast storage like SSDs. Its use of a log-structured merge-tree (LSM tree) makes it suitable for applications requiring high throughput and efficient storage, such as streaming data processing.
  • Fast persistent recoverable log and key-value store
    3 projects | news.ycombinator.com | 24 Feb 2024
    [RocksDB](https://rocksdb.org/) isn’t a distributed storage system, fwiw. It’s an embedded KV engine similar to LevelDB, LMDB, or really sqlite (though that’s full SQL, not just KV)
  • The Hallucinated Rows Incident
    2 projects | dev.to | 23 Nov 2023
    To output the top 3 rocks, our engine has to first store all the rocks in some sorted way. To do this, we of course picked RocksDB, an embedded lexicographically sorted key-value store, which acts as the sorting operation's persistent state. In our RocksDB state, the diffs are keyed by the value of weight, and since RocksDB is sorted, our stored diffs are automatically sorted by their weight.
  • In-memory vs. disk-based databases: Why do you need a larger than memory architecture?
    3 projects | dev.to | 5 Sep 2023
    Memgraph uses RocksDB as a key-value store for extending the capabilities of the in-memory database. Not to go into too many details about RocksDB, but let’s just briefly mention that it is based on a data structure called the Log-Structured Merge-Tree (LSMT) (instead of B-Trees, typically the default option in databases), which is stored on disk and, by design, comes with much smaller write amplification than B-Trees.
    The in-memory version of Memgraph uses Delta storage to support multi-version concurrency control (MVCC). However, for larger-than-memory storage, we decided to use the Optimistic Concurrency Control Protocol (OCC) since we assumed conflicts would rarely happen, and we could make use of RocksDB’s transactions without dealing with the custom layer of complexity like in the case of Delta storage.
  • How RocksDB Works
    2 projects | news.ycombinator.com | 19 Apr 2023
    Tuning RocksDB well is a very very hard challenge, and one that I am happy to not do day to day anymore. RocksDB is very powerful but it comes with other very sharp edges. Compaction is one of those, and all answers are likely workload dependent.

    If you are worried about write amplification then leveled compactions are sub-optimal. I would try the universal compaction.

    - https://github.com/facebook/rocksdb/wiki/Universal-Compactio...

  • What are the advantages of using Rust to develop KV databases?
    2 projects | /r/rust | 22 Mar 2023
    It's fairly challenging to write a KV database, and takes several years of development to get the balance right between performance and reliability and avoiding data loss. Maybe read through the documentation for RocksDB https://github.com/facebook/rocksdb/wiki/RocksDB-Overview and watch the video on why it was developed and that may give you an impression of what is involved.
  • We’re the Meilisearch team! To celebrate v1.0 of our open-source search engine, Ask us Anything!
    14 projects | /r/rust | 8 Feb 2023
    LMDB is much more sane in the sense that it supports real ACID transactions instead of savepoints for RocksDB. The latter is heavy and consumes a lot more memory for a lot less read throughput. However, RocksDB has a much better parallel and concurrent write story, where you can merge entries with merge functions and therefore write from multiple CPUs.
  • Google's OSS-Fuzz expands fuzz-reward program to $30000
    3 projects | news.ycombinator.com | 2 Feb 2023
  • Event streaming in .Net with Kafka
    4 projects | dev.to | 3 Jan 2023
    Streamiz wraps a consumer and a producer, and executes the topology for each record consumed from the source topic. You can easily create stateless and stateful applications. By default, each state store is a RocksDB state store persisted on disk.
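
Several of the excerpts above describe RocksDB as a lexicographically sorted key-value store built on a log-structured merge-tree. The sketch below is a toy LSM store in pure Python meant only to show the shape of that design (an in-memory memtable, sorted immutable runs, newest-first reads, and compaction). It is not RocksDB's API or implementation; `TinyLSM`, `memtable_limit`, and the rock data are all hypothetical.

```python
import bisect


class TinyLSM:
    """Toy LSM store: writes go to a memtable that is flushed into sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}      # most recent writes, unsorted
        self.runs = []          # immutable sorted runs of (key, value), newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:                       # newest run wins
            i = bisect.bisect_left(run, (key,))     # binary search by key
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        merged = {}
        for run in reversed(self.runs):             # oldest first, newer overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []

    def scan(self):
        """Return all live entries in key order."""
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        merged.update(self.memtable)
        return sorted(merged.items())


# Keys are compared as bytes, so weights are encoded as fixed-width big-endian
# integers; as decimal strings, "300" would sort before "42".
db = TinyLSM(memtable_limit=2)
for name, weight in [("granite", 70), ("pumice", 5), ("basalt", 300), ("slate", 42)]:
    db.put(weight.to_bytes(4, "big"), name)

top3 = [name for _, name in db.scan()[-3:][::-1]]
print(top3)  # ['basalt', 'granite', 'slate']
```

Because the store keeps keys in byte order, a "top 3 by weight" query reduces to reading the tail of an ordered scan, which is the property the Hallucinated Rows excerpt relies on.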

What are some alternatives?

When comparing hudi and RocksDB you can also consider the following projects:

LevelDB - LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

LMDB - Read-only mirror of official repo on openldap.org. Issues and pull requests here are ignored. Use OpenLDAP ITS for issues.

iceberg - Apache Iceberg

SQLite - Unofficial git mirror of SQLite sources (see link for build instructions)

sled - the champagne of beta embedded databases

ClickHouse - ClickHouse® is a free analytics DBMS for big data

TileDB - The Universal Storage Engine

kudu - Mirror of Apache Kudu

libmdbx - One of the fastest embeddable key-value ACID databases without WAL. libmdbx surpasses the legendary LMDB in terms of reliability, features and performance.

SQLite - Official Git mirror of the SQLite source tree

Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.