debezium VS RocksDB

Compare debezium vs RocksDB and see what their differences are.

debezium

Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ. (by debezium)

RocksDB

A library that provides an embeddable, persistent key-value store for fast storage. (by facebook)
                     debezium               RocksDB
Mentions             80                     43
Stars                9,774                  27,203
Growth               2.4%                   1.2%
Activity             9.9                    9.8
Latest commit        4 days ago             about 16 hours ago
Language             Java                   C++
License              Apache License 2.0     GNU General Public License v3.0 only
The number of mentions indicates the total number of mentions that we've tracked, plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

debezium

Posts with mentions or reviews of debezium. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-10.
  • Choosing Between a Streaming Database and a Stream Processing Framework in Python
    10 projects | dev.to | 10 Feb 2024
    They manage data in the application layer while your original data stays where it is, so data consistency is no longer the issue it was with streaming databases. You can use Change Data Capture (CDC) tools like Debezium to connect directly to your primary database, do the computational work, and save the result back or send real-time data to output streams (a minimal sketch follows below).
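    One way to make that pattern concrete is Debezium's embedded engine, which runs a connector inside your own JVM process rather than in Kafka Connect. The following is a minimal sketch, assuming the debezium-api, debezium-embedded, and debezium-connector-postgres artifacts on the classpath; the hostnames, credentials, topic prefix, and offsets path are hypothetical placeholders.

    ```java
    // Minimal sketch of Debezium's embedded engine capturing PostgreSQL changes.
    // All connection details below are placeholders.
    import io.debezium.engine.ChangeEvent;
    import io.debezium.engine.DebeziumEngine;
    import io.debezium.engine.format.Json;

    import java.util.Properties;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class CdcSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("name", "cdc-engine");
            props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
            props.setProperty("database.hostname", "localhost");   // placeholder
            props.setProperty("database.port", "5432");
            props.setProperty("database.user", "postgres");        // placeholder
            props.setProperty("database.password", "secret");      // placeholder
            props.setProperty("database.dbname", "appdb");         // placeholder
            props.setProperty("plugin.name", "pgoutput");
            props.setProperty("topic.prefix", "app");
            props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
            props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");

            // Each change event arrives as a JSON envelope; here we just print it.
            DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                    .using(props)
                    .notifying(record -> System.out.println(record.value()))
                    .build();

            ExecutorService executor = Executors.newSingleThreadExecutor();
            executor.execute(engine); // runs until engine.close() is called
        }
    }
    ```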
  • Generating Avro Schemas from Go types
    4 projects | dev.to | 14 Jan 2024
    Both of these articles mention a key player, Debezium. In fact, Debezium has earned a firm place in modern infrastructure. Let's use a diagram to understand why.
  • debezium VS quix-streams - a user suggested alternative
    2 projects | 7 Dec 2023
  • All the ways to capture changes in Postgres
    12 projects | news.ycombinator.com | 22 Sep 2023
  • Real-time Data Processing Pipeline With MongoDB, Kafka, Debezium And RisingWave
    3 projects | dev.to | 18 Jul 2023
    Debezium
  • How to Listen to Database Changes Using Postgres Triggers in Elixir
    10 projects | news.ycombinator.com | 14 Jun 2023
  • What are your favorite tools or components in the Kafka ecosystem?
    10 projects | /r/apachekafka | 31 May 2023
    Debezium: https://debezium.io/ (connector for cdc)
  • [Need feedback] I wrote a guide about the fundamentals of BigQuery for software developers & traditional database users
    4 projects | /r/dataengineering | 14 Apr 2023
    You don't want to couple your analytics database with your app. The only time this makes sense is when you're building small projects. When you have very high traffic, this method will break. Just stick to CDC. Look into tools like debezium if your team is concerned with sending raw data to the cloud.
  • How Change Data Capture (CDC) Works with Streaming Database
    5 projects | dev.to | 7 Apr 2023
    If you’re already using Debezium to extract CDC logs into Kafka, you can just set up RisingWave to consume changes from that Kafka topic. In this case, Kafka acts as a hub of CDC data, and besides RisingWave, other downstream systems like search indexes or data warehouses can consume changes as well (a consumer sketch follows below).
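    As a rough sketch of that hub pattern, any plain Kafka consumer can read Debezium's change events alongside RisingWave. The broker address and topic below are hypothetical; Debezium topics follow the <topic.prefix>.<schema>.<table> convention.

    ```java
    // Sketch of a downstream system consuming Debezium CDC events from Kafka
    // (kafka-clients on the classpath). Broker address and topic are placeholders.
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class CdcTopicConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder
            props.put("group.id", "cdc-downstream");            // placeholder
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("app.public.orders")); // hypothetical Debezium topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Each value is a Debezium envelope with "before", "after", and "op" fields.
                        System.out.println(record.value());
                    }
                }
            }
        }
    }
    ```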
  • PostgreSQL Logical Replication Explained
    4 projects | news.ycombinator.com | 18 Mar 2023
    Logical replication is also great for replicating to other systems - for example Debezium [1] that writes all changes to a Kafka stream.

    I'm using it to develop a system to replicate data to in-app SQLite databases, via an in-between storage layer [2]. Logical replication is quite a low-level tool with many tricky cases, which can be difficult to handle when integrating with it directly.

    Some examples:

    1. Any value over 8KB compressed (configurable) is stored separately from the rest of the row (TOAST storage), and unchanged TOASTed values are not included in the replicated record by default. You need to keep track of old values in the external system, or use REPLICA IDENTITY FULL (which adds a lot of overhead on the source database).

    2. PostgreSQL's primary keys can be pretty much any combination of columns, which may or may not be used as the table's replica identity, and it may change at any time. If "REPLICA IDENTITY FULL" is used, you don't even have an explicit primary key on the receiver side - the entire record is considered the identity. Or with "REPLICA IDENTITY NOTHING", there is no identity - every operation is treated as an insert. The replica identity is global per table, so if logical replication is used to replicate to multiple systems, you may not have full control over it. This means many different combinations of replica identity need to be handled.

    3. For initial sync you need to read the tables directly. It takes extra effort to make sure these are replicated in the same way as with incremental replication - for example taking into account the list of published tables, replica identity, row filters and column lists.

    4. Depending on what is used for high availability, replication slots may get lost in a fail-over event, meaning you'll have to re-sync all data from scratch. This includes cases where physical or logical replication is used. The only case where this is not an issue is where the underlying block storage is replicated, which is the case in AWS RDS for example.

    [1]: https://debezium.io

    [2]: https://powersync.co
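    To illustrate point 2 above: replica identity is a per-table setting on the source database, so the receiving side has to cope with whatever it is set to. A minimal sketch of changing it over JDBC, assuming the PostgreSQL driver on the classpath; the connection string and table name are hypothetical.

    ```java
    // Sketch: switching a table to REPLICA IDENTITY FULL so logical replication
    // emits complete old-row images (at the cost of extra WAL volume).
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class ReplicaIdentitySketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://localhost:5432/appdb"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "postgres", "secret");
                 Statement st = conn.createStatement()) {
                st.execute("ALTER TABLE orders REPLICA IDENTITY FULL"); // hypothetical table
            }
        }
    }
    ```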

RocksDB

Posts with mentions or reviews of RocksDB. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-28.
  • How to choose the right type of database
    15 projects | dev.to | 28 Feb 2024
    RocksDB: A high-performance embedded database optimized for multi-core CPUs and fast storage like SSDs. Its use of a log-structured merge-tree (LSM tree) makes it suitable for applications requiring high throughput and efficient storage, such as streaming data processing.
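    To show what "embedded" means in practice, here is a minimal sketch using RocksDB's Java binding (the org.rocksdb:rocksdbjni artifact); the database path is a placeholder. The store lives in-process as ordinary files on disk, with no server to run.

    ```java
    // Minimal RocksJava sketch: open a local database, write and read one key.
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class RocksBasics {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary(); // loads the native library bundled with rocksdbjni
            try (Options options = new Options().setCreateIfMissing(true);
                 RocksDB db = RocksDB.open(options, "/tmp/rocks-demo")) { // placeholder path
                db.put("hello".getBytes(), "world".getBytes());
                byte[] value = db.get("hello".getBytes());
                System.out.println(new String(value)); // prints "world"
            }
        }
    }
    ```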
  • Fast persistent recoverable log and key-value store
    3 projects | news.ycombinator.com | 24 Feb 2024
    [RocksDB](https://rocksdb.org/) isn’t a distributed storage system, FWIW. It’s an embedded KV engine similar to LevelDB, LMDB, or really SQLite (though that’s full SQL, not just KV).
  • The Hallucinated Rows Incident
    2 projects | dev.to | 23 Nov 2023
    To output the top 3 rocks, our engine has to first store all the rocks in some sorted way. To do this, we of course picked RocksDB, an embedded lexicographically sorted key-value store, which acts as the sorting operation's persistent state. In our RocksDB state, the diffs are keyed by the value of weight, and since RocksDB is sorted, our stored diffs are automatically sorted by their weight.
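    The sorted scan that setup relies on is exposed directly through RocksDB's iterator API: keys are stored in lexicographic byte order, so reading from the front yields them sorted. A minimal RocksJava sketch; the path and the zero-padded weight keys are illustrative.

    ```java
    // Sketch: lexicographic key order in RocksDB doubles as a sort order,
    // so the "top 3" can be read straight off an iterator.
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.RocksIterator;

    public class SortedScanSketch {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();
            try (Options options = new Options().setCreateIfMissing(true);
                 RocksDB db = RocksDB.open(options, "/tmp/rocks-sorted")) { // placeholder path
                // Zero-padding keeps lexicographic order identical to numeric order.
                db.put("0042".getBytes(), "pebble".getBytes());
                db.put("0007".getBytes(), "gravel".getBytes());
                db.put("0100".getBytes(), "boulder".getBytes());

                try (RocksIterator it = db.newIterator()) {
                    int emitted = 0;
                    for (it.seekToFirst(); it.isValid() && emitted < 3; it.next(), emitted++) {
                        System.out.println(new String(it.key()) + " -> " + new String(it.value()));
                    }
                }
            }
        }
    }
    ```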
  • In-memory vs. disk-based databases: Why do you need a larger than memory architecture?
    3 projects | dev.to | 5 Sep 2023
    Memgraph uses RocksDB as a key-value store to extend the capabilities of the in-memory database. Without going into too much detail about RocksDB, it is based on a data structure called the Log-Structured Merge-Tree (LSMT), rather than the B-Tree that is typically the default in databases. LSM trees are saved on disk and, by design, come with much smaller write amplification than B-Trees.
    3 projects | dev.to | 5 Sep 2023
    The in-memory version of Memgraph uses Delta storage to support multi-version concurrency control (MVCC). However, for larger-than-memory storage, we decided to use the Optimistic Concurrency Control protocol (OCC), since we assumed conflicts would rarely happen and we could make use of RocksDB's transactions without a custom layer of complexity like the one Delta storage requires (see the sketch below).
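    RocksDB ships that optimistic flavor as a dedicated handle, OptimisticTransactionDB, where writes are buffered and conflict-checked at commit time. A minimal RocksJava sketch with a placeholder path and illustrative key:

    ```java
    // Sketch of optimistic concurrency control (OCC) with RocksJava:
    // commit() validates that no conflicting write happened since the
    // transaction began, and throws on conflict so the caller can retry.
    import org.rocksdb.Options;
    import org.rocksdb.OptimisticTransactionDB;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.Transaction;
    import org.rocksdb.WriteOptions;

    public class OccSketch {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();
            try (Options options = new Options().setCreateIfMissing(true);
                 OptimisticTransactionDB db =
                         OptimisticTransactionDB.open(options, "/tmp/rocks-occ"); // placeholder
                 WriteOptions writeOpts = new WriteOptions();
                 Transaction txn = db.beginTransaction(writeOpts)) {
                txn.put("balance:alice".getBytes(), "100".getBytes()); // illustrative key
                txn.commit(); // fails here if another transaction wrote the same key
            }
        }
    }
    ```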
  • How RocksDB Works
    2 projects | news.ycombinator.com | 19 Apr 2023
    Tuning RocksDB well is a very, very hard challenge, and one that I am happy to no longer do day to day. RocksDB is very powerful, but it comes with some very sharp edges. Compaction is one of those, and all answers are likely workload-dependent.

    If you are worried about write amplification, then leveled compaction is sub-optimal. I would try universal compaction.

    - https://github.com/facebook/rocksdb/wiki/Universal-Compactio...
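    Switching styles is a one-line option in RocksJava, though the workload-dependent tuning around it is the hard part the commenter alludes to. A minimal sketch with a placeholder path:

    ```java
    // Sketch: opening a RocksDB instance with universal compaction instead
    // of the default leveled style, trading space amplification for lower
    // write amplification.
    import org.rocksdb.CompactionStyle;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class UniversalCompactionSketch {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();
            try (Options options = new Options()
                         .setCreateIfMissing(true)
                         .setCompactionStyle(CompactionStyle.UNIVERSAL);
                 RocksDB db = RocksDB.open(options, "/tmp/rocks-universal")) { // placeholder
                db.put("k".getBytes(), "v".getBytes());
            }
        }
    }
    ```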

  • What are the advantages of using Rust to develop KV databases?
    2 projects | /r/rust | 22 Mar 2023
    It's fairly challenging to write a KV database, and it takes several years of development to get the balance right between performance, reliability, and avoiding data loss. Maybe read through the documentation for RocksDB https://github.com/facebook/rocksdb/wiki/RocksDB-Overview and watch the video on why it was developed; that may give you an impression of what is involved.
  • We’re the Meilisearch team! To celebrate v1.0 of our open-source search engine, Ask us Anything!
    14 projects | /r/rust | 8 Feb 2023
    LMDB is much more sane in the sense that it supports real ACID transactions, as opposed to RocksDB's savepoints. The latter is heavy and consumes a lot more memory for a lot less read throughput. However, RocksDB has a much better parallel and concurrent write story, where you can merge entries with merge functions and therefore write from multiple CPUs (a merge-operator sketch follows below).
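    The merge functions mentioned there are RocksDB merge operators: writers append operands with merge() instead of doing a read-modify-write, and the engine folds them together on reads and compactions. A RocksJava sketch using the built-in string-append operator; the path and keys are illustrative.

    ```java
    // Sketch of RocksDB merge operators: concurrent writers call merge()
    // and RocksDB combines the operands with the configured operator.
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.StringAppendOperator;

    public class MergeSketch {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();
            try (StringAppendOperator append = new StringAppendOperator(',');
                 Options options = new Options()
                         .setCreateIfMissing(true)
                         .setMergeOperator(append);
                 RocksDB db = RocksDB.open(options, "/tmp/rocks-merge")) { // placeholder
                db.merge("events".getBytes(), "login".getBytes());
                db.merge("events".getBytes(), "click".getBytes());
                System.out.println(new String(db.get("events".getBytes()))); // login,click
            }
        }
    }
    ```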
  • Google's OSS-Fuzz expands fuzz-reward program to $30000
    3 projects | news.ycombinator.com | 2 Feb 2023
  • Event streaming in .Net with Kafka
    4 projects | dev.to | 3 Jan 2023
    Streamiz wraps a consumer and a producer, and executes the topology for each record consumed from the source topic. You can easily create stateless and stateful applications. By default, each state store is a RocksDB state store persisted on disk.

What are some alternatives?

When comparing debezium and RocksDB you can also consider the following projects:

maxwell - Maxwell's daemon, a mysql-to-json kafka producer

LevelDB - LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

LMDB - Read-only mirror of official repo on openldap.org. Issues and pull requests here are ignored. Use OpenLDAP ITS for issues.

SQLite - Unofficial git mirror of SQLite sources (see link for build instructions)

sled - the champagne of beta embedded databases

ClickHouse - ClickHouse® is a free analytics DBMS for big data

kafka-connect-bigquery - A Kafka Connect BigQuery sink connector

TileDB - The Universal Storage Engine

realtime - Broadcast, Presence, and Postgres Changes via WebSockets

libmdbx - One of the fastest embeddable key-value ACID databases without WAL. libmdbx surpasses the legendary LMDB in terms of reliability, features, and performance.

SQLite - Official Git mirror of the SQLite source tree

hudi - Upserts, Deletes And Incremental Processing on Big Data.