kafka-connect-transform-xml VS debezium

Compare kafka-connect-transform-xml vs debezium and see what their differences are.


Transformation for converting XML data to Structured data. (by jcustenborder)


Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ. (by debezium)
                    kafka-connect-transform-xml   debezium
Mentions            1                             76
Stars               22                            9,084
Growth              -                             3.3%
Activity            0.0                           9.7
Latest commit       about 1 year ago              6 days ago
Language            Java                          Java
License             Apache License 2.0            Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.


Posts with mentions or reviews of kafka-connect-transform-xml. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2020-10-05.


Posts with mentions or reviews of debezium. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-22.
  • All the ways to capture changes in Postgres
    12 projects | news.ycombinator.com | 22 Sep 2023
  • Real-time Data Processing Pipeline With MongoDB, Kafka, Debezium And RisingWave
    3 projects | dev.to | 18 Jul 2023
  • How to Listen to Database Changes Using Postgres Triggers in Elixir
    10 projects | news.ycombinator.com | 14 Jun 2023
  • What are your favorite tools or components in the Kafka ecosystem?
    10 projects | /r/apachekafka | 31 May 2023
    Debezium: https://debezium.io/ (connector for cdc)
  • [Need feedback] I wrote a guide about the fundamentals of BigQuery for software developers & traditional database users
    4 projects | /r/dataengineering | 14 Apr 2023
    You don't want to couple your analytics database with your app. The only time this makes sense is when you're building small projects. When you have very high traffic, this method will break. Just stick to CDC. Look into tools like debezium if your team is concerned with sending raw data to the cloud.
  • How Change Data Capture (CDC) Works with Streaming Database
    5 projects | dev.to | 7 Apr 2023
    If you’re already using Debezium to extract CDC logs into Kafka, you can just set up RisingWave to consume changes from that Kafka topic. In this case, Kafka acts like a hub of CDC data, and beside RisingWave, other downstream systems like search index or data warehouses can consume changes as well.
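    Every downstream consumer of that hub topic sees the same Debezium change envelope, whose payload carries `before`, `after`, and an `op` code ("c" create, "u" update, "d" delete, "r" snapshot read). A minimal sketch of such a consumer in plain Python (the Kafka consumption itself is omitted, and the sample `orders` records are hypothetical):

    ```python
    import json

    def apply_change(state, raw_event, key_field="id"):
        """Apply one Debezium change envelope to an in-memory table
        (a dict keyed by primary key)."""
        payload = json.loads(raw_event)["payload"]
        op = payload["op"]
        if op in ("c", "u", "r"):
            # Creates, updates, and snapshot reads carry the new row in `after`.
            row = payload["after"]
            state[row[key_field]] = row
        elif op == "d":
            # Deletes carry the old row in `before`; `after` is null.
            row = payload["before"]
            state.pop(row[key_field], None)
        return state

    # Hypothetical events, in the shape Debezium emits for an `orders` table.
    state = {}
    apply_change(state, json.dumps({"payload": {
        "op": "c", "before": None, "after": {"id": 1, "total": 30}}}))
    apply_change(state, json.dumps({"payload": {
        "op": "u", "before": {"id": 1, "total": 30},
        "after": {"id": 1, "total": 45}}}))
    apply_change(state, json.dumps({"payload": {
        "op": "d", "before": {"id": 1, "total": 45}, "after": None}}))
    ```

    The same handler works whether the consumer is a warehouse loader, a search indexer, or a streaming database, which is exactly why the Kafka topic makes a good hub.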
  • PostgreSQL Logical Replication Explained
    4 projects | news.ycombinator.com | 18 Mar 2023
    Logical replication is also great for replicating to other systems - for example Debezium [1] that writes all changes to a Kafka stream.

    I'm using it to develop a system to replicate data to in-app SQLite databases, via an in-between storage layer [2]. Logical replication is quite a low-level tool with many tricky cases, which can be difficult to handle when integrating with it directly.

    Some examples:

    1. Any value over 8KB compressed (configurable) is stored separately from the rest of the row (TOAST storage), and unchanged values are not included in the replicated record by default. You need to keep track of old values in the external system, or use REPLICA IDENTITY FULL (which adds a lot of overhead on the source database).

    2. PostgreSQL's primary keys can be pretty much any combination of columns, may or may not be used as the table's replica identity, and may change at any time. If "REPLICA IDENTITY FULL" is used, you don't even have an explicit primary key on the receiver side - the entire record is considered the identity. Or with "REPLICA IDENTITY NOTHING", there is no identity - every operation is treated as an insert. The replica identity is global per table, so if logical replication is used to replicate to multiple systems, you may not have full control over it. This means many different combinations of replica identity need to be handled.

    3. For initial sync you need to read the tables directly. It takes extra effort to make sure these are replicated in the same way as with incremental replication - for example taking into account the list of published tables, replica identity, row filters and column lists.

    4. Depending on what is used for high availability, replication slots may get lost in a fail-over event, meaning you'll have to re-sync all data from scratch. This includes cases where physical or logical replication is used. The only case where this is not an issue is where the underlying block storage is replicated, which is the case in AWS RDS for example.

    [1]: https://debezium.io

    [2]: https://powersync.co
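    The replica-identity combinations in point 2 above boil down to a single dispatch when applying a decoded change on the receiver side. A hedged illustration in Python (the function name and row shapes are hypothetical, not PostgreSQL's wire format):

    ```python
    def apply_update(table, identity, old_row, new_row):
        """Apply a replicated UPDATE to `table` (a list of row dicts),
        honouring the source table's replica identity setting."""
        if identity == "NOTHING":
            # No identity at all: the change can only be treated as an insert.
            table.append(new_row)
        elif identity == "FULL":
            # The entire old row is the identity: match every column.
            for i, row in enumerate(table):
                if row == old_row:
                    table[i] = new_row
                    break
        else:
            # DEFAULT / USING INDEX: old_row carries only the key columns.
            key_cols = old_row.keys()
            for i, row in enumerate(table):
                if all(row.get(c) == old_row[c] for c in key_cols):
                    table[i] = new_row
                    break
        return table

    rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
    apply_update(rows, "DEFAULT", {"id": 2}, {"id": 2, "v": "z"})
    ```

    A real integration also has to cope with the identity changing mid-stream, which is part of why the author recommends an in-between storage layer rather than consuming the replication protocol directly.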

  • Spring, SchemaSpy DB docs, and GitHub Pages
    8 projects | dev.to | 12 Mar 2023
    Data engineers have to be aware of table structure to deal with Change Data Capture events correctly.
  • CDC Implementation
    2 projects | /r/dataengineering | 12 Feb 2023
    We use debezium at work to push data from operational databases to data warehouse.
  • Stream MySQL changes
    3 projects | /r/golang | 3 Feb 2023
    If capturing the changes that occur on a database, and writing those changes elsewhere is what you want, then take a look at https://debezium.io/

What are some alternatives?

When comparing kafka-connect-transform-xml and debezium you can also consider the following projects:

maxwell - Maxwell's daemon, a mysql-to-json kafka producer

kafka-connect-bigquery - A Kafka Connect BigQuery sink connector

realtime - Broadcast, Presence, and Postgres Changes via WebSockets

hudi - Upserts, Deletes And Incremental Processing on Big Data.

RocksDB - A library that provides an embeddable, persistent key-value store for fast storage.

iceberg - Apache Iceberg

PostgreSQL - Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch

Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Embulk - Embulk: Pluggable Bulk Data Loader.

wal2json - JSON output plugin for changeset extraction

pipelinewise - Data Pipeline Framework using the singer.io spec

SpinalTap - Change Data Capture (CDC) service