PostgreSQL Logical Replication Explained

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • debezium

    Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.

  • Logical replication is also great for replicating to other systems - for example Debezium [1] that writes all changes to a Kafka stream.

    I'm using it to develop a system to replicate data to in-app SQLite databases, via an in-between storage layer [2]. Logical replication is quite a low-level tool with many tricky cases, which can be difficult to handle when integrating with it directly.

    Some examples:

    1. Any value over 8KB compressed (configurable) is stored separately from the rest of the row (TOAST storage), and unchanged values included in the replicated record by default. You need to keep track of old values in the external system, or use REPLICA IDENTITY FULL (which adds a lot of overhead on the source database).

    2. PostgreSQL's primary keys can be pretty-much any combination of columns, and may or may not be used as the table's replica identity, and it may change at any time. If "REPLICA IDENTITY FULL" is used, you don't even have an explicit primary key on the receiver side - the entire record is considered the identity. Or with "REPLICA IDENTITY NOTHING", there is no identity - every operation is treated as an insert. The replica identity is global per table, so if logical replication is used to replicate to multiple systems, you may not have full control over it. This means many different combinations of replica identity needs to be handled.

    3. For initial sync you need to read the tables directly. It takes extra effort to make sure these are replicated in the same way as with incremental replication - for example taking into account the list of published tables, replica identity, row filters and column lists.

    4. Depending on what is used for high availability, replication slots may get lost in a fail-over event, meaning you'll have to re-sync all data from scratch. This includes cases where physical or logical replication is used. The only case where this is not an issue is where the underlying block storage is replicated, which is the case in AWS RDS for example.

    [1]: https://debezium.io

    [2]: https://powersync.co

  • pg-logical-replication

    PostgreSQL Logical Replication client for node.js

  • > Logical replication is also great for replicating to other systems - for example Debezium [1] that writes all changes to a Kafka stream.

    So I'm looking for a Rust library (or a C library) that receives a Postgres logical replication stream and is compatible with Postgres 15, does you know of something?

    Debezium is unfortunately written in Java, and I think your Powersync is proprietary right? (I didn't find a Github link). I can find stuff like https://www.npmjs.com/package/pg-logical-replication but again, it's Javascript.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • logicaldecoding

  • For C there should be good options.

    For Rust it doesn't appear that well-supported.

    A very simple approach is to poll for changes using `pg_logical_slot_get_changes()` - that should work with any driver. That's what I used for my initial experimentation, before switching over to the streaming replication protocol for better performance.

    The streaming replication protocol is not that complicated, but currently you'll have to handle some of the low-level protocol yourself, or work with some very experimental implementations. There's a project to help get you started at [1], and some more discussion at [2].

    For the logical decoder, wal2json is quite nice to experiment with, but I've found pgoutput is not that complicated and gives you something closer to the raw data.

    [1]: https://github.com/seddonm1/logicaldecoding/

    [2]: https://github.com/sfackler/rust-postgres/issues/116

  • rust-postgres

    Native PostgreSQL driver for the Rust programming language

  • For C there should be good options.

    For Rust it doesn't appear that well-supported.

    A very simple approach is to poll for changes using `pg_logical_slot_get_changes()` - that should work with any driver. That's what I used for my initial experimentation, before switching over to the streaming replication protocol for better performance.

    The streaming replication protocol is not that complicated, but currently you'll have to handle some of the low-level protocol yourself, or work with some very experimental implementations. There's a project to help get you started at [1], and some more discussion at [2].

    For the logical decoder, wal2json is quite nice to experiment with, but I've found pgoutput is not that complicated and gives you something closer to the raw data.

    [1]: https://github.com/seddonm1/logicaldecoding/

    [2]: https://github.com/sfackler/rust-postgres/issues/116

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts