This is awesome! For comparison, this is also supported on Amazon RDS, so AFAICT this opens up the possibility of near-zero-downtime streaming migrations between the two cloud providers: https://aws.amazon.com/blogs/database/using-logical-replicat...
Also, it enables a really cool pattern of change data capture, which allows you to capture "normal" changes to your Postgres database as events that can be fed to e.g. Kafka and power an event-driven/CQRS system. https://www.confluent.io/blog/bottled-water-real-time-integr... is a 2015 post describing the pattern well; the modern tool that replaces Bottled Water is https://debezium.io/ . For instance, if you have a "last_updated_by" column in your tables that's respected by all your applications, this becomes a more-or-less-free audit log, or at the very least something that you can use to spot-check that your audit logging system is capturing everything it should be!
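To make the "more-or-less-free audit log" idea concrete, here is a minimal sketch in Python of turning one Debezium-style change event into an audit entry. It assumes the standard Debezium envelope fields (`op`, `before`, `after`, `source`, `ts_ms`); the function name `audit_entry` and the shape of the returned dict are made up for illustration, and the changed-column diff is only meaningful if the table uses `REPLICA IDENTITY FULL`, since otherwise `before` is missing or partial for Postgres updates.

```python
from typing import Any, Optional

# Debezium op codes: c = create, u = update, d = delete, r = snapshot read.
OP_NAMES = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


def audit_entry(
    event: dict[str, Any], actor_column: str = "last_updated_by"
) -> Optional[dict[str, Any]]:
    """Build a hypothetical audit record from one change event.

    Returns None for events that are not row changes (unknown op).
    """
    # The envelope may arrive wrapped as {"schema": ..., "payload": ...}
    # or already unwrapped, depending on converter settings.
    payload = event.get("payload", event)
    op = payload.get("op")
    if op not in OP_NAMES:
        return None

    before = payload.get("before") or {}
    after = payload.get("after") or {}
    # Deletes carry only "before", so the actor there is whoever
    # last updated the row, not necessarily who deleted it.
    row = after or before

    # Assumption: "before" holds the full old row (REPLICA IDENTITY FULL).
    changed = sorted(
        col for col in set(before) | set(after)
        if before.get(col) != after.get(col)
    )
    return {
        "table": payload.get("source", {}).get("table"),
        "action": OP_NAMES[op],
        "actor": row.get(actor_column),
        "changed_columns": changed,
        "ts_ms": payload.get("ts_ms"),
    }
```

In practice you would call this from whatever consumes the Kafka topic and compare the result against your real audit log for the spot-check described above.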
When you're building and debugging systems that combine trusted human inputs, untrusted human inputs, results from machine learning, and results from external databases, all related to the same entity in your business logic (and who isn't doing all of these things, these days!), having this kind of replayable event capture is invaluable. If you value observability of how your distributed system evolves within the context of a single request, tracking a datum as it evolves over time is the logical (heh) evolution of that need.
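As a rough illustration of what "replayable" buys you, here is a minimal Python sketch that replays the captured change events for a single row to recover its state at a given moment. It assumes unwrapped Debezium-style payloads with `after` and `ts_ms` fields; `state_at` is a hypothetical helper, and ordering by `ts_ms` is a simplification (a real consumer would more likely rely on Kafka offsets or the LSN in `source`).

```python
from typing import Any, Iterable, Optional


def state_at(
    events: Iterable[dict[str, Any]], as_of_ms: int
) -> Optional[dict[str, Any]]:
    """Replay change events for ONE row and return its state as of as_of_ms.

    Returns None if the row did not exist yet or had been deleted by then.
    """
    state: Optional[dict[str, Any]] = None
    # sorted() is stable, so events sharing a timestamp keep their input order.
    for ev in sorted(events, key=lambda e: e["ts_ms"]):
        if ev["ts_ms"] > as_of_ms:
            break
        # "after" is the full new row for inserts/updates and None for deletes.
        state = ev["after"]
    return state
```

Calling it with a few different timestamps shows how a value (say, an ML-derived score later overridden by a human) changed and when.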
Why the Parquet step? You should be able to do straight Debezium -> Kafka -> BQ, using the BQ sink connector for Kafka Connect (https://github.com/confluentinc/kafka-connect-bigquery); we have users doing this with the Debezium MySQL connector, and I'd expect it to work equally well for Postgres.
Disclaimer: I work on Debezium.
Similarly, Supabase uses Elixir to listen to Postgres changes via logical replication. It's a pretty neat pattern, and Elixir/Erlang is especially good at this sort of thing:
https://github.com/supabase/realtime