I concur for anything that's not auditing. I've implemented a similar project using MySQL and Maxwell's Daemon[0]. But it's a hack; really, events should be generated by the services and applications themselves and flow through a dedicated event bus.
That being said, in this "stream changes from the DB" space there's also Debezium[1]. I haven't worked with it but it looks pretty slick.
[0] https://maxwells-daemon.io/
If you're using Elixir, check out https://github.com/cpursley/walex
I've actually been thinking about turning this idea into a product where you can just point it at your Postgres database and select the tables you want to listen to (with filters, like you describe), and have that forwarded to a webhook (with the possibility of other protocols).
I'd love to hear folks' thoughts on that (and whether it would be something people would pay for).
I enjoyed this blog. I think it provides a great, succinct overview of the various approaches native to Postgres.
For the "capture changes in an audit table" section, I've had good experiences at a previous company with the Temporal Tables pattern. Unlike other major RDBMS vendors, it's not built into Postgres itself, but there's a simple pattern [1] you can leverage with a SQL function.
This allows you to see a table's state as of a specific point in time. Some sample use cases:
- "What was this user's configuration on Aug 12?"
- "How many records were unprocessed at 11:55pm last night?"
- "Show me the diff on feature flags between now and a week ago"
[1]: https://github.com/nearform/temporal_tables
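To make the pattern above concrete, here is a rough sketch of how the nearform temporal_tables approach is wired up. The table and column names are placeholders, and the `versioning()` PL/pgSQL function is the one provided by the linked repo; consult its README for the exact signature and setup.

```sql
-- Hypothetical table; sys_period tracks each row's validity window.
CREATE TABLE subscriptions (
  name       text NOT NULL,
  state      text NOT NULL,
  sys_period tstzrange NOT NULL DEFAULT tstzrange(current_timestamp, null)
);

-- History table with the same shape, receiving superseded row versions.
CREATE TABLE subscriptions_history (LIKE subscriptions);

-- versioning() is supplied by the nearform temporal_tables project.
CREATE TRIGGER versioning_trigger
BEFORE INSERT OR UPDATE OR DELETE ON subscriptions
FOR EACH ROW EXECUTE PROCEDURE
  versioning('sys_period', 'subscriptions_history', true);

-- "As of" query: which rows were live at a given instant?
SELECT * FROM subscriptions_history
WHERE sys_period @> '2023-08-12 00:00:00+00'::timestamptz;
```

The range-containment query (`@>`) is what makes the point-in-time questions in the comment above ("what was the state on Aug 12?") a simple SELECT.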
There is also the original temporal_tables C extension [0].
[0] https://github.com/arkhipov/temporal_tables
Using triggers + history tables (aka audit tables) is the right answer 98% of the time. Just do it. If you're not already doing it, start today. It is a proven technique, in use for _over 30 years_.
Here's a quick rundown of how to do it generically https://gist.github.com/slotrans/353952c4f383596e6fe8777db5d... (trades off space efficiency for "being easy").
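A minimal sketch of the generic approach (not the linked gist verbatim): one shared history table that stores every affected row as jsonb, trading space for simplicity. Table and trigger names are placeholders.

```sql
-- One history table for everything; row images stored as jsonb.
CREATE TABLE history (
  id         bigserial   PRIMARY KEY,
  table_name text        NOT NULL,
  op         text        NOT NULL,  -- 'INSERT' / 'UPDATE' / 'DELETE'
  changed_at timestamptz NOT NULL DEFAULT clock_timestamp(),  -- database clock, per the point below
  row_data   jsonb       NOT NULL
);

CREATE OR REPLACE FUNCTION record_history() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'DELETE' THEN
    INSERT INTO history (table_name, op, row_data)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
  ELSE
    INSERT INTO history (table_name, op, row_data)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
  END IF;
  RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

-- Attach to any table you want audited (my_table is a placeholder):
CREATE TRIGGER my_table_history
AFTER INSERT OR UPDATE OR DELETE ON my_table
FOR EACH ROW EXECUTE FUNCTION record_history();
```

Because the trigger fires inside the same transaction as the change, the history row commits (or rolls back) atomically with the change itself, which is something application-space trackers can't guarantee.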
It's great if you can store immutable data. Really, really great. But you _probably_ have a ton of mutable data in your database and you are _probably_ forgetting a ton of it every day. Stop forgetting things! Use history tables.
cf. https://github.com/matthiasn/talk-transcripts/blob/master/Hi...
Do not use Papertrail or similar application-space history tracking libraries/techniques. They are slow, error-prone, and incapable of capturing any DB changes that bypass your app stack (which you probably have, and should). Worth remembering that _any_ attempt to capture an "updated" timestamp from your app is fundamentally incorrect, because each of your webheads has its own clock. Use the database clock! It's the only one that's correct!
Yo :D This is what Supabase Realtime does!
https://github.com/supabase/realtime
Spin up a Supabase database and then subscribe to changes with WebSockets.
You can play with it here once you have a db: https://realtime.supabase.com/inspector/new
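For context on the database side of this: in a default Supabase setup, tables are broadcast once they are added to the `supabase_realtime` logical-replication publication. The table name below is a placeholder; see the Supabase Realtime docs for the current setup steps.

```sql
-- Opt a table into change broadcasting (todos is a placeholder name).
ALTER PUBLICATION supabase_realtime ADD TABLE todos;
```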
It exists [0] but does not seem to have attracted much user interest.
[0] https://github.com/subzerocloud/pg-event-proxy-example
No. We implemented our own [1] for a few reasons:
* Scaling well to multi-TB DBs without pinning the write-ahead log (potentially filling your DB's disk) while the backfill is happening. Instead, our connector constantly reads the WAL and works well in setups like Supabase that have very restrictive WAL sizes (1GB iirc).
* Incremental fault-tolerant backfills that can be stopped and resumed at will.
* Being able to offer "precise" captures which are logically consistent in terms of the sequence of create/update/delete events.
The last one becomes really interesting when paired with REPLICA IDENTITY FULL, because you can feed it into an incremental computation (perhaps differential dataflow) for streaming updates of a continuous computation.
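For anyone unfamiliar with the setting mentioned above: by default, Postgres logical replication only includes the primary key in the old-row image for updates and deletes. REPLICA IDENTITY FULL changes that (the table name below is a placeholder).

```sql
-- With FULL, UPDATE and DELETE records in the WAL carry the complete
-- old row image rather than just the primary key columns.
ALTER TABLE my_table REPLICA IDENTITY FULL;
```

That full before-image is what lets a downstream consumer compute exact deltas (e.g. for differential dataflow) without re-reading the table.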
Our work is based on the Netflix DBLog paper, which we took and ran with.
[1] https://github.com/estuary/connectors/tree/main/source-postg...