How to handle partial updates and bulk updates in the source systems

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering

  • Imo this is a matter of schema design. You shouldn't have to send the entire object in the event, just the delta. If you are using an event-based schema, you should ideally be able to reconstruct the current state by iterating over all the events and merging the deltas (see the sketch below). An OLAP database/warehouse/lakehouse can be very efficient at this, depending on how the data is partitioned. You could consider materialized views, ClickHouse Live Views, Delta Live Tables, or even a tool like Materialize to create views that represent the current state. Designing a schema that makes it easy to write the queries that reconstruct state will make your life easier. Check out something like Activity Schema for inspiration.

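As an illustration of replaying deltas into current state, here is a minimal Python sketch. The event shape (an entity id, a timestamp, and a delta holding only the changed fields) is an assumption made for the example; in practice a warehouse would do the equivalent merge in SQL or a materialized view.

```python
from typing import Any, Dict, List

# Hypothetical delta events: each carries only the fields that changed,
# plus an entity id and a timestamp used for ordering.
events: List[Dict[str, Any]] = [
    {"id": 1, "ts": 1, "delta": {"name": "Alice", "status": "active"}},
    {"id": 1, "ts": 2, "delta": {"status": "inactive"}},
    {"id": 2, "ts": 1, "delta": {"name": "Bob"}},
]

def current_state(events: List[Dict[str, Any]]) -> Dict[int, Dict[str, Any]]:
    """Rebuild the latest state per entity by replaying deltas in timestamp order."""
    state: Dict[int, Dict[str, Any]] = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        # Later deltas overwrite earlier values for the same field.
        state.setdefault(event["id"], {}).update(event["delta"])
    return state

print(current_state(events))
# {1: {'name': 'Alice', 'status': 'inactive'}, 2: {'name': 'Bob'}}
```

A warehouse view would express the same "latest value per field wins" merge declaratively, which is where materialized views or a tool like Materialize come in.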