Query Engines: Push vs. Pull

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • arroyo

    Distributed stream processing engine in Rust

  • Interesting - I looked into your code a bit. I found your window aggregation library [1]. You may be interested in looking into the Rust implementation of some of the research work I've been a part of [2].

    In Flink, I believe the reason they need to implement their own backpressure system is that they multiplex TCP connections. That is, they have multiple logical streams flowing through a single TCP connection. If that's the case, you need to do some work to 1) detect which logical stream is the one that's blocking, and 2) don't block because other logical streams may be able to use the active TCP connection.

    Thinking it through, I think what Flink's approach buys is not necessarily better performance, but better just a manageable number of connections. That is, imagine you have a process P1 with operators A, B and C. And then P2 has D, E, F. Now imagine that this is a shuffle, where A, B and C are fully connected to D, E and F. In my old system, you would have 9 TCP connections. In Flink, you will have 1.

    [1] https://github.com/ArroyoSystems/arroyo/blob/master/arroyo-w...

  • ClickHouse

    ClickHouse® is a free analytics DBMS for big data

  • We initially had a pull-based query engine in ClickHouse, but then migrated to a dataflow graph query engine: https://github.com/ClickHouse/ClickHouse/blob/master/src/Pro...

    It allows decoupling of the control flow and the data flow. The movement of the data in the query pipeline is controlled explicitly.

    We did this migration a few years ago. Many database engines forked or influenced by ClickHouse still use pull-based query engines.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • sliding-window-aggregators

    Reference implementations of sliding window aggregation algorithms

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts