Query Engines: Push vs. Pull

duckdb

52 16,576 10.0 C++

DuckDB is an in-process SQL OLAP Database Management System

The DuckDB folks are migrating from pull to push and have written up their reasoning in this interesting issue. They use a vectorized model rather than a compiled one, which makes it another useful comparison point. It looks like push will simplify how they handle parallelism.
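The DuckDB issue concerns their own engine, but the general idea of a vectorized push pipeline can be sketched in a few lines. The source pushes fixed-size chunks rather than single rows, and each operator is a callback that processes a whole chunk before pushing its output on. The names (`scan`, `filter_op`, `map_op`, `chunk_size`) are hypothetical and are not DuckDB's API.

```python
from typing import Callable, List

Chunk = List[int]
Sink = Callable[[Chunk], None]

def scan(data: List[int], chunk_size: int, sink: Sink) -> None:
    """Source: push the input to the consumer one chunk (vector) at a time."""
    for start in range(0, len(data), chunk_size):
        sink(data[start:start + chunk_size])

def filter_op(pred: Callable[[int], bool], sink: Sink) -> Sink:
    """Filter: process a whole chunk, then push the surviving rows downstream."""
    def consume(chunk: Chunk) -> None:
        out = [x for x in chunk if pred(x)]
        if out:  # do not push empty chunks
            sink(out)
    return consume

def map_op(f: Callable[[int], int], sink: Sink) -> Sink:
    """Map: transform every row of the chunk, then push the new chunk."""
    def consume(chunk: Chunk) -> None:
        sink([f(x) for x in chunk])
    return consume

def is_even(x: int) -> bool:
    return x % 2 == 0

def times_ten(x: int) -> int:
    return x * 10

results: List[int] = []
pipeline = filter_op(is_even, map_op(times_ten, results.extend))
scan(list(range(10)), chunk_size=4, sink=pipeline)
print(results)  # [0, 20, 40, 60, 80]
```

Per-chunk work amortizes the cost of each callback over many rows, which is the usual motivation for vectorizing.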
https://github.com/duckdb/duckdb/issues/1583

octosql

34 4,689 4.3 Go

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
rust

2,681 92,831 10.0 Rust

Empowering everyone to build reliable and efficient software.

I like the use of generator functions for the pull-based example: the syntax closely mirrors the push-based example, which drives home the point that the difference between push and pull lies entirely in whether you compose operators via an iterator API or a callback API. You can also see this in how the two usage examples are inside-out versions of each other.
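To make the mirroring concrete, here is a minimal sketch in Python. It is not the article's code, and all operator names are hypothetical. The pull operators are generators that ask their input for rows; the push operators are closures that hand rows to their consumer. Note how the two usage lines nest in opposite directions.

```python
from typing import Callable, Iterable, Iterator, List

# Pull: each operator is a generator that asks its input for the next row.
def pull_scan(rows: Iterable[int]) -> Iterator[int]:
    for row in rows:
        yield row

def pull_filter(pred: Callable[[int], bool], source: Iterator[int]) -> Iterator[int]:
    for row in source:
        if pred(row):
            yield row

def pull_map(f: Callable[[int], int], source: Iterator[int]) -> Iterator[int]:
    for row in source:
        yield f(row)

# Push: each operator is a callback that hands rows to its consumer.
Consumer = Callable[[int], None]

def push_scan(rows: Iterable[int], consumer: Consumer) -> None:
    for row in rows:
        consumer(row)

def push_filter(pred: Callable[[int], bool], consumer: Consumer) -> Consumer:
    def accept(row: int) -> None:
        if pred(row):
            consumer(row)
    return accept

def push_map(f: Callable[[int], int], consumer: Consumer) -> Consumer:
    def accept(row: int) -> None:
        consumer(f(row))
    return accept

def is_odd(x: int) -> bool:
    return x % 2 == 1

def square(x: int) -> int:
    return x * x

# Pull usage: the consumer drives; operators wrap from the source outward.
pulled = list(pull_map(square, pull_filter(is_odd, pull_scan(range(6)))))

# Push usage: the source drives; operators wrap from the sink inward.
pushed: List[int] = []
push_scan(range(6), push_filter(is_odd, push_map(square, pushed.append)))

print(pulled, pushed)  # [1, 9, 25] [1, 9, 25]
```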
When you look at it from the perspective of a language designer or compiler, things start to look even more similar:
* Values that live across iterations are stored in a closure (push) or generator object (pull)
* In the first half of an iteration, before producing a result, operators dispatch to their producers via a return address on the call stack (push) or callbacks (pull)
* In the second half of an iteration, while building up a result, operators dispatch to their consumers via callbacks (push) or a return address on the call stack (pull)
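The first bullet can be checked directly in Python, assuming CPython (the sketch reads `gi_frame`). A hypothetical running-sum operator keeps its `total` in a closure cell in the push version and in the suspended generator frame in the pull version. The comments mark where each version dispatches to its producer and to its consumer, following the second and third bullets.

```python
import inspect
from typing import Callable, Iterator, List

Consumer = Callable[[int], None]

def push_running_sum(consumer: Consumer) -> Consumer:
    total = 0  # state shared across calls lives in the closure
    def accept(row: int) -> None:
        nonlocal total
        total += row
        consumer(total)  # dispatch to the consumer via a callback
        # returning dispatches back to the producer via the call stack
    return accept

def pull_running_sum(source: Iterator[int]) -> Iterator[int]:
    total = 0  # state shared across steps lives in the generator object's frame
    for row in source:  # dispatch to the producer via next()
        total += row
        yield total  # return to the consumer, suspending this frame

pushed: List[int] = []
sink = push_running_sum(pushed.append)
for x in [1, 2, 3, 4]:
    sink(x)

pulled = list(pull_running_sum(iter([1, 2, 3, 4])))

# Peek at where each version stores its state.
closure_total = inspect.getclosurevars(sink).nonlocals["total"]
gen = pull_running_sum(iter([1, 2, 3]))
next(gen)
next(gen)
generator_total = gen.gi_frame.f_locals["total"]

print(pushed, pulled, closure_total, generator_total)
# [1, 3, 6, 10] [1, 3, 6, 10] 10 3
```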
The difference the article mentions between "unrolling" the two approaches comes from the level of abstraction at which inlining is performed. Inlining the push model's closures produces natural-looking output. Inlining the pull model's `next` functions produces less natural output, and that is where the control flow inside loops shows up. (See, for example, the performance benefits of adding a push-model complement to Rust's usual pull-model Iterator trait: https://github.com/rust-lang/rust/pull/42782#issuecomment-30...)
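Python does not inline anything, so the following sketch only shows the shape of the two loops. A class-based pull `Filter` (hypothetical) must loop inside `__next__` until a row passes, so inlining `__next__` into the caller's loop nests that control flow inside it. The `fold` method is a push-style complement: the iterator runs a single flat loop and pushes each passing row into `step`. In Rust, `Iterator::fold` and `Iterator::try_fold` are internal-iteration methods of this kind; the Python class is only an analogy, not a port of them.

```python
from typing import Callable, Iterator, TypeVar

A = TypeVar("A")

class Filter:
    """Pull-model filter with an added push-style fold method."""

    def __init__(self, pred: Callable[[int], bool], source: Iterator[int]) -> None:
        self.pred = pred
        self.source = source

    def __iter__(self) -> "Filter":
        return self

    def __next__(self) -> int:
        # Loop until a row passes; StopIteration from the source ends iteration.
        # Inlined into a caller's loop, this becomes a loop nested inside it.
        while True:
            row = next(self.source)
            if self.pred(row):
                return row

    def fold(self, init: A, step: Callable[[A, int], A]) -> A:
        # Internal iteration: one flat loop that pushes each passing row to `step`.
        acc = init
        for row in self.source:
            if self.pred(row):
                acc = step(acc, row)
        return acc

def is_even(x: int) -> bool:
    return x % 2 == 0

def add(acc: int, row: int) -> int:
    return acc + row

# External (pull) iteration: the consumer's loop calls __next__ each time.
pulled_total = 0
for row in Filter(is_even, iter(range(10))):
    pulled_total += row

# Internal (push-style) iteration: the iterator drives the loop.
folded_total = Filter(is_even, iter(range(10))).fold(0, add)

print(pulled_total, folded_total)  # 20 20
```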
But you can get the same natural-looking output from the pull model if you perform inlining at the level of generators! This is a less familiar transformation than function inlining, but it is fundamentally the same idea of erasing API boundaries and ABI overhead. In this case it is a more straightforward formulation of the various loop optimizations that can sometimes remove the control flow in loops produced by inlining `next` functions.
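Inlining at the level of generators can be imitated by hand: substitute the body of the upstream generator at the point where the downstream one pulls from it. The sketch below uses hypothetical names. The fused generator has one loop and one `yield` per output row, with no generator-to-generator hand-off in between, and it produces the same output as the composed pair.

```python
from typing import Callable, Iterable, Iterator

def gen_filter(pred: Callable[[int], bool], source: Iterable[int]) -> Iterator[int]:
    for x in source:
        if pred(x):
            yield x

def gen_map(f: Callable[[int], int], source: Iterable[int]) -> Iterator[int]:
    for x in source:
        yield f(x)

def fused_filter_map(pred: Callable[[int], bool], f: Callable[[int], int],
                     source: Iterable[int]) -> Iterator[int]:
    # gen_filter's loop replaces gen_map's `for x in source`, and gen_filter's
    # `yield x` becomes gen_map's `yield f(x)`: the inner yield is gone.
    for x in source:
        if pred(x):
            yield f(x)

def divisible_by_3(x: int) -> bool:
    return x % 3 == 0

def plus_one(x: int) -> int:
    return x + 1

composed = list(gen_map(plus_one, gen_filter(divisible_by_3, range(10))))
fused = list(fused_filter_map(divisible_by_3, plus_one, range(10)))
print(composed, fused)  # [1, 4, 7, 10] [1, 4, 7, 10]
```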
The difference around DAGs is similar. In the push model, a DAG just means a producer calling into more than one consumer callback. In the pull model, this is tricky because you can't just yield (i.e. return from `next`) to more than one place- return addresses form a stack! (And thus the dual problem for the push model shows up for operations with multiple inputs, like the merge join mentioned in the article.)
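The DAG asymmetry, and its dual, can be sketched as well (all names hypothetical). In push, fan-out is just a loop over consumer callbacks. In pull, two readers of one input need a buffer between them; `itertools.tee` provides one, and in the worst case (one reader drained before the other starts, as below) it holds the whole input. A merge join over two sorted inputs is natural in pull because the join chooses which side to advance; in push it would instead have to buffer whichever side arrives early.

```python
import itertools
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Consumer = Callable[[int], None]

def push_fan_out(consumers: Sequence[Consumer]) -> Consumer:
    """Push DAG node: the producer simply calls every consumer in turn."""
    def accept(row: int) -> None:
        for consumer in consumers:
            consumer(row)
    return accept

evens: List[int] = []
odds: List[int] = []

def keep_even(row: int) -> None:
    if row % 2 == 0:
        evens.append(row)

def keep_odd(row: int) -> None:
    if row % 2 == 1:
        odds.append(row)

fan_out = push_fan_out([keep_even, keep_odd])
for row in range(6):
    fan_out(row)

# Pull: a generator yields only to whoever called next(), so two readers of
# one input need a buffer. tee keeps rows that `right` has not read yet.
left, right = itertools.tee(iter(range(6)), 2)
pulled_evens = [r for r in left if r % 2 == 0]  # tee now buffers all 6 rows
pulled_odds = [r for r in right if r % 2 == 1]

Row = Tuple[int, str]

def merge_join(a: Iterator[Row], b: Iterator[Row]) -> Iterator[Tuple[int, str, str]]:
    """Pull merge join; assumes both inputs are sorted by unique keys."""
    x: Optional[Row] = next(a, None)
    y: Optional[Row] = next(b, None)
    while x is not None and y is not None:
        if x[0] < y[0]:
            x = next(a, None)  # the join decides which input to advance
        elif x[0] > y[0]:
            y = next(b, None)
        else:
            yield (x[0], x[1], y[1])
            x, y = next(a, None), next(b, None)

joined = list(merge_join(iter([(1, "a"), (2, "b"), (4, "d")]),
                         iter([(2, "x"), (3, "y"), (4, "z")])))
print(evens, odds, pulled_evens, pulled_odds, joined)
# [0, 2, 4] [1, 3, 5] [0, 2, 4] [1, 3, 5] [(2, 'b', 'x'), (4, 'd', 'z')]
```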
Overall, I think a more flexible approach might be to generalize from `yield` and iterator APIs to algebraic effects (or, at a lower level, delimited continuations). These let you build both push- and pull-style APIs for a single generator-like function, as well as hybrid approaches that combine aspects of both, by removing the requirement for a single privileged call stack.
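Python has neither algebraic effects nor first-class delimited continuations, so the following is only an analogy for the single-call-stack limitation. Turning a callback-driven producer into an iterator requires somewhere to suspend the producer mid-loop; here a thread supplies that second stack, which is the job an effect handler or delimited continuation would do without an OS thread. `push_source` and `to_pull` are hypothetical names.

```python
import queue
import threading
from typing import Callable, Iterator

Consumer = Callable[[int], None]

def push_source(consumer: Consumer) -> None:
    """A producer that only offers a callback (push) API."""
    for row in range(5):
        consumer(row * row)

_DONE = object()  # sentinel marking the end of the stream

def to_pull(source: Callable[[Consumer], None]) -> Iterator[int]:
    """Adapt a push producer into a pull iterator by giving it its own stack.

    The thread stands in for the second, suspendable call stack that a
    delimited continuation or effect handler would provide. Simplifications:
    producer exceptions are not re-raised here, and abandoning the iterator
    early leaves the (daemon) producer thread blocked.
    """
    q: queue.Queue = queue.Queue(maxsize=1)  # producer blocks until the consumer pulls

    def run() -> None:
        try:
            source(q.put)  # each push becomes a blocking put
        finally:
            q.put(_DONE)

    threading.Thread(target=run, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item

pulled = list(to_pull(push_source))
print(pulled)  # [0, 1, 4, 9, 16]
```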

proposal-observable

12 3,036 0.0 JavaScript

Observables for ECMAScript
Rx.NET

61 6,474 6.6 C#

The Reactive Extensions for .NET

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.
Tags: SQL, Reactive Programming, Rust, Go, Compiler
Post date: 1 May 2021