What I Talk About When I Talk About Query Optimizer (Part 1): IR Design

Our great sponsors

WorkOS - The modern identity platform for B2B SaaS

InfluxDB - Power Real-Time Data Analytics at Scale

SaaSHub - Software Alternatives and Reviews

Our great sponsors

datafusion

55 5,020 9.9 Rust

Apache DataFusion SQL Query Engine

Agree, substrait is a really cool project! Related: if you like substrait you might want to check out datafusion too. The project is a query execution engine built on top of Apache Arrow (with SQL parser, query planner & optimizer, execution engine, extensible user defined functions, among others) and it implements a substrait provider and consumer: https://github.com/apache/arrow-datafusion/tree/main/datafus...

materialize

117 5,567 10.0 Rust

The data warehouse for operational workloads. (by MaterializeInc)

> the Query Graph Model (QGM) representation is quite abstract and hardcodes many properties, making it exceptionally difficult to understand. Its claimed extensibility is also questionable.
I don't know much about the context, but it was interesting to note that Materialize scrapped their QGM code last year: https://github.com/MaterializeInc/materialize/pull/17139
Also, a couple of interesting projects in the IR space:
- https://substrait.io/ is a cross-language serialization for Relational Algebra
- https://www.lingo-db.com/ is an MLIR-based query engine described extensively in this paper https://db.in.tum.de/~jungmair/papers/p2485-jungmair.pdf?lan...

WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
go-mysql-server

23 2,182 9.9 Go

A MySQL-compatible relational database with a storage agnostic query engine. Implemented in pure Go.

We implemented a query optimizer with a flexible intermediate representation in pure Go:
https://github.com/dolthub/go-mysql-server
Getting the IR correct so that it's both easy to use and flexible enough to be useful is a really interesting design challenge. Our primary abstraction in the query plan is called a Node, and is way more general than the IR type described in the article from OP. This has probably hurt us: we only recently separated the responsibility to fetch rows into its own part of the runtime, out of the IR -- originally row fetching was coupled to the Node type directly.
This is also the query engine that Dolt uses:
https://github.com/dolthub/dolt
But it has a plug-in architecture, so you can use the engine on any data source that implements a handful of Go interface.

dolt

93 16,971 10.0 Go

Dolt – Git for Data

We implemented a query optimizer with a flexible intermediate representation in pure Go:
https://github.com/dolthub/go-mysql-server
Getting the IR correct so that it's both easy to use and flexible enough to be useful is a really interesting design challenge. Our primary abstraction in the query plan is called a Node, and is way more general than the IR type described in the article from OP. This has probably hurt us: we only recently separated the responsibility to fetch rows into its own part of the runtime, out of the IR -- originally row fetching was coupled to the Node type directly.
This is also the query engine that Dolt uses:
https://github.com/dolthub/dolt
But it has a plug-in architecture, so you can use the engine on any data source that implements a handful of Go interface.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project