What I Talk About When I Talk About Query Optimizer (Part 1): IR Design

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • arrow-datafusion

    Apache Arrow DataFusion SQL Query Engine

    Agree, substrait is a really cool project! Related: if you like substrait you might want to check out datafusion too. The project is a query execution engine built on top of Apache Arrow (with SQL parser, query planner & optimizer, execution engine, extensible user defined functions, among others) and it implements a substrait provider and consumer: https://github.com/apache/arrow-datafusion/tree/main/datafus...

  • materialize

    The data warehouse for operational workloads. (by MaterializeInc)

    > the Query Graph Model (QGM) representation is quite abstract and hardcodes many properties, making it exceptionally difficult to understand. Its claimed extensibility is also questionable.

    I don't know much about the context, but it was interesting to note that Materialize scrapped their QGM code last year: https://github.com/MaterializeInc/materialize/pull/17139

    Also, a couple of interesting projects in the IR space:

    - https://substrait.io/ is a cross-language serialization for Relational Algebra

    - https://www.lingo-db.com/ is an MLIR-based query engine described extensively in this paper https://db.in.tum.de/~jungmair/papers/p2485-jungmair.pdf?lan...

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

  • go-mysql-server

    A MySQL-compatible relational database with a storage agnostic query engine. Implemented in pure Go.

    We implemented a query optimizer with a flexible intermediate representation in pure Go:

    https://github.com/dolthub/go-mysql-server

    Getting the IR correct so that it's both easy to use and flexible enough to be useful is a really interesting design challenge. Our primary abstraction in the query plan is called a Node, and is way more general than the IR type described in the article from OP. This has probably hurt us: we only recently separated the responsibility to fetch rows into its own part of the runtime, out of the IR -- originally row fetching was coupled to the Node type directly.

    This is also the query engine that Dolt uses:

    https://github.com/dolthub/dolt

    But it has a plug-in architecture, so you can use the engine on any data source that implements a handful of Go interface.

  • dolt

    Dolt – Git for Data

    We implemented a query optimizer with a flexible intermediate representation in pure Go:

    https://github.com/dolthub/go-mysql-server

    Getting the IR correct so that it's both easy to use and flexible enough to be useful is a really interesting design challenge. Our primary abstraction in the query plan is called a Node, and is way more general than the IR type described in the article from OP. This has probably hurt us: we only recently separated the responsibility to fetch rows into its own part of the runtime, out of the IR -- originally row fetching was coupled to the Node type directly.

    This is also the query engine that Dolt uses:

    https://github.com/dolthub/dolt

    But it has a plug-in architecture, so you can use the engine on any data source that implements a handful of Go interface.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts