-
Agree, substrait is a really cool project! Related: if you like substrait you might want to check out datafusion too. The project is a query execution engine built on top of Apache Arrow (with SQL parser, query planner & optimizer, execution engine, extensible user defined functions, among others) and it implements a substrait provider and consumer: https://github.com/apache/arrow-datafusion/tree/main/datafus...
-
InfluxDB
InfluxDB – Built for High-Performance Time Series Workloads. InfluxDB 3 OSS is now GA. Transform, enrich, and act on time series data directly in the database. Automate critical tasks and eliminate the need to move data externally. Download now.
-
materialize
Real-time Data Integration and Transformation: use SQL to transform, deliver, and act on fast-changing data. (by MaterializeInc)
> the Query Graph Model (QGM) representation is quite abstract and hardcodes many properties, making it exceptionally difficult to understand. Its claimed extensibility is also questionable.
I don't know much about the context, but it was interesting to note that Materialize scrapped their QGM code last year: https://github.com/MaterializeInc/materialize/pull/17139
Also, a couple of interesting projects in the IR space:
- https://substrait.io/ is a cross-language serialization for Relational Algebra
- https://www.lingo-db.com/ is an MLIR-based query engine described extensively in this paper https://db.in.tum.de/~jungmair/papers/p2485-jungmair.pdf?lan...
-
go-mysql-server
A MySQL-compatible relational database with a storage agnostic query engine. Implemented in pure Go.
We implemented a query optimizer with a flexible intermediate representation in pure Go:
https://github.com/dolthub/go-mysql-server
Getting the IR correct so that it's both easy to use and flexible enough to be useful is a really interesting design challenge. Our primary abstraction in the query plan is called a Node, and is way more general than the IR type described in the article from OP. This has probably hurt us: we only recently separated the responsibility to fetch rows into its own part of the runtime, out of the IR -- originally row fetching was coupled to the Node type directly.
This is also the query engine that Dolt uses:
https://github.com/dolthub/dolt
But it has a plug-in architecture, so you can use the engine on any data source that implements a handful of Go interface.
-
We implemented a query optimizer with a flexible intermediate representation in pure Go:
https://github.com/dolthub/go-mysql-server
Getting the IR correct so that it's both easy to use and flexible enough to be useful is a really interesting design challenge. Our primary abstraction in the query plan is called a Node, and is way more general than the IR type described in the article from OP. This has probably hurt us: we only recently separated the responsibility to fetch rows into its own part of the runtime, out of the IR -- originally row fetching was coupled to the Node type directly.
This is also the query engine that Dolt uses:
https://github.com/dolthub/dolt
But it has a plug-in architecture, so you can use the engine on any data source that implements a handful of Go interface.