The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning. Learn more →
Top 17 query-engine Open-Source Projects
-
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
octosql
OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
-
m3
M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform
-
go-mysql-server
A MySQL-compatible relational database with a storage agnostic query engine. Implemented in pure Go.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
qlever
Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.
-
rumble
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more (by RumbleDB)
-
opteryx
🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.
-
Modulo7
A semantic and technical analysis of musical scores based on Information Retrieval Principles
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly-released version 2.1.0 makes a stride in this direction. Before V2.1, Apache Doris stores semi-structured data as JSON files. However, during query execution, the real-time parsing of JSON data leads to high CPU and I/O consumption in addition to high query latency, especially when the dataset is huge and complicated. Moreover, the lack of a pre-defined schema means there is no handle for query optimization.
Project mention: Trino: Fast distributed SQL query engine for big data analytics | news.ycombinator.com | 2024-03-19
Python's Substrait seems like the biggest/most-used competitor-ish out there. I'd love some compare & contrast; my sense is that Substrait has a smaller ambition, and more wants to be a language for talking about execution rather than a full on execution engine. https://github.com/substrait-io/substrait
We can also see from the DataFusion discussion that they too see themselves as a bit of a Velox competitor. https://github.com/apache/arrow-datafusion/discussions/6441
Project mention: Wazero: Zero dependency WebAssembly runtime written in Go | news.ycombinator.com | 2023-07-01Never got it to anything close to a finished state, instead moving on to doing the same prototype in llvm and then cranelift.
That said, here's some of the wazero-based code on a branch - https://github.com/cube2222/octosql/tree/wasm-experiment/was...
It really is just a very very basic prototype.
Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09
Not super on topic because this is all immature and not integrated with one another yet, but there is a scaled-out rust data-frames-on-arrow implementation called ballista that could maybe? form the backend of a polars scale out approach: https://github.com/apache/arrow-ballista
Project mention: Show HN: Musoq, SQL like language with LLM experimental integrations | news.ycombinator.com | 2024-02-12
query-engine related posts
- OSS: Relicense to Apache 2 Globally
- What I Talk About When I Talk About Query Optimizer (Part 1): IR Design
- QLever – Fast Sparql Engine
- Distributed RDF Query Processing
- Sneller: SQL for JSON at Scale
- I created an in-memory SQL database called MemSQL as a learning project
- Implementing the MySQL server protocol for fun and profit
-
A note from our sponsor - WorkOS
workos.com | 28 Apr 2024
Index
What are some of the best open-source query-engine projects? This list will help you:
Project | Stars | |
---|---|---|
1 | doris | 11,363 |
2 | Trino | 9,552 |
3 | datafusion | 5,020 |
4 | octosql | 4,695 |
5 | m3 | 4,643 |
6 | go-mysql-server | 2,182 |
7 | Storm | 2,043 |
8 | datafusion-ballista | 1,275 |
9 | sneller | 969 |
10 | atomspace | 777 |
11 | comunica | 409 |
12 | Musoq | 284 |
13 | qlever | 275 |
14 | rumble | 207 |
15 | opteryx | 43 |
16 | HarkDB | 18 |
17 | Modulo7 | 15 |
Sponsored