Apache Calcite vs steampipe

Project          Apache Calcite        steampipe
Mentions         28                    146
Stars            4,352                 6,366
Growth           1.6%                  2.2%
Activity         9.0                   9.7
Latest Commit    about 6 hours ago     8 days ago
Language         Java                  Go
License          Apache License 2.0    GNU Affero General Public License v3.0

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Apache Calcite

Posts with mentions or reviews of Apache Calcite. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-26.

Data diffs: Algorithms for explaining what changed in a dataset (2022)
8 projects | news.ycombinator.com | 26 Jul 2023

> Make diff work on more than just SQLite.
Another way of doing this that I've been wanting to do for a while is to implement the DIFF operator in Apache Calcite[0]. Using Calcite, DIFF could be implemented as rewrite rules to generate the appropriate SQL to be directly executed against the database or the DIFF operator can be implemented outside of the database (which the original paper shows is more efficient).
[0] https://calcite.apache.org/
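The "rewrite DIFF into plain SQL" idea from the comment above can be sketched without Calcite at all. The following is a minimal illustration, assuming two tables with identical columns and using Python's built-in sqlite3 module; the helper name `diff_sql` and the table names are hypothetical, and this is not Calcite's rule API, only the shape of the SQL such a rewrite might generate.

```python
import sqlite3


def diff_sql(old_table, new_table, columns):
    """Build one SQL statement that lists rows removed ('-') and added ('+').

    Hypothetical helper: it expands a DIFF request into two EXCEPT queries,
    which the database itself can then execute.
    """
    cols = ", ".join(columns)
    return (
        f"SELECT '-' AS change, * FROM "
        f"(SELECT {cols} FROM {old_table} EXCEPT SELECT {cols} FROM {new_table}) "
        f"UNION ALL "
        f"SELECT '+' AS change, * FROM "
        f"(SELECT {cols} FROM {new_table} EXCEPT SELECT {cols} FROM {old_table}) "
        f"ORDER BY 2, 1 DESC"  # sort by first data column, '-' before '+'
    )


def run_diff():
    """Create two small in-memory tables and return their diff."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people_old (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE people_new (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO people_old VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    conn.executemany(
        "INSERT INTO people_new VALUES (?, ?)",
        [(1, "alice"), (2, "bobby"), (4, "dave")],
    )
    rows = conn.execute(diff_sql("people_old", "people_new", ["id", "name"])).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_diff():
        print(row)
```

A changed row shows up as a '-' row followed by a '+' row with the same id. Running the diff outside the database, which the comment says the original paper found more efficient, would replace the generated SQL with application-side comparison.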
Apache Baremaps: online maps toolkit
6 projects | news.ycombinator.com | 28 May 2023

Yes, planetiler rocks and the memory mapped collections enabled us to remove our dependency on RocksDB.
From my perspective, planetiler started as an effort to generate vector tiles from the OpenMapTile schema as fast as possible (pbf -> mvt). By contrast, Baremaps started as an effort to create a new schema and style from the ground up. In this regard, having a database (pbf -> db <- mvt) enables live reloading of changes made in the configuration files. The database has a cost, but also comes with additional advantages (updates, dynamic data, generation of tiles at zoom levels 16+, etc.).
That being said, I think the two projects overlap and I hope we will find opportunities to collaborate in the future. For instance, whereas PostgreSQL is still required in Baremaps, I recently ported many of the ST_ functions of PostGIS to Apache Calcite with the intent to execute SQL on fast memory mapped collections.
https://github.com/apache/calcite/blob/main/core/src/main/ja...
A planet wide import in Postgis currently takes about 4 hours with the COPY API (easy to parallelize) followed by about 12 hours of simplification in Postgis (not easy to parallelize). I will try to publish a detailed benchmark in the future.
How to manipulate SQL string programmatically?
2 projects | /r/dataengineering | 28 Apr 2023

Use a SQL Parser like sqlglot or Apache Calcite to compile user's query into an AST.
Can SQL be used without an RDBMS?
7 projects | /r/PHP | 27 Feb 2023
Apache Calcite
1 project | news.ycombinator.com | 13 Feb 2023
Want to contribute more to open source projects.
8 projects | /r/dotnet | 18 Aug 2022
CITIC Industrial Cloud — Apache ShardingSphere Enterprise Applications
1 project | dev.to | 14 Apr 2022

The SQL Federation engine contains processes such as SQL Parser, SQL Binder, SQL Optimizer, Data Fetcher and Operator Calculator, suitable for dealing with correlated queries and subqueries across multiple database instances. At the underlying layer, it uses Calcite to implement RBO (Rule Based Optimizer) and CBO (Cost Based Optimizer) based on relational algebra, and queries the results through the optimal execution plan.
Postgres wire compatible SQLite proxy
14 projects | news.ycombinator.com | 31 Mar 2022

Awesome to see work in the DB wire compatible space. On the MySQL side, there was MySQL Proxy (https://github.com/mysql/mysql-proxy), which was scriptable with Lua, with which you could create your own MySQL wire compatible connections. Unfortunately it appears to have been abandoned by Oracle and IIRC doesn't work with 5.7 and beyond. I used it in the past to hack together a MySQL wire adapter for Interana (https://scuba.io/).
I guess these days the best approach for connecting arbitrary data sources to existing drivers, at least for OLAP, is Apache Calcite (https://calcite.apache.org/). Unfortunately that feels a little more involved.
Launch HN: Hydra (YC W22) – Query Any Database via Postgres
4 projects | news.ycombinator.com | 23 Feb 2022

For anyone interested, Apache Calcite[0] is an open source data management framework which seems to do many of the same things that Hydra claims to do, but taking a different approach. Operating as a Java library, Calcite contains "adapters" to many different data sources from existing JDBC connectors to Elasticsearch to Cassandra. All of these different data sources can be joined together as desired. Calcite also has its own optimizer which is able to push down relevant parts of the query to the different data sources. However, you get full SQL on data sources which don't support it, with Calcite executing the remaining bits itself.
Unfortunately, I would not be too surprised if Calcite was found to be less performance-optimized than Hydra. That said, there are users of Calcite at Google, Uber, Spotify, and others who have made great use of various parts of the framework.
[0] https://calcite.apache.org/
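The push-down behaviour described in this comment can be illustrated with a toy model. The sketch below is an assumption-laden simplification in plain Python, not Calcite's adapter API: the `Source` class, `filtered_scan`, and `hash_join` are hypothetical names. One source can apply a filter itself, the other cannot, so the "engine" evaluates the remaining filter and the join.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class Source:
    """Hypothetical data source; supports_filter models adapter capability."""
    name: str
    rows: List[Row]
    supports_filter: bool

    def scan(self, column=None, value=None) -> List[Row]:
        # A capable source applies the filter itself; others return everything.
        if column is not None and self.supports_filter:
            return [r for r in self.rows if r[column] == value]
        return list(self.rows)


def filtered_scan(source: Source, column: str, value: object) -> Tuple[List[Row], str]:
    """Push the filter down when possible, otherwise filter in the engine."""
    if source.supports_filter:
        return source.scan(column, value), "pushed down"
    rows = [r for r in source.scan() if r[column] == value]
    return rows, "evaluated by engine"


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Join two row lists on a shared key, executed by the engine itself."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


users = Source(
    "users_db",
    [{"user_id": 1, "country": "DE"}, {"user_id": 2, "country": "US"}],
    supports_filter=True,
)
events = Source(
    "events_log",
    [
        {"user_id": 1, "kind": "login"},
        {"user_id": 2, "kind": "login"},
        {"user_id": 1, "kind": "logout"},
    ],
    supports_filter=False,
)


def federated_query() -> List[Row]:
    """Roughly: users in DE joined with their login events."""
    de_users, _ = filtered_scan(users, "country", "DE")
    logins, _ = filtered_scan(events, "kind", "login")
    return hash_join(de_users, logins, "user_id")
```

A real optimizer would choose between such plans using rules and cost estimates; this sketch hard-codes the decision to keep the idea visible.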
Anyone know of any software that can help in designing then outputting to various database
1 project | /r/DatabaseHelp | 21 Nov 2021

Abstraction Layer - You can use something like Calcite to abstract out your data storage. https://calcite.apache.org/

steampipe

Posts with mentions or reviews of steampipe. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-31.

Steampipe: Dynamically query APIs, code and more with SQL
1 project | news.ycombinator.com | 4 Apr 2024
Cloud Tools You Probably Haven't Heard Of
3 projects | dev.to | 31 Mar 2024

Steampipe is a tool for querying cloud APIs and other data sources using SQL in a zero-ETL manner.
Show HN: Query Your Sheets with SheetSQL
9 projects | news.ycombinator.com | 13 Mar 2024

Readers may also enjoy Steampipe [1], an open source CLI to live query Google Sheets [2] and 140+ other services with SQL (e.g. AWS, GitHub, etc). It uses Postgres Foreign Data Wrappers under the hood and supports joins etc across the services. (Disclaimer - I'm a lead on the project.)
1 - https://github.com/turbot/steampipe
Osquery: An sqlite3 virtual table exposing operating system data to SQL
14 projects | news.ycombinator.com | 25 Feb 2024

be mindful of its AGPLv3 https://github.com/turbot/steampipe/blob/v0.21.8/LICENSE (AFAIK v0.4.3 is the last MIT release https://github.com/turbot/steampipe/blob/v0.4.3/LICENSE ) and the actual providers are Apache 2 <https://github.com/turbot/steampipe-plugin-aws/blob/v0.131.0...> (but I don't know if provider drift makes them compatible with 0.4 or not)
iasql seems to be AWS only, but good for them for taking this on:
How to run an AWS CIS v3.0 assessment in CloudShell
2 projects | dev.to | 8 Feb 2024

In a prior post I showed how to install Steampipe in AWS CloudShell to instantly query 460+ resource types from your AWS APIs using SQL, and in another post how to use the Steampipe AWS Compliance mod to assess 25+ security benchmarks across your AWS accounts.
Git Query Language
3 projects | news.ycombinator.com | 2 Feb 2024
Query Cloud and SaaS APIs with SQL
1 project | news.ycombinator.com | 26 Jan 2024
Cutting down AWS cost by $150k per year simply by shutting things off
8 projects | news.ycombinator.com | 22 Jan 2024

Readers may find Steampipe's [1] AWS Thrifty Mod [2] useful. It will automatically scan multiple accounts and regions for 50 cost saving opportunities - many of which are looking for over-provisioned or unused resources. For example, it's crazy how much you can save by doing things like just converting your EBS volumes to the newer gp3 type. Combine with Flowpipe [3] to automate checks and actions. It's all open source and extensible.
1 - https://github.com/turbot/steampipe
FLaNK Weekly 08 Jan 2024
41 projects | dev.to | 8 Jan 2024
Zero-ETL for Postgres: Live-query cloud APIs with 100 open source FDWs
2 projects | news.ycombinator.com | 20 Dec 2023

Steampipe [1] is an open source project [2] that includes an embedded Postgres to instantly query cloud, code & more with SQL. This release expands our plugin ecosystem [3] to be a full Zero-ETL platform. Steampipe plugins can now run natively in your own Postgres as Foreign Data Wrappers [4], as SQLite extensions [5] or as simple data export tools [6]. Please give it a try, we'd love your feedback and contributions!
1 - https://steampipe.io

What are some alternatives?

When comparing Apache Calcite and steampipe you can also consider the following projects:

Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

cloudquery - The open source high performance ELT framework powered by Apache Arrow

ANTLR - ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

cloud-custodian - Rules engine for cloud security, cost optimization, and governance, DSL in yaml for policies to query, filter, and take actions on resources

Presto - The official home of the Presto distributed SQL query engine for big data

metriql - The metrics layer for your data. Join us at https://metriql.com/slack

JSqlParser - JSqlParser parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Pattern

inspec-aws - InSpec AWS Resource Pack https://www.inspec.io/

Apache Spark - A unified analytics engine for large-scale data processing

steampipe-mod-github-sherlock - Interrogate your GitHub resources with the help of the world's greatest detectives: Powerpipe + Steampipe + Sherlock.

Apache Drill - Apache Drill is a distributed MPP query layer for self describing data

embedded-postgres-binaries - Lightweight bundles of PostgreSQL binaries with reduced size intended for testing purposes.
