Apache Calcite
postlite
Metric | Apache Calcite | postlite
---|---|---
Mentions | 28 | 18
Stars | 4,352 | 1,190
Growth | 1.8% | -
Activity | 9.0 | 0.0
Latest commit | 5 days ago | 7 months ago
Language | Java | Go
License | Apache License 2.0 | Apache License 2.0
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Apache Calcite
-
Data diffs: Algorithms for explaining what changed in a dataset (2022)
> Make diff work on more than just SQLite.
Another way of doing this, which I've been wanting to do for a while, is to implement the DIFF operator in Apache Calcite[0]. Using Calcite, DIFF could be implemented as rewrite rules that generate the appropriate SQL to be executed directly against the database. Alternatively, the DIFF operator could be implemented outside of the database (which the original paper shows is more efficient).
[0] https://calcite.apache.org/
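As a hedged illustration of the "rewrite into SQL" route, the sketch below builds one SQL query from a much-simplified, single-attribute DIFF. It uses plain Go string building, not Calcite rewrite rules, and it assumes DIFF is reduced to comparing how often each value of one attribute occurs in a test relation versus a control relation. `diffSupportSQL` and the table and column names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// diffSupportSQL is a hypothetical helper (not a Calcite API). It rewrites a
// simplified, single-attribute DIFF of a test relation against a control relation
// into one SQL query. For each value of attr found in the test relation, the query
// reports the fraction of rows carrying that value in each relation.
// Assumes both relations are non-empty; identifiers are interpolated unquoted,
// so they must come from trusted input.
func diffSupportSQL(test, control, attr string) string {
	// Per-value row counts for one relation.
	perValue := func(rel string) string {
		return fmt.Sprintf("SELECT %s AS v, COUNT(*) AS n FROM %s GROUP BY %s", attr, rel, attr)
	}
	lines := []string{
		"SELECT t.v,",
		// Multiplying by 1.0 avoids integer division in engines such as SQLite and Postgres.
		fmt.Sprintf("  t.n * 1.0 / (SELECT COUNT(*) FROM %s) AS test_support,", test),
		// Values absent from the control relation get a support of 0 via the LEFT JOIN.
		fmt.Sprintf("  COALESCE(c.n, 0) * 1.0 / (SELECT COUNT(*) FROM %s) AS control_support", control),
		fmt.Sprintf("FROM (%s) t", perValue(test)),
		fmt.Sprintf("LEFT JOIN (%s) c ON t.v = c.v", perValue(control)),
	}
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Println(diffSupportSQL("sessions_today", "sessions_yesterday", "app_version"))
}
```

A caller could then compare `test_support` with `control_support` to surface values that became more or less frequent. The paper's DIFF goes well beyond this single-attribute count.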
-
Apache Baremaps: online maps toolkit
Yes, planetiler rocks, and the memory-mapped collections enabled us to remove our dependency on RocksDB.
From my perspective, planetiler started as an effort to generate vector tiles from the OpenMapTiles schema as fast as possible (pbf -> mvt). By contrast, Baremaps started as an effort to create a new schema and style from the ground up. In this regard, having a database (pbf -> db <- mvt) makes it possible to live-reload changes made in the configuration files. The database has a cost, but it also comes with additional advantages (updates, dynamic data, generation of tiles at zoom levels 16+, etc.).
That being said, I think the two projects overlap, and I hope we will find opportunities to collaborate in the future. For instance, while PostgreSQL is still required in Baremaps, I recently ported many of the ST_ functions of PostGIS to Apache Calcite with the intent of executing SQL on fast memory-mapped collections.
https://github.com/apache/calcite/blob/main/core/src/main/ja...
A planet-wide import into PostGIS currently takes about 4 hours with the COPY API (easy to parallelize), followed by about 12 hours of simplification in PostGIS (not easy to parallelize). I will try to publish a detailed benchmark in the future.
-
How to manipulate SQL string programmatically?
Use a SQL parser like sqlglot or Apache Calcite to parse the user's query into an AST.
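The following sketch shows why an AST is easier to manipulate than a string. It is a deliberately tiny, hand-written Go parser that accepts only queries of the form `SELECT c1, c2 FROM t`. It is neither sqlglot nor Calcite, and `Select`, `tokenize`, and `parseSelect` are hypothetical names. Anything beyond that one shape (WHERE clauses, quoting, expressions) is rejected or mis-tokenized.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Select is a hypothetical toy AST for queries shaped like "SELECT c1, c2 FROM t".
// Real parsers such as sqlglot or Calcite's produce far richer trees.
type Select struct {
	Columns []string
	Table   string
}

// tokenize splits on whitespace and emits ',' and '*' as separate tokens.
func tokenize(sql string) []string {
	var toks []string
	var cur strings.Builder
	flush := func() {
		if cur.Len() > 0 {
			toks = append(toks, cur.String())
			cur.Reset()
		}
	}
	for _, r := range sql {
		switch {
		case unicode.IsSpace(r):
			flush()
		case r == ',' || r == '*':
			flush()
			toks = append(toks, string(r))
		default:
			cur.WriteRune(r)
		}
	}
	flush()
	return toks
}

// parseSelect builds the AST and returns an error for anything outside the toy grammar.
func parseSelect(sql string) (*Select, error) {
	toks := tokenize(sql)
	if len(toks) == 0 || !strings.EqualFold(toks[0], "SELECT") {
		return nil, fmt.Errorf("expected SELECT")
	}
	s := &Select{}
	i := 1
	for {
		if i >= len(toks) || toks[i] == "," || strings.EqualFold(toks[i], "FROM") {
			return nil, fmt.Errorf("expected a column at token %d", i)
		}
		s.Columns = append(s.Columns, toks[i])
		i++
		if i < len(toks) && toks[i] == "," {
			i++
			continue
		}
		break
	}
	if i+2 != len(toks) || !strings.EqualFold(toks[i], "FROM") {
		return nil, fmt.Errorf("expected FROM <table> at the end")
	}
	s.Table = toks[i+1]
	return s, nil
}

// String renders the AST back to SQL text.
func (s *Select) String() string {
	return "SELECT " + strings.Join(s.Columns, ", ") + " FROM " + s.Table
}

func main() {
	ast, err := parseSelect("SELECT id, name FROM users")
	if err != nil {
		panic(err)
	}
	// Manipulate the tree, not the string: add a column and swap the table.
	ast.Columns = append(ast.Columns, "email")
	ast.Table = "archived_users"
	fmt.Println(ast) // SELECT id, name, email FROM archived_users
}
```

Once the query is a tree, adding a column or renaming a table becomes a field update rather than a fragile string edit. A real parser also handles dialects, quoting, and nested expressions.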
- Can SQL be used without an RDBMS?
- Apache Calcite
- Want to contribute more to open source projects.
-
CITIC Industrial Cloud — Apache ShardingSphere Enterprise Applications
The SQL Federation engine consists of stages such as the SQL Parser, SQL Binder, SQL Optimizer, Data Fetcher, and Operator Calculator. It is suited to handling correlated queries and subqueries across multiple database instances. At the underlying layer, it uses Calcite to implement an RBO (Rule-Based Optimizer) and a CBO (Cost-Based Optimizer) based on relational algebra, and it retrieves the results through the optimal execution plan.
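To give a rough idea of what "rule-based" means here, the hypothetical Go sketch below applies one classic rewrite to a toy plan tree: it moves a filter below a projection. The `Node`, `Scan`, `Project`, and `Filter` types are invented for the example and are not Calcite's or ShardingSphere's classes. A cost-based optimizer would instead enumerate equivalent plans and pick the cheapest by estimated cost. That part is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a toy relational-algebra plan node (hypothetical, not Calcite's RelNode).
type Node interface{ String() string }

// Scan reads every row of a table.
type Scan struct{ Table string }

// Project keeps only the listed columns (plain column references, no expressions).
type Project struct {
	Input Node
	Cols  []string
}

// Filter keeps rows matching Pred, which this sketch treats as opaque text.
type Filter struct {
	Input Node
	Pred  string
}

func (s Scan) String() string { return "Scan(" + s.Table + ")" }

func (p Project) String() string {
	return "Project[" + strings.Join(p.Cols, ",") + "](" + p.Input.String() + ")"
}

func (f Filter) String() string { return "Filter[" + f.Pred + "](" + f.Input.String() + ")" }

// pushFilterBelowProject applies one rule everywhere in the tree:
// Filter(Project(x)) => Project(Filter(x)).
// Because Project only selects columns, a predicate over its output is also valid
// over its input, so the result is unchanged while rows can be dropped earlier.
func pushFilterBelowProject(n Node) Node {
	switch v := n.(type) {
	case Filter:
		in := pushFilterBelowProject(v.Input)
		if p, ok := in.(Project); ok {
			pushed := pushFilterBelowProject(Filter{Input: p.Input, Pred: v.Pred})
			return Project{Input: pushed, Cols: p.Cols}
		}
		return Filter{Input: in, Pred: v.Pred}
	case Project:
		return Project{Input: pushFilterBelowProject(v.Input), Cols: v.Cols}
	default:
		return n
	}
}

func main() {
	plan := Filter{
		Pred:  "age > 30",
		Input: Project{Cols: []string{"name", "age"}, Input: Scan{Table: "users"}},
	}
	fmt.Println(plan)                         // Filter[age > 30](Project[name,age](Scan(users)))
	fmt.Println(pushFilterBelowProject(plan)) // Project[name,age](Filter[age > 30](Scan(users)))
}
```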
-
Postgres wire compatible SQLite proxy
Awesome to see work in the DB wire compatible space. On the MySQL side, there was MySQL Proxy (https://github.com/mysql/mysql-proxy), which was scriptable with Lua, with which you could create your own MySQL wire compatible connections. Unfortunately it appears to have been abandoned by Oracle and IIRC doesn't work with 5.7 and beyond. I used it in the past to hack together a MySQL wire adapter for Interana (https://scuba.io/).
I guess these days the best approach for connecting arbitrary data sources to existing drivers, at least for OLAP, is Apache Calcite (https://calcite.apache.org/). Unfortunately that feels a little more involved.
-
Launch HN: Hydra (YC W22) – Query Any Database via Postgres
For anyone interested, Apache Calcite[0] is an open source data management framework that seems to do many of the same things Hydra claims to do, but with a different approach. Calcite operates as a Java library and contains "adapters" for many different data sources, from existing JDBC connectors to Elasticsearch to Cassandra. All of these data sources can be joined together as desired. Calcite also has its own optimizer, which can push down relevant parts of the query to the different data sources. In addition, you get full SQL on data sources that don't support it, with Calcite executing the remaining bits itself.
Unfortunately, I would not be too surprised if Calcite was found to be less performance-optimized than Hydra. That said, there are users of Calcite at Google, Uber, Spotify, and others who have made great use of various parts of the framework.
[0] https://calcite.apache.org/
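The pushdown idea described above can be illustrated with a hypothetical sketch. This is not Calcite's adapter API. The conjuncts of a WHERE clause are split into two groups: those a source claims it can evaluate, and the rest, which the engine applies itself after fetching rows. The `Pred` type, `splitPushdown`, and the capability map are invented for illustration. The sketch assumes the predicates are combined with AND, which is what makes splitting them safe.

```go
package main

import "fmt"

// Pred is one conjunct of a WHERE clause, e.g. {"age", ">", "30"}.
type Pred struct {
	Col, Op, Value string
}

// splitPushdown is a hypothetical helper (not Calcite's adapter API). It divides
// AND-ed conjuncts into those the source says it can evaluate (pushed) and those
// the engine must apply itself to the rows the source returns (residual).
// Splitting is only safe because the conjuncts are combined with AND.
func splitPushdown(preds []Pred, supportedOps map[string]bool) (pushed, residual []Pred) {
	for _, p := range preds {
		if supportedOps[p.Op] {
			pushed = append(pushed, p)
		} else {
			residual = append(residual, p)
		}
	}
	return pushed, residual
}

func main() {
	where := []Pred{
		{"country", "=", "'CH'"},
		{"name", "LIKE", "'%calc%'"},
		{"age", ">", "30"},
	}
	// Assume a source that only understands equality and range comparisons.
	caps := map[string]bool{"=": true, "<": true, ">": true}
	pushed, residual := splitPushdown(where, caps)
	fmt.Println("send to source:", pushed)
	fmt.Println("evaluate in engine:", residual)
}
```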
-
Anyone know of any software that can help in designing then outputting to various database
Abstraction Layer - You can use something like Calcite to abstract out your data storage. https://calcite.apache.org/
postlite
-
SQLedge: Replicate Postgres to SQLite on the Edge
#. SQLite WAL mode
From https://www.sqlite.org/isolation.html https://news.ycombinator.com/item?id=32247085 :
> [sqlite] WAL mode permits simultaneous readers and writers. It can do this because changes do not overwrite the original database file, but rather go into the separate write-ahead log file. That means that readers can continue to read the old, original, unaltered content from the original database file at the same time that the writer is appending to the write-ahead log
#. superfly/litefs: a FUSE-based file system for replicating SQLite https://github.com/superfly/litefs
#. sqldiff: https://www.sqlite.org/sqldiff.html https://news.ycombinator.com/item?id=31265005
#. dolthub/dolt: https://github.com/dolthub/dolt
> Dolt can be set up as a replica of your existing MySQL or MariaDB database using standard MySQL binlog replication. Every write becomes a Dolt commit. This is a great way to get the version control benefits of Dolt and keep an existing MySQL or MariaDB database.
#. pganalyze/libpg_query: https://github.com/pganalyze/libpg_query :
> C library for accessing the PostgreSQL parser outside of the server environment
#. Ibis + Substrait [ + DuckDB ]
> ibis strives to provide a consistent interface for interacting with a multitude of different analytical execution engines, most of which (but not all) speak some dialect of SQL.
> Today, Ibis accomplishes this with a lot of help from `sqlalchemy` and `sqlglot` to handle differences in dialect, or we interact directly with available Python bindings (for instance with the pandas, datafusion, and polars backends).
> [...] `Substrait` is a new cross-language serialization format for communicating (among other things) query plans. It's still in its early days, but there is already nascent support for Substrait in Apache Arrow, DuckDB, and Velox.
#. benbjohnson/postlite: https://github.com/benbjohnson/postlite
> postlite is a network proxy to allow access to remote SQLite databases over the Postgres wire protocol. This allows GUI tools to be used on remote SQLite databases which can make administration easier.
> The proxy works by translating Postgres frontend wire messages into SQLite transactions and converting results back into Postgres response wire messages. Many Postgres clients also inspect the pg_catalog to determine system information so Postlite mirrors this catalog by using an attached in-memory database with virtual tables. The proxy also performs minor rewriting on these system queries to convert them to usable SQLite syntax.
> Note: This software is in alpha. Please report bugs. Postlite doesn't alter your database unless you issue INSERT, UPDATE, DELETE commands so it's probably safe. If anything, the Postlite process may die but it shouldn't affect your database.
#. > "Hosting SQLite Databases on GitHub Pages" (2021) re: sql.js-httpvfs, DuckDB https://news.ycombinator.com/item?id=28021766
#. awesome-db-tools https://github.com/mgramin/awesome-db-tools
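The WAL behaviour quoted from the SQLite documentation at the top of this list can be illustrated with a toy Go model: readers keep reading the original content while the writer appends to a separate log. This is a single-threaded sketch under stated assumptions and not SQLite's on-disk format. Frames are whole pages kept in memory, each reader simply remembers how many log frames existed when it started, and checkpointing is left out. `DB`, `Reader`, and their methods are hypothetical names.

```go
package main

import "fmt"

// frame is one logged page version (toy model, not SQLite's WAL format).
type frame struct {
	page int
	data string
}

// DB holds the original database pages and an append-only write-ahead log.
type DB struct {
	pages []string
	wal   []frame
}

// Reader is a read transaction; mark is how many WAL frames existed when it began.
type Reader struct {
	db   *DB
	mark int
}

// BeginRead pins the current end of the log as this reader's snapshot.
func (db *DB) BeginRead() Reader { return Reader{db: db, mark: len(db.wal)} }

// Write appends a new page version to the log; the original pages are untouched.
func (db *DB) Write(page int, data string) {
	db.wal = append(db.wal, frame{page: page, data: data})
}

// Read returns the newest version of page among the frames visible to r,
// falling back to the original database content.
func (r Reader) Read(page int) string {
	for i := r.mark - 1; i >= 0; i-- {
		if r.db.wal[i].page == page {
			return r.db.wal[i].data
		}
	}
	return r.db.pages[page]
}

func main() {
	db := &DB{pages: []string{"v0", "v0"}}
	old := db.BeginRead()
	db.Write(1, "v1") // a writer appends while old is still open
	fresh := db.BeginRead()
	fmt.Println(old.Read(1), fresh.Read(1)) // v0 v1
}
```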
-
SQLite-based databases on the postgres protocol? Yes we can!
Ben Johnson poked around in this space last year too https://github.com/benbjohnson/postlite
-
SQLite-based databases on the Postgres protocol? Yes we can
Note that this already exists on top of SQLite proper - authored by Ben Johnson (Litestream, Fly.io etc.) - https://github.com/benbjohnson/postlite
- Hctree is an experimental high-concurrency database back end for SQLite
-
WAL Mode in LiteFS
Currently, you need to SSH in and use the sqlite3 CLI on the server. There has been some work in this area but it's all still rough around the edges. I wrote a server called Postlite[1] that exposes remote SQLite databases over the Postgres wire protocol but it's very alpha. :)
I'd love to see more work in this area. Ricardo Ander-Egg wrote a remote management tool called litexplore[2] that connects over SSH to the SQLite CLI behind the scenes. I haven't used it but I think there's a lot of potential with that approach.
[1]: https://github.com/benbjohnson/postlite
[2]: https://github.com/litements/litexplore
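Postlite's README, quoted above, describes translating Postgres frontend wire messages into SQLite transactions and converting results back into Postgres response messages. The hypothetical Go sketch below shows only the wire-format side of the simple-query flow, based on the Postgres v3 protocol message layouts. It decodes a client's Query ('Q') message and encodes RowDescription, DataRow, CommandComplete, and ReadyForQuery replies. It is not Postlite's code, and the function names are invented. The startup handshake, authentication, extended-query protocol, NULL values, error responses, and the actual SQLite call are all left out.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// message frames a regular protocol message: a type byte, then a big-endian
// int32 length that counts itself and the body but not the type byte.
func message(typ byte, body []byte) []byte {
	out := make([]byte, 5, 5+len(body))
	out[0] = typ
	binary.BigEndian.PutUint32(out[1:5], uint32(4+len(body)))
	return append(out, body...)
}

// parseQuery decodes a frontend simple-query ('Q') message into its SQL text.
func parseQuery(msg []byte) (string, error) {
	if len(msg) < 6 || msg[0] != 'Q' {
		return "", errors.New("not a Query message")
	}
	if int(binary.BigEndian.Uint32(msg[1:5])) != len(msg)-1 || msg[len(msg)-1] != 0 {
		return "", errors.New("malformed Query message")
	}
	return string(msg[5 : len(msg)-1]), nil
}

// rowDescription ('T') announces text-typed columns (type OID 25, text format).
func rowDescription(cols []string) []byte {
	var b bytes.Buffer
	binary.Write(&b, binary.BigEndian, int16(len(cols)))
	for _, c := range cols {
		b.WriteString(c)
		b.WriteByte(0)
		binary.Write(&b, binary.BigEndian, int32(0))  // table OID: none
		binary.Write(&b, binary.BigEndian, int16(0))  // column attribute number: none
		binary.Write(&b, binary.BigEndian, int32(25)) // data type OID: text
		binary.Write(&b, binary.BigEndian, int16(-1)) // type size: variable length
		binary.Write(&b, binary.BigEndian, int32(-1)) // type modifier: none
		binary.Write(&b, binary.BigEndian, int16(0))  // format code: text
	}
	return message('T', b.Bytes())
}

// dataRow ('D') sends one row of text values; SQL NULL (length -1) is not handled here.
func dataRow(vals []string) []byte {
	var b bytes.Buffer
	binary.Write(&b, binary.BigEndian, int16(len(vals)))
	for _, v := range vals {
		binary.Write(&b, binary.BigEndian, int32(len(v)))
		b.WriteString(v)
	}
	return message('D', b.Bytes())
}

// commandComplete ('C') carries a tag such as "SELECT 1".
func commandComplete(tag string) []byte { return message('C', append([]byte(tag), 0)) }

// readyForQuery ('Z') with status 'I' tells the client the server is idle.
func readyForQuery() []byte { return message('Z', []byte{'I'}) }

func main() {
	// What a client sends for a simple query.
	q := message('Q', append([]byte("SELECT name FROM users"), 0))
	sql, err := parseQuery(q)
	if err != nil {
		panic(err)
	}
	fmt.Println("client asked:", sql)

	// A proxy would run sql against SQLite here (not shown), then answer with:
	var reply []byte
	reply = append(reply, rowDescription([]string{"name"})...)
	reply = append(reply, dataRow([]string{"alice"})...)
	reply = append(reply, commandComplete("SELECT 1")...)
	reply = append(reply, readyForQuery()...)
	fmt.Printf("% x\n", reply)
}
```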
-
Go and SQLite in the Cloud
I've not used this myself, but Ben Johnson's https://github.com/benbjohnson/postlite in front of SQLite might allow you to use PostgREST? I recall him saying on a podcast that his goal was to be able to point the large ecosystem of PG tools at SQLite.
- GitHub - benbjohnson/postlite: Postgres wire compatible SQLite proxy.
- Postgres wire compatible SQLite proxy
- postlite: Postgres wire compatible SQLite proxy
What are some alternatives?
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
sqlitebrowser - Official home of the DB Browser for SQLite (DB4S) project. Previously known as "SQLite Database Browser" and "Database Browser for SQLite".
ANTLR - ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
tuql - Automatically create a GraphQL server from a SQLite database or a SQL file
Presto - The official home of the Presto distributed SQL query engine for big data
marmot - A distributed SQLite replicator built on top of NATS
JSqlParser - JSqlParser parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Pattern
sshfs - A network filesystem client to connect to SSH servers
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
awesome-graphql - Awesome list of GraphQL
Apache Drill - Apache Drill is a distributed MPP query layer for self describing data
roundabout - Postgres connection pooler