zetasql
libpg_query
| | zetasql | libpg_query |
|---|---|---|
| Mentions | 15 | 13 |
| Stars | 2,122 | 1,050 |
| Growth | 2.2% | 2.4% |
| Activity | 0.0 | 8.9 |
| Last commit | 25 days ago | about 1 month ago |
| Language | C++ | C |
| License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
zetasql
-
Mangle, a programming language for deductive database programming
There are even table-valued functions.
These things are not widespread, they differ by implementation, and the way clients use them is copy-and-paste. Even something as thoughtful as ZetaSQL https://github.com/google/zetasql does not have mechanisms for structuring (modules, packages, interfaces). SQL will not, cannot evolve in such a direction (or anything that does evolve will not be recognizable as SQL).
-
goccy/bigquery-emulator: BigQuery emulator server implemented in Go
Hi, I have been developing a BigQuery emulator ( https://github.com/goccy/bigquery-emulator ) since early 2022. It is written in Go, but it can be used from the bq command line tool and from client SDKs for other languages (e.g. Python) after installing the Docker image or a released binary. It currently supports over 200 of the nearly 330 standard functions in BigQuery and all data types except GEOGRAPHY ( see https://github.com/goccy/go-zetasqlite#status for details ). ZetaSQL ( https://github.com/google/zetasql ) is used to parse and analyze queries.
-
Parsing SQL
If you don't want to do it yourself, there's this:
https://github.com/google/zetasql
Parsing is huge but it's amazing how small a part of the job it is. This library isn't even the half of it.
- SQLGlot: SQL parser, transpiler, optimizer – translate to Presto, Spark, Hive
-
New PostgreSQL Interface for Cloud Spanner
https://github.com/google/zetasql
It is amazingly good.
You give it textual SQL (+ schema + all your function definitions) and it returns a really clean logical query plan. It is also happy to do this via protobufs, so you can use it from languages other than C++. It is also tested and documented up the wazoo. It has been such a pleasure to work with.
Anyway, the big problem with ZetaSQL is that it is not a common SQL dialect.
It seems that the only reasonable way to do this PostgreSQL interface for Cloud Spanner is to add a second parser (and other extensions) to ZetaSQL. If I am correct, I really really hope they open source that part of ZetaSQL as well - it would be a massive step forward for open source SQL tooling.
I mean the postgres parser (and semantic changes) for ZetaSQL. The zetasql parser is in a file called zetasql/parser/bison_parser.y, I strongly suspect they now have a file called something like zetasql/pgparser/bison_parser.y as well (and much more pervasive changes to support the deeper differences in the dialects).
This is the lexical structure and syntax docs for the new postgres interface to cloud spanner:
https://cloud.google.com/spanner/docs/postgresql/lexical
And this is the zetasql lexical structure and syntax docs:
https://github.com/google/zetasql/blob/master/docs/lexical.m...
Notice that the new PG docs are an edit of the Zeta ones - evidence that my hypothesis is correct.
-
Open Source SQL Parsers
zetasql implements BigQuery, Spanner, and Dataflow dialects.
-
Let's write a compiler, part 5: A code generator
ZetaSQL[1] seems like it could be a fit for your use case. I've worked with Apache Calcite in the past and found it to be very complex to work with. I found ZetaSQL to be a little easier to use.
-
BigQuery Language Server
I’m not aware of one but you could probably use ZetaSQL to put one together, the difficult work has been opened, you’d just need to add the LSP layer.
-
A new template-defined width integer C++ library has snuck its way into Google ZetaSQL
Some additional helper functions here.
libpg_query
-
Transpile Any SQL to PostgreSQL Dialect
This in combination with [pg_query](https://github.com/pganalyze/libpg_query) could be a very powerful combination that allows writing generic static analyzers.
-
Postgres: The Next Generation
It's true that the core PG code isn't written in a modular way that's friendly to integration piecemeal in other projects (outside of libpq).
For THIS PARTICULAR case, the pganalyze team has actually extracted out the parser of PG for including in your own projects:
-
SQLedge: Replicate Postgres to SQLite on the Edge
#. SQLite WAL mode
From https://www.sqlite.org/isolation.html https://news.ycombinator.com/item?id=32247085 :
> [sqlite] WAL mode permits simultaneous readers and writers. It can do this because changes do not overwrite the original database file, but rather go into the separate write-ahead log file. That means that readers can continue to read the old, original, unaltered content from the original database file at the same time that the writer is appending to the write-ahead log
#. superfly/litefs: a FUSE-based file system for replicating SQLite https://github.com/superfly/litefs
#. sqldiff: https://www.sqlite.org/sqldiff.html https://news.ycombinator.com/item?id=31265005
#. dolthub/dolt: https://github.com/dolthub/dolt
> Dolt can be set up as a replica of your existing MySQL or MariaDB database using standard MySQL binlog replication. Every write becomes a Dolt commit. This is a great way to get the version control benefits of Dolt and keep an existing MySQL or MariaDB database.
#. pganalyze/libpg_query: https://github.com/pganalyze/libpg_query :
> C library for accessing the PostgreSQL parser outside of the server environment
#. Ibis + Substrait [ + DuckDB ]
> ibis strives to provide a consistent interface for interacting with a multitude of different analytical execution engines, most of which (but not all) speak some dialect of SQL.
> Today, Ibis accomplishes this with a lot of help from `sqlalchemy` and `sqlglot` to handle differences in dialect, or we interact directly with available Python bindings (for instance with the pandas, datafusion, and polars backends).
> [...] `Substrait` is a new cross-language serialization format for communicating (among other things) query plans. It's still in its early days, but there is already nascent support for Substrait in Apache Arrow, DuckDB, and Velox.
#. benbjohnson/postlite: https://github.com/benbjohnson/postlite
> postlite is a network proxy to allow access to remote SQLite databases over the Postgres wire protocol. This allows GUI tools to be used on remote SQLite databases which can make administration easier.
> The proxy works by translating Postgres frontend wire messages into SQLite transactions and converting results back into Postgres response wire messages. Many Postgres clients also inspect the pg_catalog to determine system information so Postlite mirrors this catalog by using an attached in-memory database with virtual tables. The proxy also performs minor rewriting on these system queries to convert them to usable SQLite syntax.
> Note: This software is in alpha. Please report bugs. Postlite doesn't alter your database unless you issue INSERT, UPDATE, DELETE commands so it's probably safe. If anything, the Postlite process may die but it shouldn't affect your database.
#. > "Hosting SQLite Databases on GitHub Pages" (2021) re: sql.js-httpvfs, DuckDB https://news.ycombinator.com/item?id=28021766
#. awesome-db-tools https://github.com/mgramin/awesome-db-tools
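The WAL behaviour quoted in the SQLite item above can be tried with Python's built-in `sqlite3` module. This is only a minimal sketch: the table name `t`, the helper name `wal_demo`, and the temporary database file are made up for illustration, and it assumes the temporary directory is on a local filesystem where WAL mode is available.

```python
import os
import sqlite3
import tempfile


def wal_demo():
    """Read from one connection while another holds an open write transaction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # hypothetical throwaway database

        # isolation_level=None puts the connection in autocommit mode,
        # so transactions are controlled explicitly with BEGIN/COMMIT.
        writer = sqlite3.connect(path, isolation_level=None)
        mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        writer.execute("CREATE TABLE t (x INTEGER)")
        writer.execute("INSERT INTO t VALUES (1)")

        reader = sqlite3.connect(path, isolation_level=None)

        # The writer starts a transaction and appends a row without committing.
        writer.execute("BEGIN IMMEDIATE")
        writer.execute("INSERT INTO t VALUES (2)")

        # The reader is not blocked and still sees the last committed content.
        during = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]

        writer.execute("COMMIT")

        # A new read after the commit sees the writer's change.
        after = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]

        reader.close()
        writer.close()
        return mode, during, after


if __name__ == "__main__":
    print(wal_demo())
```

The reader's first count reflects the committed state from before the writer's transaction, and the second count includes the new row once the writer has committed.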
-
Show HN: Postgres Language Server
Can't you just give some love to the issue https://github.com/pganalyze/libpg_query/issues/44 instead? As I said before, this would be very helpful for the community because there are a lot of libraries that use libpg_query and cannot be used on Windows (e.g. see https://github.com/lelit/pglast/issues/7).
It seems that the only obstacle to fixing it is:
> Thanks for the offer, but the problem is our team being time limited / having an engineer with a Windows machine ready to take this on, not that we wouldn't want to pay someone to work on it :)
(https://github.com/pganalyze/libpg_query/issues/44#issuecomm...)
Hosting the LSP elsewhere is really needed since if people wanted to go that way they could use Remote ssh (https://code.visualstudio.com/docs/remote/ssh) to host the whole dev environment on linux and connect to it.
Thank you
Excited to see this - an excellent use case for libpg_query (I'm the original author and still help maintain it together with the rest of the team), and I appreciate the shout out to pganalyze!
If anyone else has a use case for using the Postgres parser outside the server, we have a healthy ecosystem of libraries that build on the core C library (we maintain bindings for Ruby, Go and Rust ourselves), as well as various projects using it (e.g. sqlc uses it for a type-safe way for using hand-written SQL in Go): https://github.com/pganalyze/libpg_query#resources
Generally I agree that this would be great to have, and Postgres does have a set of libraries it already maintains as part of the main source tree (i.e. libpq, etc), and there is a shared set of code between the backend and the "frontend" (https://github.com/postgres/postgres/tree/master/src/common). So theoretically you could imagine the parser moving into that shared code portion, sharing code but not necessarily requiring linking to a library from the backend.
However, the challenge from what I've understood from past conversations with some folks working on Postgres core is that the parser is currently heavily tied into the backend - note the parser isn't just the scan.l/gram.y file, but also the raw parse node structs that it outputs. You can see how many files we pull in from the main tree that are prefixed with "src_backend": https://github.com/pganalyze/libpg_query/tree/15-latest/src/...
Further, there isn't a canonical way to output node trees into a text format today in core, besides the rather hard to work with output of debug_print_parse - there have been discussions on -hackers to potentially utilize JSON here, which may make this a bit easier. Note that in libpg_query we currently use Protobuf (but used to use JSON), which does have the benefit of getting auto-generated structs in the language bindings - but Protobuf is not used in core Postgres at all today.
All in all, I think there is some upstream interest, but it's not clear that this is a good idea from a maintainability perspective.
It leverages your code base rather than connecting to your database. It uses [libpg_query](https://github.com/pganalyze/libpg_query) to construct the syntax tree, which can then be used for the LSP features.
The libpg_query library has a very important problem: it does not work on Windows https://github.com/pganalyze/libpg_query/issues/44
I'd recommend starting with fixing that instead. It would be much more helpful for the community.
-
Show HN: PRQL – A Proposal for a Better SQL
I like that everyone is trying to make something like SQL that reads more naturally to them. More alternatives is good! SQL is a widely accepted standard, and has strictly defined and super broadly accepted semantics.
As someone who has written quite a few half-baked-for-general-use but fit-for-purpose SQL generator utilities over the years, I'll suggest that if you intend for a novel syntax to be a general SQL replacement then being isomorphic to SQL would massively increase usefulness and uptake:
1. novel syntax to SQL; check! Now novel syntax works with all the databases!
2. any valid SQL to novel syntax; a bit harder, but I'd start by using a SQL parser like https://github.com/pganalyze/libpg_query and translating the resulting AST into the novel syntax.
3. novel syntax to SQL back to novel syntax is idempotent; a nice side effect is a validator/formatter for "novel syntax"
4. SQL to novel syntax back to SQL is idempotent; a nice side effect is a validator/formatter for SQL, which would be awesome. (See also https://go.dev/blog/gofmt, which is where I learned this "round trip as formatter" trick.)
I don't mean for this to sound negative, and I know that 2, 3, and 4 are kind of hard. Thank you for building prql!
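The "round trip as formatter" idea in points 3 and 4 can be illustrated without any SQL tooling at all. The sketch below swaps in Python's own `ast` module as a stand-in for a SQL parser and printer (it is not libpg_query or PRQL, and the helper name `fmt` is made up): parsing and printing back normalizes the layout, and formatting a second time changes nothing. It assumes Python 3.9 or later, where `ast.unparse` is available.

```python
import ast


def fmt(source: str) -> str:
    """Format source text by parsing it to an AST and printing the AST back."""
    return ast.unparse(ast.parse(source))


def same_tree(a: str, b: str) -> bool:
    """Check that two source texts parse to structurally identical trees."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


if __name__ == "__main__":
    messy = "x   =  (1+2)\nif x>2 :\n    print( 'big' )"
    once = fmt(messy)
    twice = fmt(once)

    print(once)
    print("stable after one pass:", once == twice)
    print("meaning preserved:", same_tree(messy, once))
```

The two checks correspond to the properties the list asks for: the printer's output is a fixed point of the round trip, and the round trip does not change the parsed structure. A SQL version would need a parser and a deparser for the dialect in question in place of `ast.parse` and `ast.unparse`.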
-
Go PL/SQL parser using ANTLRv4
I feel like https://github.com/pganalyze/libpg_query should be the default choice for anything that needs a SQL parser. PL/pgSQL parsing is included there.
What are some alternatives?
sqlparse - A non-validating SQL parser module for Python
Apache Calcite - Apache Calcite
JSqlParser - JSqlParser parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Pattern
ANTLR - ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
pgsql-parser - PostgreSQL Query Parser for Node.js
sqlite-parser - JavaScript implementation of SQLite 3 query parser
sqlglot - Python SQL Parser and Transpiler
alasql - AlaSQL.js - JavaScript SQL database for browser and Node.js. Handles both traditional relational tables and nested JSON data (NoSQL). Export, store, and import data from localStorage, IndexedDB, or Excel.
prql - PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
Presto - The official home of the Presto distributed SQL query engine for big data