zetasql
duckdb
Metric | zetasql | duckdb
---|---|---
Mentions | 15 | 52
Stars | 2,141 | 16,902
Growth | 0.9% | 5.3%
Activity | 0.0 | 10.0
Latest commit | 2 months ago | 5 days ago
Language | C++ | C++
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
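As a rough illustration of the Growth figure, month-over-month growth can be computed as the change in stars relative to the previous month's count. This is a minimal sketch that assumes the usual percentage-change definition; the site's exact formula is not stated here, and the function name and star counts below are hypothetical.

```python
def month_over_month_growth(previous: int, current: int) -> float:
    """Return the growth from `previous` to `current` as a percentage."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100.0


# Hypothetical star counts, not taken from either project.
growth = month_over_month_growth(previous=2000, current=2100)
print(f"{growth:.1f}%")  # 5.0%
```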
zetasql
- Mangle, a programming language for deductive database programming
There are even table-valued functions.
These things are not widespread, they differ by implementation, and the way clients use them is copy-and-paste. Even something as thoughtful as ZetaSQL https://github.com/google/zetasql does not have mechanisms for structuring (modules, packages, interfaces). SQL will not, and cannot, evolve in such a direction (or anything that does evolve will not be recognizable as SQL).
- goccy/bigquery-emulator: BigQuery emulator server implemented in Go
Hi, I have been developing a BigQuery emulator ( https://github.com/goccy/bigquery-emulator ) since early 2022. It is written in Go, but it can be used from the bq command-line tool and from client SDKs for other languages (e.g. Python) after installing the Docker image or a released binary. It currently supports over 200 of the nearly 330 standard functions in BigQuery and all data types except GEOGRAPHY ( see https://github.com/goccy/go-zetasqlite#status for details ). ZetaSQL ( https://github.com/google/zetasql ) is used to parse and analyze queries.
- ZetaSQL – Analyzer Framework for SQL
- ZetaSQL - Question about using local service
We are using a Python client binding for the ZetaSQL gRPC local service in our application to analyze statements and extract referenced tables and output columns.
- Parsing SQL
If you don't want to do it yourself, there's this:
https://github.com/google/zetasql
Parsing is huge but it's amazing how small a part of the job it is. This library isn't even the half of it.
- SQLGlot: SQL parser, transpiler, optimizer – translate to Presto, Spark, Hive
- ZetaSQL - Analyzer Framework for SQL
- ZetaSQL
- New PostgreSQL Interface for Cloud Spanner
I mean the postgres parser (and semantic changes) for ZetaSQL. The zetasql parser is in a file called zetasql/parser/bison_parser.y; I strongly suspect they now have a file called something like zetasql/pgparser/bison_parser.y as well (and much more pervasive changes to support the deeper differences in the dialects).
This is the lexical structure and syntax docs for the new postgres interface to Cloud Spanner:
https://cloud.google.com/spanner/docs/postgresql/lexical
And this is the zetasql lexical structure and syntax docs:
https://github.com/google/zetasql/blob/master/docs/lexical.m...
Notice that the new PG docs are an edit of the Zeta ones - evidence that my hypothesis is correct.
- Open Source SQL Parsers
zetasql implements BigQuery, Spanner, and Dataflow dialects.
duckdb
- 🪄 DuckDB sql hack : get things SORTED w/ constraint CHECK
- DuckDB: Move to push-based execution model (2021)
- DuckDB performance improvements with the latest release
I'm not sure if the fix is reassuring or not: https://github.com/duckdb/duckdb/pull/9411/files
- Building a Distributed Data Warehouse Without Data Lakes
It's an interesting question!
The problem is that the data is spread everywhere - no choice about that. So with that in mind, how do you query that data? Today, the idea is that you HAVE to put it into a central location. With tools like Bacalhau[1] and DuckDB [2], you no longer have to - a single query can be sharded amongst all your data - EFFECTIVELY giving you a lot of what you want from a data lake.
It's not a replacement, but if you can do a few of these items WITHOUT moving the data, you will be able to see really significant cost and time savings.
[1] https://github.com/bacalhau-project/bacalhau
[2] https://github.com/duckdb/duckdb
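The idea of sending one query to every shard and combining the partial results, rather than centralizing the data first, can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for DuckDB, no Bacalhau scheduling is involved, and the `events` table and all function names are hypothetical.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory database standing in for one remote data location."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


def query_shard(conn):
    """Run the same aggregate query locally on a shard; only small partials leave it."""
    sql = "SELECT region, SUM(amount), COUNT(*) FROM events GROUP BY region"
    return conn.execute(sql).fetchall()


def merge(partials):
    """Combine per-shard (sum, count) pairs into global totals per region."""
    totals = {}
    for partial in partials:
        for region, total, count in partial:
            prev_total, prev_count = totals.get(region, (0.0, 0))
            totals[region] = (prev_total + total, prev_count + count)
    return totals


shards = [
    make_shard([("eu", 10.0), ("us", 5.0)]),
    make_shard([("eu", 2.5), ("ap", 1.0)]),
]
result = merge(query_shard(shard) for shard in shards)
print(result)
```

Sums and counts merge cleanly across shards; an average would be derived afterwards from the merged sum and count rather than by averaging per-shard averages.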
- DuckDB 0.9.0
- Push or Pull, is this a question?
[4] Switch to Push-Based Execution Model by Mytherin · Pull Request #2393 · duckdb/duckdb (github.com)
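For readers unfamiliar with the distinction, the general difference between pull-based and push-based execution can be sketched with a scan followed by a filter. This is a toy illustration of the two styles and does not reflect DuckDB's actual implementation; all class and function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


# Pull-based: the consumer drives, asking each operator for its next row.
def scan(rows: Iterable[int]) -> Iterator[int]:
    yield from rows


def pull_filter(child: Iterator[int], pred: Callable[[int], bool]) -> Iterator[int]:
    for row in child:
        if pred(row):
            yield row


def run_pull(rows: Iterable[int], pred: Callable[[int], bool]) -> List[int]:
    return list(pull_filter(scan(rows), pred))


# Push-based: the source drives, handing each row to the next operator.
class Sink:
    def __init__(self) -> None:
        self.rows: List[int] = []

    def push(self, row: int) -> None:
        self.rows.append(row)


class PushFilter:
    def __init__(self, pred: Callable[[int], bool], downstream: Sink) -> None:
        self.pred = pred
        self.downstream = downstream

    def push(self, row: int) -> None:
        if self.pred(row):
            self.downstream.push(row)


def run_push(rows: Iterable[int], pred: Callable[[int], bool]) -> List[int]:
    sink = Sink()
    operator = PushFilter(pred, sink)
    for row in rows:  # the scan loop controls the flow of data
        operator.push(row)
    return sink.rows


def is_even(n: int) -> bool:
    return n % 2 == 0


print(run_pull(range(10), is_even))  # [0, 2, 4, 6, 8]
print(run_push(range(10), is_even))  # [0, 2, 4, 6, 8]
```

Both styles produce the same result; what changes is which side of the pipeline holds the control flow.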
- Show HN: Hydra 1.0 – open-source column-oriented Postgres
It depends on your query, obviously.
In general, I did very deep benchmarking of pg, clickhouse and duckdb, and I sure didn't make stupid mistakes like this: https://news.ycombinator.com/item?id=36990831
My dataset has 50B rows and 2 TB of data, and I think columnar dbs are very overhyped. I chose pg because:
- pg performance is acceptable, maybe 2-3x slower than clickhouse and duckdb on some queries, if pg is configured correctly and run on compressed storage
- clickhouse and duckdb start falling apart very fast because they are specialized for a very narrow type of query: https://github.com/ClickHouse/ClickHouse/issues/47520 https://github.com/ClickHouse/ClickHouse/issues/47521 https://github.com/duckdb/duckdb/discussions/6696
- 🦆 Effortless Data Quality w/duckdb on GitHub ♾️
This action installs the duckdb version provided as input.
- Using SQL inside Python pipelines with Duckdb, Glaredb (and others?)
Duckdb: https://github.com/duckdb/duckdb - seems pretty popular, been keeping an eye on this for close to a year now.
- CSV or Parquet File Format
The Parquet-Go library is very complex, and I have not yet managed to use it, so I asked whether DuckDB could provide an API: https://github.com/duckdb/duckdb/issues/7776
What are some alternatives?
sqlparse - A non-validating SQL parser module for Python
ClickHouse - ClickHouse® is a free analytics DBMS for big data
Apache Calcite - Apache Calcite
sqlite-worker - A simple, and persistent, SQLite database for Web and Workers.
JSqlParser - JSqlParser parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Pattern
datasette - An open source multi-tool for exploring and publishing data
ANTLR - ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
octosql - OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
pgsql-parser - PostgreSQL Query Parser for Node.js
metabase-clickhouse-driver - ClickHouse database driver for the Metabase business intelligence front-end
sqlite-parser - JavaScript implementation of SQLite 3 query parser
datafusion - Apache DataFusion SQL Query Engine