octosql vs q
| | octosql | q |
|---|---|---|
| Mentions | 34 | 46 |
| Stars | 4,689 | 10,109 |
| Growth | - | - |
| Activity | 4.3 | 3.6 |
| Last commit | 7 months ago | 3 months ago |
| Language | Go | Python |
| License | Mozilla Public License 2.0 | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
octosql
-
Wazero: Zero dependency WebAssembly runtime written in Go
Never got it to anything close to a finished state; instead I moved on to doing the same prototype in LLVM and then Cranelift.
That said, here's some of the wazero-based code on a branch - https://github.com/cube2222/octosql/tree/wasm-experiment/was...
It really is just a very very basic prototype.
- Analyzing multi-gigabyte JSON files locally
-
DuckDB: Querying JSON files as if they were tables
This is really cool!
With their Postgres scanner[0] you can now easily query multiple data sources using SQL and join across them (e.g. a Postgres table with a JSON file), something I strove to build with OctoSQL[1] before. A rough sketch of such a cross-source query is included after the links below.
It's amazing to see how quickly DuckDB is adding new features.
I'm not a huge fan of C++, which is currently used for authoring extensions. It'd be really cool if somebody implemented a Rust extension SDK, or even something like what Steampipe[2] does for Postgres FDWs, which would provide a shim for quickly implementing non-performance-sensitive extensions for various things.
Godspeed!
[0]: https://duckdb.org/2022/09/30/postgres-scanner.html
[1]: https://github.com/cube2222/octosql
[2]: https://steampipe.io
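To make the cross-source idea above concrete, a query along these lines is possible through the DuckDB CLI. This is a hedged sketch rather than something from the original comment: the connection string, table, and file names (dbname=shop, public.orders, reviews.json) are placeholders, and the Postgres/JSON extension function names may differ between DuckDB versions.

```sh
echo "
INSTALL postgres; LOAD postgres;
-- Join a Postgres table with a local JSON file in a single SQL statement.
SELECT o.id, o.total, r.overall
FROM postgres_scan('dbname=shop', 'public', 'orders') AS o
JOIN read_json_auto('reviews.json') AS r ON r.order_id = o.id;
" | duckdb
```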
-
Show HN: ClickHouse-local – a small tool for serverless data analytics
Congrats on the Show HN!
It's great to see more tools in this area (querying data from various sources in-place) and the Lambda use case is a really cool idea!
I've recently done a bunch of benchmarking, including ClickHouse Local, and the usage was straightforward, with everything working as it's supposed to.
Just to comment on the performance side though: one area where I think ClickHouse could still improve - vs OctoSQL[0] at least - is the JSON data source, which seems slower, especially when only a small part of each JSON object is used. If only a single field of many is queried, OctoSQL lazily parses just that field and skips the others, which yields non-trivial performance gains on big JSON files with small queries.
Basically, for a query like `SELECT COUNT(*), AVG(overall) FROM books.json` against the Amazon Review Dataset, OctoSQL is twice as fast (3s vs 6s). That's a minor thing though (OctoSQL will slow down for more complicated queries, while for ClickHouse decoding the input is and remains the bottleneck). Rough invocations of both tools are sketched below the link.
[0]: https://github.com/cube2222/octosql
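For reference, the invocations behind a benchmark like the one above would look roughly like this. This is a hedged sketch: books.json stands in for the Amazon Review Dataset in newline-delimited JSON, and exact flags may differ between tool versions.

```sh
# OctoSQL: file paths double as table names; only the referenced fields are parsed.
octosql "SELECT COUNT(*), AVG(overall) FROM books.json"

# ClickHouse Local: read the same file through the file() table function.
clickhouse local --query "
  SELECT count(), avg(overall)
  FROM file('books.json', JSONEachRow)"
```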
-
Steampipe – Select * from Cloud;
To add somewhat of a counterpoint to the other response, I've tried the Steampipe CSV plugin and got 50x slower performance vs OctoSQL[0], which is itself 5x slower than something like DataFusion[1]. The CSV plugin doesn't contact any external APIs, so it should be a good benchmark of the plugin architecture, though it might just not be optimized yet.
That said, I don't imagine this ever being a bottleneck for the main use case of Steampipe - in that case I think the APIs themselves will always be the limiting part. But it does - potentially - speak to what you can expect if you'd like to extend your usage of Steampipe to more than just DevOps data.
[0]: https://github.com/cube2222/octosql
[1]: https://github.com/apache/arrow-datafusion
Disclaimer: author of OctoSQL
-
Go runtime: 4 years later
Actually, folks just use gRPC or Yaegi in Go.
See Terraform[0], Traefik[1], or OctoSQL[2].
I agree plugins would be welcome though, especially for performance reasons, but also to be able to compile and load Go code into a running Go process (JIT-ish).
[0]: https://github.com/hashicorp/terraform
[1]: https://github.com/traefik/traefik
[2]: https://github.com/cube2222/octosql
Disclaimer: author of OctoSQL
- Run SQL on CSV, Parquet, JSON, Arrow, Unix Pipes and Google Sheet
-
Beginner interested in learning SQL. Have a few questions that I wasn't able to find answers to on Google.
Through more magic, you COULD of course use stuff like Spark, or, more easily, programs like TextQL, sq, or OctoSQL.
-
How I Used DALL·E 2 to Generate The Logo for OctoSQL
The logo was created for OctoSQL and in the article you can find a lot of sample phrase-image combinations, as it describes the whole path (generation, variation, editing) I went down. Let me know what you think!
-
How I Used DALL·E 2 to Generate the Logo for OctoSQL
Hey, author here, happy to answer any questions!
The logo was created for OctoSQL[0] and in the article you can find a lot of sample phrase-image combinations, as it describes the whole path (generation, variation, editing) I went down. Let me know what you think!
[0]: https://github.com/cube2222/octosql
q
-
I wrote this iCalendar (.ics) command-line utility to turn common calendar exports into more broadly compatible CSV files.
CSV utilities (still haven't picked a favorite one...): https://github.com/harelba/q https://github.com/BurntSushi/xsv https://github.com/wireservice/csvkit https://github.com/johnkerl/miller
- Request for help with Excel automation
-
Show HN: ClickHouse-local – a small tool for serverless data analytics
I think they're talking about https://github.com/harelba/q, which is not very fast.
-
sqly - execute SQL against CSV / JSON with shell
Apparently, there were many who thought the same thing; tools to execute SQL against CSV included trdsql, q, csvq, and TextQL. They were highly functional; however, they had many options and no input completion, and I found them just a little difficult to use.
-
Q – Run SQL Directly on CSV or TSV Files
Hi, author of q here.
Regarding the error you got, q currently does not autodetect headers, so you'd need to add -H as a flag in order to use the "country" column name. You're absolutely correct about failing fast here - it's a bug which I'll fix.
In general regarding speed - q supports automatic caching of the CSV files (through the "-C readwrite" flag). Once it's activated, it will write the data into another file (with a .qsql extension), and will use it automatically in further queries to speed things up considerably.
Effectively, the .qsql files are regular sqlite3 files (with some metadata), and q can be used to query them directly (or any regular sqlite3 file), including the ability to seamlessly join between multiple sqlite3 files. There's a short example after the link below.
http://harelba.github.io/q/#auto-caching-examples
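To illustrate the flags mentioned above, a typical caching workflow might look like this. This is a hedged example: mydata.csv, the column names, and the exact cache file name are assumptions, and flags may differ between q versions.

```sh
# First run: comma-delimited input (-d ,), header row (-H), and cache creation (-C readwrite).
q -d , -H -C readwrite "SELECT country, COUNT(*) FROM mydata.csv GROUP BY country"

# Subsequent runs with the same flags reuse the cache automatically. Since the cache
# is a regular sqlite3 file, it can also be queried directly (assuming it is written
# next to the CSV as mydata.csv.qsql).
q "SELECT country, COUNT(*) FROM mydata.csv.qsql GROUP BY country"
```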
- PostgreSQL alternative for Large amounts of data
-
q VS trdsql - a user suggested alternative
2 projects | 25 Jun 2022
- One-liner for running queries against CSV files with SQLite
What are some alternatives?
duckdb - DuckDB is an in-process SQL OLAP Database Management System
textql - Execute SQL against structured text like CSV or TSV
trdsql - CLI tool that can execute SQL queries on CSV, LTSV, JSON, YAML and TBLN. Can output to various formats.
csvq - SQL-like query language for csv
sqlitebrowser - Official home of the DB Browser for SQLite (DB4S) project. Previously known as "SQLite Database Browser" and "Database Browser for SQLite". Website at:
xsv - A fast CSV command line toolkit written in Rust.
sqlite-utils - Python CLI utility and library for manipulating SQLite databases
InquirerPy - Python port of Inquirer.js (a collection of common interactive command-line user interfaces)
ledger - Double-entry accounting system with a command-line reporting interface
lnav - Log file navigator
simdjson - Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks