semi_index
octosql
semi_index | octosql | |
---|---|---|
1 | 35 | |
57 | 4,783 | |
- | - | |
10.0 | 2.7 | |
almost 12 years ago | 4 months ago | |
C++ | Go | |
GNU General Public License v3.0 or later | Mozilla Public License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
semi_index
octosql
- Feldera Incremental Compute Engine
-
Wazero: Zero dependency WebAssembly runtime written in Go
Never got it to anything close to a finished state, instead moving on to doing the same prototype in llvm and then cranelift.
That said, here's some of the wazero-based code on a branch - https://github.com/cube2222/octosql/tree/wasm-experiment/was...
It really is just a very very basic prototype.
- Analyzing multi-gigabyte JSON files locally
-
DuckDB: Querying JSON files as if they were tables
This is really cool!
With their Postgres scanner[0] you can now easily query multiple datasources using SQL and join between them (i.e. Postgres table with JSON file). Something I strived to build with OctoSQL[1] before.
It's amazing to see how quickly DuckDB is adding new features.
Not a huge fan of C++, which is right now used for authoring extensions, it'd be really cool if somebody implemented a Rust extension SDK, or even something like Steampipe[2] does for Postgres FDWs which would provide a shim for quickly implementing non-performance-sensitive extensions for various things.
Godspeed!
[0]: https://duckdb.org/2022/09/30/postgres-scanner.html
[1]: https://github.com/cube2222/octosql
[2]: https://steampipe.io
-
Show HN: ClickHouse-local – a small tool for serverless data analytics
Congrats on the Show HN!
It's great to see more tools in this area (querying data from various sources in-place) and the Lambda use case is a really cool idea!
I've recently done a bunch of benchmarking, including ClickHouse Local and the usage was straightforward, with everything working as it's supposed to.
Just to comment on the performance area though, one area I think ClickHouse could still possibly improve on - vs OctoSQL[0] at least - is that it seems like the JSON datasource is slower, especially if only a small part of the JSON objects is used. If only a single field of many is used, OctoSQL lazily parses only that field, and skips the others, which yields non-trivial performance gains on big JSON files with small queries.
Basically, for a query like `SELECT COUNT(*), AVG(overall) FROM books.json` with the Amazon Review Dataset, OctoSQL is twice as fast (3s vs 6s). That's a minor thing though (OctoSQL will slow down for more complicated queries, while for ClickHouse decoding the input is and remains the bottleneck).
[0]: https://github.com/cube2222/octosql
-
Steampipe – Select * from Cloud;
To add somewhat of a counterpoint to the other response, I've tried the Steampipe CSV plugin and got 50x slower performance vs OctoSQL[0], which is itself 5x slower than something like DataFusion[1]. The CSV plugin doesn't contact any external API's so it should be a good benchmark of the plugin architecture, though it might just not be optimized yet.
That said, I don't imagine this ever being a bottleneck for the main use case of Steampipe - in that case I think the APIs themselves will always be the limiting part. But it does - potentially - speak to what you can expect if you'd like to extend your usage of Steampipe to more than just DevOps data.
[0]: https://github.com/cube2222/octosql
[1]: https://github.com/apache/arrow-datafusion
Disclaimer: author of OctoSQL
-
Go runtime: 4 years later
Actually, folks just use gRPC or Yaegi in Go.
See Terraform[0], Traefik[1], or OctoSQL[2].
Although I agree plugins would be welcome, especially for performance reasons, though also to be able to compile and load go code into a running go process (JIT-ish).
[0]: https://github.com/hashicorp/terraform
[1]: https://github.com/traefik/traefik
[2]: https://github.com/cube2222/octosql
Disclaimer: author of OctoSQL
- Run SQL on CSV, Parquet, JSON, Arrow, Unix Pipes and Google Sheet
-
Beginner interested in learning SQL. Have a few question that I wasn’t able to find on google.
Through more magic, you COULD of course use stuff like Spark, or easier with programs like TextQL, sq, OctoSQL.
-
How I Used DALL·E 2 to Generate The Logo for OctoSQL
The logo was created for OctoSQL and in the article you can find a lot of sample phrase-image combinations, as it describes the whole path (generation, variation, editing) I went down. Let me know what you think!
What are some alternatives?
jq-zsh-plugin - jq zsh plugin
DuckDB - DuckDB is an analytical in-process SQL database management system
json-toolkit - "the best opensource converter I've found across the Internet" -- dene14
q - q - Run SQL directly on delimited files and multi-file sqlite databases
json-buffet
trdsql - CLI tool that can execute SQL queries on CSV, LTSV, JSON, YAML and TBLN. Can output to various formats.
reddit_mining
lnav - Log file navigator
json-streamer - A fast streaming JSON parser for Python that generates SAX-like events using yajl
sqlitebrowser - Official home of the DB Browser for SQLite (DB4S) project. Previously known as "SQLite Database Browser" and "Database Browser for SQLite". Website at:
xsv - A fast CSV command line toolkit written in Rust.
sqlite-utils - Python CLI utility and library for manipulating SQLite databases