Neat! I have also built a similar project in Rust https://github.com/roapi/roapi/tree/main/columnq-cli :)
dsq references a benchmark done by q (https://github.com/harelba/q/blob/master/test/BENCHMARK.md) that indicates that octosql is significantly slower.
However, octosql's GH repo claims otherwise.
Does anyone have any real world experience that they can share on these tools?
Yeah, frankly the q benchmark isn't the best, even though dsq compares favorably in it. From what I can see, it isn't well documented, exercises only a small amount of functionality, and isn't very rigorous. That said, the caching q does is likely very solid (and not something dsq does).
The biggest risk I think with octosql (and cube2222 is here somewhere to disagree with me if I'm wrong) is that they have their own entire SQL engine, whereas textql, q, and dsq use SQLite. On the other hand, q is written in Python, whereas textql, octosql, and dsq are written in Go.
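To make the SQLite point concrete, here is a minimal sketch of the general "load the file into SQLite, then run the query there" approach. It is an illustration only, not the actual code of dsq, q, or textql; the function name `query_csv` and the table name `t` are made up for this example, and every column is stored as text, so numeric comparisons need an explicit cast.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="t"):
    """Load CSV text into an in-memory SQLite table, then run `sql` against it.

    Hypothetical helper for illustration; not taken from any of the tools above.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    conn = sqlite3.connect(":memory:")
    try:
        # Quote identifiers so headers containing spaces or keywords still work.
        cols = ", ".join('"{}"'.format(h.replace('"', '""')) for h in header)
        conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))

        # Values are inserted as strings; SQLite does the rest of the SQL work.
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), data
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


sample = "name,age\nalice,30\nbob,25\n"
result = query_csv(sample, "SELECT name FROM t WHERE CAST(age AS INTEGER) > 26")
print(result)
```

The appeal of this design is that parsing, planning, and execution of the SQL are all delegated to SQLite, so the tool itself only has to handle file formats and loading. A tool with its own engine has to implement and maintain all of that itself.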
In the next few weeks I'll be posting some benchmarks that I hope are a little fairer (or at least well-documented and reproducible). Though of course it would be appropriate to have independent benchmarks too since I now have a dog in the fight.
On a tangent, once the go-duckdb binding [0] matures I'd love to offer duckdb as an alternative engine flag within dsq (and DataStation). Would be neat to see.
[0] https://github.com/marcboeker/go-duckdb
Binaries are now available! https://github.com/multiprocessio/dsq#macos-linux