-
miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
-
dasel
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
-
brackit
Query processor with proven optimizations, ready to use for your JSON store to query semi-structured data with JSONiq. Can also be used as an ad-hoc in-memory query processor.
-
sirix
SirixDB is an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.
I'm a big fan of miller (mlr) -- it's the tool I landed on when I needed to "graduate" from awk to look at CSV data. When I read "go based" in your comment, my first thought was "nope, it's written in C". But no! It was ported to Go -- very interesting!
The developer wrote a comprehensive document explaining the rationale behind the port, and it answered all my questions and a lot more: https://github.com/johnkerl/miller/blob/main/README-go-port.....
Thought other miller/mlr fans (who don't follow its development) might find this interesting as well.
(The dasel tool looks very cool, too -- looks like a good complement to mlr and similar tools!)
Regarding XQuery: we just added JSON querying on top of it in Brackit[1] / SirixDB[2].
Brackit is a retargetable query compiler that performs many optimizations at compile time, for instance on joins and aggregations. It is usable as an in-memory query processor or as the query processor of a database system.
The Ph.D. thesis of Sebastian:
Separating Key Concerns in Query Processing - Set Orientation, Physical Data Independence, and Parallelism
http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publicatio...
[2] https://sirix.io
Try this: https://flatterer.opendata.coop/
There is no binary yet, but there is a Python CLI and library, even though it is written in Rust.
It is the only tool I know of that takes nested JSON and converts it into relational tables.
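To make "nested JSON into relational tables" concrete, here is a minimal sketch in plain Python. It is not flatterer's actual algorithm or API; the function name flatten_to_tables, the "_id" / "_<parent>_id" key columns, and the table naming scheme are all assumptions made for illustration. Nested objects are folded into prefixed columns, and each list of objects becomes a child table whose rows point back to the parent row.

```python
from collections import defaultdict


def flatten_to_tables(records, root="main"):
    """Split nested records into flat tables (hypothetical helper, not flatterer's API)."""
    tables = defaultdict(list)

    def add_row(obj, table, parent=None):
        # Each row gets a surrogate key; child rows also store their parent's key.
        row = {"_id": len(tables[table])}
        if parent is not None:
            parent_table, parent_id = parent
            row[f"_{parent_table}_id"] = parent_id
        tables[table].append(row)
        fill(row, obj, table, prefix="")

    def fill(row, obj, table, prefix):
        for key, value in obj.items():
            name = f"{prefix}{key}"
            if isinstance(value, dict):
                # Nested object: fold its fields into prefixed columns of the same row.
                fill(row, value, table, prefix=f"{name}_")
            elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
                # List of objects: one child table, one row per element.
                for child in value:
                    add_row(child, f"{table}_{name}", parent=(table, row["_id"]))
            else:
                # Scalars (and lists of scalars) are stored as-is.
                row[name] = value

    for record in records:
        add_row(record, root)
    return dict(tables)


if __name__ == "__main__":
    data = [{"id": "a", "owner": {"name": "x"}, "items": [{"sku": 1}, {"sku": 2}]}]
    for name, rows in flatten_to_tables(data).items():
        print(name, rows)
```

Each resulting table is a list of flat dicts, so it can be written out with csv.DictWriter or loaded into a database, and joined back together on the parent key column.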
In addition to the already mentioned jq, there's https://github.com/jehiah/json2csv
You could do something like this in pure Python without the JSON loading boilerplate with jello[0]. An interactive TUI for jello called jellex[1] is also available. (I am the author)
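For reference, the "JSON loading boilerplate" in plain Python looks roughly like the sketch below, which turns a JSON array of flat objects into CSV using only the standard library. The function name json_to_csv is made up for this example, and it assumes the input is a list of objects with scalar values; it does not show jello's own syntax.

```python
import csv
import io
import json


def json_to_csv(text):
    """Convert a JSON array of flat objects to CSV text (illustrative helper)."""
    rows = json.loads(text)

    # Collect the union of keys in first-seen order so every row fits the header.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    out = io.StringIO()
    # Missing keys are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(json_to_csv('[{"a": 1, "b": 2}, {"a": 3, "c": 4}]'), end="")
```

Nested values would end up as their Python repr in a cell, so nested input needs to be flattened first.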
* https://flatten-tool.readthedocs.io/en/latest/
It's maintained by Open Data Services Coop, where we use it as a component in several of our web & data pipeline tools for working with data that is published in a Data Standard.