Select, put and delete data from JSON, TOML, YAML, XML and CSV files

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
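
As a rough illustration of what "name-indexed" means here, the sketch below sorts and selects CSV columns by header name instead of by position. It uses only Python's standard library, not Miller itself, and the sample data and field names (`name`, `qty`, `price`) are made up for the example.

```python
import csv
import io

# Hypothetical sample data; any CSV with a header row works the same way.
text = "name,qty,price\npear,3,0.5\napple,10,0.25\n"

# DictReader keys every row by the header names, so fields are
# addressed by name ("qty") rather than by column position.
rows = list(csv.DictReader(io.StringIO(text)))

# Sort numerically by a named field, largest first.
rows.sort(key=lambda r: float(r["qty"]), reverse=True)

# Keep only the named columns we care about (similar in spirit to cut).
selected = [{"name": r["name"], "qty": r["qty"]} for r in rows]
```

Because the fields are looked up by name, reordering the columns in the input does not change the result.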

  • I'm a big fan of miller (mlr) -- it's the tool I landed on when I needed to "graduate" from awk to look at CSV data. But when I read "Go based" in your comment, I thought "nope, it's written in C". But no! It was ported to Go -- very interesting!

    The developer wrote a comprehensive document explaining the rationale behind the port, which answered all my questions and a lot more: https://github.com/johnkerl/miller/blob/main/README-go-port.....

    Thought other miller/mlr fans (who don't follow its development) might find this interesting as well.

    (The dasel tool looks very cool, too -- looks like a good complement to mlr and similar tools!)

  • dasel

    Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
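
To make the select/put/delete idea concrete, here is a minimal sketch of the three operations on a parsed JSON document, addressed by a dotted path. This is not dasel's implementation or selector syntax. The helper names (`select`, `put`, `delete`, `_walk`) and the dotted-path convention are assumptions made for the example.

```python
import json

def _walk(doc, keys, create=False):
    """Follow a list of keys into nested dicts and return the node reached."""
    node = doc
    for key in keys:
        # When creating, missing intermediate dicts are added on the way down.
        node = node.setdefault(key, {}) if create else node[key]
    return node

def select(doc, path):
    """Return the value stored at a dotted path such as 'user.name'."""
    keys = path.split(".")
    return _walk(doc, keys[:-1])[keys[-1]]

def put(doc, path, value):
    """Set the value at a dotted path, creating parent dicts as needed."""
    keys = path.split(".")
    _walk(doc, keys[:-1], create=True)[keys[-1]] = value

def delete(doc, path):
    """Remove the key at a dotted path (raises KeyError if it is missing)."""
    keys = path.split(".")
    del _walk(doc, keys[:-1])[keys[-1]]

doc = json.loads('{"user": {"name": "ann", "tags": ["a"]}}')
put(doc, "user.email", "ann@example.com")
delete(doc, "user.tags")
name = select(doc, "user.name")
```

The same three helpers would work on data parsed from TOML or YAML, since those parsers also produce nested dictionaries and lists. List indexing is left out to keep the sketch short.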

  • yj

    CLI - Convert between YAML, TOML, JSON, and HCL. Preserves map order.

  • jq

    Command-line JSON processor (by stedolan). Discontinued at this location; moved to https://github.com/jqlang/jq

  • brackit

    Query processor with proven optimizations, ready to use for your JSON store to query semi-structured data with JSONiq. Can also be used as an ad-hoc in-memory query processor.

  • sirix

    SirixDB is an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.

  • Regarding XQuery we just added JSON querying on top in Brackit[1] / SirixDB[2].

    Brackit is a retargetable query compiler that performs many optimizations at compile time, for instance optimizing joins and aggregations. It is usable as an in-memory processor or as the query processor of a database system.

    The Ph.D. thesis of Sebastian:

    Separating Key Concerns in Query Processing - Set Orientation, Physical Data Independence, and Parallelism

    http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publicatio...

    [1] http://brackit.io

    [2] https://sirix.io

  • flatterer

    Opinionated JSON to CSV/XLSX/SQLITE/PARQUET converter. Flattens JSON fast.

  • Try this: https://flatterer.opendata.coop/

    There is no binary yet, but there is a Python CLI and library, even though it is written in Rust.

    It is the only tool I know of that deals with nested JSON and converts it into relational tables.
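
As a rough idea of what converting nested JSON into relational tables involves, the sketch below splits each record into a parent row and child rows that point back to the parent through a foreign key. It is a simplified illustration, not flatterer's algorithm. The function name `flatten`, the `parent_id` column, and the sample data are assumptions made for the example.

```python
import json

def flatten(records, child_key, id_key="id"):
    """Split records into a parent table and a child table.

    Each nested object under `child_key` becomes its own row in the child
    table, with a `parent_<id_key>` column acting as the foreign key.
    """
    parents, children = [], []
    for record in records:
        # The parent row keeps every field except the nested list.
        parents.append({k: v for k, v in record.items() if k != child_key})
        for child in record.get(child_key, []):
            children.append({f"parent_{id_key}": record[id_key], **child})
    return parents, children

# Hypothetical input: orders with a nested list of items.
data = json.loads(
    '[{"id": 1, "name": "a", "items": [{"sku": "x"}, {"sku": "y"}]},'
    ' {"id": 2, "name": "b", "items": []}]'
)
orders, items = flatten(data, "items")
```

Each of the two resulting lists of flat dictionaries can then be written out as its own CSV file or database table.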

  • json2csv

    Command-line tool to convert JSON to CSV (by jehiah)

  • In addition to the already mentioned jq, there's https://github.com/jehiah/json2csv
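
For readers who want to see the basic transformation, here is a minimal sketch of turning JSON Lines into CSV with Python's standard library. It does not reproduce json2csv's options or behavior. The function name `jsonl_to_csv`, the choice to emit a header row, and the sample records are assumptions made for the example.

```python
import csv
import io
import json

def jsonl_to_csv(lines, keys):
    """Convert an iterable of JSON Lines strings to CSV text.

    Only the fields named in `keys` are written, in that order;
    a key missing from a record becomes an empty cell.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(keys)  # header row
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        writer.writerow([record.get(key, "") for key in keys])
    return out.getvalue()

# Hypothetical input records.
sample = ['{"user": "ann", "age": 31, "extra": true}', '{"user": "bob"}']
csv_text = jsonl_to_csv(sample, ["user", "age"])
```

Listing the keys explicitly keeps the column order stable even when records have different or extra fields.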

  • jellex

    TUI to filter JSON and JSON Lines data with Python syntax

  • You could do something like this in pure Python without the JSON-loading boilerplate with jello[0]. An interactive TUI for jello called jellex[1] is also available. (I am the author)

    [0] https://github.com/kellyjonbrazil/jello

    [1] https://github.com/kellyjonbrazil/jellex
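
For context, the loading boilerplate mentioned in the comment looks roughly like the following in plain Python. This sketch shows only the standard-library version, not jello's own syntax. The function name `filter_json`, the sample data, and the filter condition are made up for the example.

```python
import json

def filter_json(text):
    """Parse JSON text, filter it with ordinary Python, and re-serialize."""
    data = json.loads(text)  # boilerplate: parse the input
    # The actual query: names of items whose size exceeds 20.
    result = [item["name"] for item in data if item["size"] > 20]
    return json.dumps(result)  # boilerplate: serialize the output

# Hypothetical input; in a pipeline this would come from sys.stdin.read().
raw = '[{"name": "a", "size": 10}, {"name": "b", "size": 25}]'
output = filter_json(raw)
```

Only the list comprehension is the query itself. The parse and serialize steps around it are the part such a tool handles for you.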

  • flatten-tool

    Tools for generating CSV and other flat versions of the structured data

  • https://flatten-tool.readthedocs.io/en/latest/

    It's maintained by Open Data Services Coop, where we use it as a component in several of our web & data pipeline tools for working with data that is published in a Data Standard.

