Miller – tool for querying, shaping, reformatting data in CSV, TSV, and JSON

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

    The 10x number was from before the improvements in https://github.com/johnkerl/miller/pull/786 et al. The earlier negative performance results were my fault, not Go's: I focused first on the port and on feature development, leaving benchmarking and optimization until the end. That said, Go is a bit slower than C line for line. However, Miller 5 (in C) was single-threaded, while Miller 6 (in Go) actively uses multiple cores. This is why complex processing chains now run much faster in Go than in C: multicore execution and pipelining are much easier to do in Go.

  • RecordStream

    Command-line tools for slicing and dicing JSON records.

    It's interesting watching these types of tools get re-invented periodically:

    https://github.com/benbernard/RecordStream

    It shows that the Unix model of many small, composable tools is very powerful, but also that POSIX is missing some essential pieces that everyone keeps trying to add or reinvent.

  • DataProfiler

    What's in your data? Extract schema, statistics and entities from datasets

    My team built a similar tool in Python that loads any delimited file, JSON, Parquet, or Avro with one command:

    https://github.com/capitalone/DataProfiler

    It effectively loads anything into a dataframe.

  • datastation

    App to easily query, script, and visualize data from every database, file, and API.

    I just published dsq [0] for running SQL queries against CSV/JSON/Excel/Parquet/etc or just converting those files to JSON.

    [0] https://github.com/multiprocessio/datastation/tree/main/runn...
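The Miller entry combines two ideas. First, its verbs address fields by name rather than by position, as awk or cut do. Second, Miller 6 runs a chain of verbs concurrently across cores. The following minimal Go sketch illustrates both ideas under our own assumptions. It is not Miller's implementation. `Record`, `Stage`, `filterGT`, `cut`, `sortBy`, and `runChain` are hypothetical names chosen for illustration. Each stage runs in its own goroutine and is connected to the next stage by a channel, so the Go runtime can schedule streaming stages in parallel.

```go
// Hypothetical sketch of a pipelined chain of name-indexed verbs.
// This is not Miller's actual code or API.
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// Record is a name-indexed record: fields are looked up by column name.
type Record map[string]string

// Stage is one verb in a chain. It reads from in, writes to out,
// and closes out once in is exhausted.
type Stage func(in <-chan Record, out chan<- Record)

// filterGT keeps records whose numeric field exceeds threshold.
func filterGT(field string, threshold float64) Stage {
	return func(in <-chan Record, out chan<- Record) {
		defer close(out)
		for r := range in {
			v, err := strconv.ParseFloat(r[field], 64)
			if err == nil && v > threshold {
				out <- r
			}
		}
	}
}

// cut keeps only the named fields, like a name-aware `cut -f`.
func cut(fields ...string) Stage {
	return func(in <-chan Record, out chan<- Record) {
		defer close(out)
		for r := range in {
			kept := Record{}
			for _, f := range fields {
				if v, ok := r[f]; ok {
					kept[f] = v
				}
			}
			out <- kept
		}
	}
}

// sortBy is a barrier stage: it must read every record before emitting any.
func sortBy(field string) Stage {
	return func(in <-chan Record, out chan<- Record) {
		defer close(out)
		var all []Record
		for r := range in {
			all = append(all, r)
		}
		sort.SliceStable(all, func(i, j int) bool { return all[i][field] < all[j][field] })
		for _, r := range all {
			out <- r
		}
	}
}

// runChain starts each stage in its own goroutine, wired together by channels,
// and collects whatever the last stage emits.
func runChain(input []Record, stages ...Stage) []Record {
	src := make(chan Record)
	go func() {
		defer close(src)
		for _, r := range input {
			src <- r
		}
	}()
	var in <-chan Record = src
	for _, s := range stages {
		out := make(chan Record, 16) // small buffer lets adjacent stages overlap
		go s(in, out)
		in = out
	}
	var result []Record
	for r := range in {
		result = append(result, r)
	}
	return result
}

func main() {
	input := []Record{
		{"name": "carol", "dept": "ops", "salary": "72000"},
		{"name": "alice", "dept": "eng", "salary": "95000"},
		{"name": "bob", "dept": "eng", "salary": "48000"},
	}
	out := runChain(input, filterGT("salary", 50000), cut("name", "salary"), sortBy("name"))
	for _, r := range out {
		fmt.Println(r["name"], r["salary"]) // prints "alice 95000", then "carol 72000"
	}
}
```

Streaming stages such as `filterGT` and `cut` can work on different records at the same time. A barrier stage like `sortBy` cannot emit anything until its input ends, so how much a chain benefits from extra cores in a design like this depends on how many of its verbs can stream.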