Show HN: Adding proper CSV support to AWK

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • dstkdata

    The (large) data files needed for the Data Science Toolkit project

    FYI, I ran a test on the worldcities data at https://github.com/petewarden/dstkdata (credit to xsv for choosing that dataset) against https://github.com/BurntSushi/xsv and https://github.com/liquidaty/zsv (full disclosure: I am one of the zsv authors). Here's what I got.

    fastest to slowest: zsv (0.07), xsv (0.16), goawk (0.42), python (~1.6)

    Obviously this does not tell the whole story, since the test was limited to "count" and an interpreted language is expected to be slower than a precompiled command, but it might be relevant to a user deciding which tool to use. It might also be instructive as to room for improvement in the Go code (or possibly the Go code could use the C lib). I note that even if the goawk command is '{}', the runtime is still about the same.

    full results:

    goawk:

  • xsv

    A fast CSV command line toolkit written in Rust.

  • zsv

    zsv+lib: world's fastest (simd) CSV parser, bare metal or wasm, with an extensible CLI for SQL querying, format conversion and more

  • goawk

    A POSIX-compliant AWK interpreter written in Go, with CSV support

    I've now updated the benchmarks to avoid huge.csv: the write benchmarks now write a big 1GB, 20-column CSV and the read benchmarks use that same CSV: https://github.com/benhoyt/goawk/commit/07eb4505a7f64cffceeb...
