Show HN: Adding proper CSV support to AWK

dstkdata

1 215 10.0

The (large) data files needed for the Data Science Toolkit project

FYI, I ran a benchmark on the worldcities data from https://github.com/petewarden/dstkdata (credit to xsv for choosing that dataset), comparing goawk against https://github.com/BurntSushi/xsv and https://github.com/liquidaty/zsv (full disclosure: I am one of the zsv authors). Here's what I got.
Fastest to slowest: zsv (0.07), xsv (0.16), goawk (0.42), python (~1.6)
Obviously, this does not tell the whole story: the test was limited to "count", and an interpreted language is expected to always be slower than a precompiled command. Still, it might be relevant to a user deciding which tool to use. It might also be instructive as to room for improvement in the Go code (or possibly the Go code could use the C lib). I note that even if the goawk command is '{}', the runtime is still about the same.
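For reference, a "count" test of the kind described above can be sketched in Python with the standard csv module. This is a minimal illustration, not the script used for the numbers above: the function names `count_rows` and `timed_count` are hypothetical, and it assumes the header row is excluded from the count.

```python
import csv
import io
import time


def count_rows(f):
    """Count data rows in a CSV stream, excluding the header row (an assumption)."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header, if there is one
    return sum(1 for _ in reader)


def timed_count(path):
    """Return (row_count, elapsed_seconds) for the CSV file at `path`."""
    start = time.perf_counter()
    # newline="" lets the csv module handle line endings inside quoted fields
    with open(path, newline="", encoding="utf-8") as f:
        rows = count_rows(f)
    return rows, time.perf_counter() - start


# A quoted field containing a newline still counts as a single row.
sample = io.StringIO('city,country\n"New\nYork",US\nParis,FR\n')
print(count_rows(sample))  # prints 2
```

Counting records rather than lines is what separates a real CSV parser from a plain line count: a quoted field may contain embedded newlines.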
full results:
goawk:
xsv

64 10,058 0.0 Rust

A fast CSV command line toolkit written in Rust.

zsv

25 169 7.4 C

zsv+lib: world's fastest (simd) CSV parser, bare metal or wasm, with an extensible CLI for SQL querying, format conversion and more

goawk

19 1,877 7.1 Go

A POSIX-compliant AWK interpreter written in Go, with CSV support

I've now updated the benchmarks to avoid huge.csv: the write benchmarks now write a big 1GB, 20-column CSV and the read benchmarks use that same CSV: https://github.com/benhoyt/goawk/commit/07eb4505a7f64cffceeb...
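
As a rough illustration of what generating a 20-column CSV for such a benchmark might look like, here is a small Python sketch. It is not the code from the linked commit: the function `write_csv`, the column names, and the cell contents are all hypothetical, and the row count would need to be scaled up considerably to approach 1GB.

```python
import csv
import io


def write_csv(f, num_rows, num_cols=20):
    """Write a header plus `num_rows` rows of `num_cols` synthetic fields to `f`."""
    writer = csv.writer(f)
    writer.writerow([f"col{c}" for c in range(num_cols)])
    for r in range(num_rows):
        # Hypothetical cell contents; a real benchmark would pick its own data.
        writer.writerow([f"r{r}c{c}" for c in range(num_cols)])


# Small in-memory demonstration; pass an open file object for a real run.
buf = io.StringIO()
write_csv(buf, num_rows=3)
print(len(buf.getvalue().splitlines()))  # prints 4 (header + 3 rows)
```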

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.