Awk: The Power and Promise of a 40-Year-Old Language

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • bioawk

    BWK awk modified for biological data

  • There's even a version of awk specifically designed for bioinformatics that natively knows how to handle fasta, fastq, and bam files, among other formats.

    https://github.com/lh3/bioawk

  • csvquote

    Enables common unix utlities like cut, awk, wc, head to work correctly with csv data containing delimiters and newlines

  • CSVs with quoted fields and imbedded newlines can be troublesome in awk. Years ago I had found a script that worked for me, I'm not sure but I think it was this:

    http://lorance.freeshell.org/csv/

    There's also https://github.com/dbro/csvquote which is more unix-like in philosophy: it only handles transforming the CVS data into something that awk (or other utilities) can more easily deal with. I haven't used it but will probably try it next time I need something like that.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • Unix-Pledge

    Perl support for pledge(2) syscall

  • cligen

    Nim library to infer/generate command-line-interfaces / option / argument parsing; Docs at

  • When you have a standardized problem setting like the implicit loop in awk, n alternative to a whole new programming language is a simple < 100 lines of code program generator [1].

    This design lets you retain easy access to large sets of pre-existing libraries as well as have a "compiled/statically typed" situation, if you want. It also leverages familiarity with your existing programming languages. I adapted a similar small program like this to emit a C program, but anything else is obviously pretty easy. Easy is good. Familiar is good.

    Interactivity-wise, with a TinyC/tcc fast running compiler backend my `rp` programs run sub-second from ENTER to completion on small data. Even with not optimizing tcc, they they still run faster than byte-compiled/VM interpreted mawk/gawk on a per input-byte basis. If you take the time to do an optimized build with gcc -O3/etc., they can run much faster.

    And I leave the source code around if you want to just use the program generator as a way to save keystrokes/get a fast start on a row processing program.

    Anyway, I'm not trying to start a language holy war, but just exhibit how if you rotate the problem (or your head looking at the problem) ever so slightly another answer exists in this space and is quite easy. :-)

    [1] https://github.com/c-blake/cligen/blob/master/examples/rp.ni...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts