There's even a version of awk specifically designed for bioinformatics that natively knows how to handle fasta, fastq, and bam files, among other formats.
https://github.com/lh3/bioawk
CSVs with quoted fields and embedded newlines can be troublesome in awk. Years ago I found a script that worked for me. I'm not sure, but I think it was this:
http://lorance.freeshell.org/csv/
There's also https://github.com/dbro/csvquote, which is more Unix-like in philosophy: it only handles transforming the CSV data into something that awk (or other utilities) can more easily deal with. I haven't used it, but will probably try it next time I need something like that.
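To make the "transform first, then use ordinary tools" idea concrete, here is a minimal Python sketch of that kind of preprocessing. It is not csvquote's actual code. The function names and the choice of 0x1f/0x1e as stand-in bytes are assumptions made for illustration. Delimiters and newlines that occur inside quoted fields are swapped for nonprinting characters, so a line-oriented tool sees one record per line, and a second pass swaps them back.

```python
# Illustrative sketch only; names and substitute bytes are assumptions,
# not taken from the csvquote source.
FIELD_SUB = "\x1f"   # stands in for a delimiter inside a quoted field
RECORD_SUB = "\x1e"  # stands in for a newline inside a quoted field


def sanitize(text: str, delim: str = ",", quote: str = '"') -> str:
    """Hide delimiters and newlines that appear inside quoted fields."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            # A doubled quote ("") toggles twice, so the state stays correct.
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch == delim:
            out.append(FIELD_SUB)
        elif in_quotes and ch == "\n":
            out.append(RECORD_SUB)
        else:
            out.append(ch)
    return "".join(out)


def restore(text: str, delim: str = ",") -> str:
    """Undo sanitize() after the line-oriented tools have run."""
    return text.replace(FIELD_SUB, delim).replace(RECORD_SUB, "\n")
```

After `sanitize`, splitting on "\n" and then on "," gives the right record and field counts even when a quoted field contains either character. Running `restore` on the final output brings the original characters back.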
When you have a standardized problem setting like the implicit loop in awk, an alternative to a whole new programming language is a simple program generator in under 100 lines of code [1].
This design lets you retain easy access to large sets of pre-existing libraries as well as have a "compiled/statically typed" situation, if you want. It also leverages familiarity with your existing programming languages. I adapted a similar small program to emit a C program, but targeting anything else is obviously pretty easy. Easy is good. Familiar is good.
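As a rough illustration of the program-generator idea (not the `rp` code from [1]), a Python sketch might look like the following. The `generate` function and its parameter names (`stmts`, `begin`, `where`, `end`, `delim`) are hypothetical, and each snippet is assumed to be a single line of Python. The generator just pastes the snippets into a template with an awk-like implicit loop and returns the source text of an ordinary program.

```python
def generate(stmts: str, begin: str = "", where: str = "True",
             end: str = "", delim=None) -> str:
    """Return source text for a standalone row-processing program.

    Hypothetical interface: every snippet must be a single-line statement
    (or, for `where`, a single expression).
    """
    return f'''import sys

def main():
    {begin or "pass"}
    for nr, line in enumerate(sys.stdin, 1):   # nr: 1-based row number
        f = line.rstrip("\\n").split({delim!r})  # f: list of fields
        if {where}:
            {stmts}
    {end or "pass"}

main()
'''
```

For example, `generate("total += float(f[1])", begin="total = 0.0", end="print(total)", delim=",")` returns a program that sums the second column of comma-separated input. Because the result is plain source text, it can be saved to a file and edited further, or run directly.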
Interactivity-wise, with a fast-running TinyC/tcc compiler backend, my `rp` programs run sub-second from ENTER to completion on small data. Even with the non-optimizing tcc, they still run faster than byte-compiled/VM-interpreted mawk/gawk on a per-input-byte basis. If you take the time to do an optimized build with gcc -O3/etc., they can run much faster.
And I leave the source code around if you want to just use the program generator as a way to save keystrokes/get a fast start on a row processing program.
Anyway, I'm not trying to start a language holy war, but just to show that if you rotate the problem (or your head looking at the problem) ever so slightly, another answer exists in this space and is quite easy. :-)
[1] https://github.com/c-blake/cligen/blob/master/examples/rp.ni...