cligen vs bioawk

cligen

Nim library to infer/generate command-line interfaces / option / argument parsing (by c-blake)

c-blake.github.io

bioawk

BWK awk modified for biological data (by lh3)

Bioinformatics sequence-analysis

Project          cligen          bioawk
Mentions         32              8
Stars            489             572
Growth           -               -
Activity         8.4             0.0
Latest commit    19 days ago     over 1 year ago
Language         Nim             C
License          ISC License     -

The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed; recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

cligen

Posts with mentions or reviews of cligen. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-12.

CLI user experience case study
12 projects | news.ycombinator.com | 12 Jan 2024

There is also generating the whole thing from a function signature (e.g. https://github.com/c-blake/cligen ) since then CLauthors need not learn a new spec language, but then CLauthors must add back in helpful usage metadata/semantics and still need to learn a library API (but I like how those two things can be "gradual"). It's a hard space in which to find perfection, but I wish you luck in your attempt!
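cligen performs this inference at compile time in Nim. As a rough analogue only (an assumption of this sketch, not cligen's API or implementation), Python's standard `inspect` module can read a function's signature and feed `argparse`. The names `cli_from_signature` and `greet` are hypothetical.

```python
import argparse
import inspect


def cli_from_signature(func, argv=None):
    """Derive an argparse CLI from func's parameters, parse argv, call func.

    Assumptions of this sketch: every parameter has a default, each option's
    type is taken from that default, and bool defaults become on/off switches.
    """
    parser = argparse.ArgumentParser(prog=func.__name__,
                                     description=inspect.getdoc(func))
    for name, param in inspect.signature(func).parameters.items():
        default = param.default
        if default is inspect.Parameter.empty:
            raise TypeError(f"parameter {name!r} needs a default value")
        flag = "--" + name.replace("_", "-")
        if isinstance(default, bool):  # test bool first: bool subclasses int
            parser.add_argument(flag, dest=name, default=default,
                                action="store_false" if default else "store_true")
        else:
            parser.add_argument(flag, dest=name, default=default,
                                type=type(default),
                                help="(default: %(default)s)")
    return func(**vars(parser.parse_args(argv)))


def greet(name: str = "world", times: int = 1, shout: bool = False) -> str:
    """Greet someone a few times."""
    text = " ".join([f"hello {name}"] * times)
    return text.upper() if shout else text


print(cli_from_signature(greet, ["--name", "nim", "--times", "2"]))
```

With the default `argv=None`, `argparse` reads `sys.argv`, so a script would simply call `cli_from_signature(greet)` and get `--help` built from the docstring and defaults. The per-parameter help strings that a CL author would "add back in" are deliberately left out of this sketch.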
Things I've learned about building CLI tools in Python
16 projects | news.ycombinator.com | 24 Oct 2023

cligen also allows End-CL-users to adjust colorization of --help output like https://github.com/c-blake/cligen/blob/master/screenshots/di... using something like https://github.com/c-blake/cligen/wiki/Dark-BG-Config-File
Last I knew, the argparse module backing most Python CLI solutions did not support such (for many, easier-to-read) colorized help text, but the Python universe is too vast to be sure without a lot more searching.
Removing Garbage Collection from the Rust Language (2013)
9 projects | news.ycombinator.com | 11 Sep 2023

20 milliseconds? On my 7 year old Linux box, this little Nim program https://github.com/c-blake/bu/blob/main/wsz.nim runs to completion in 275 microseconds when fully statically linked with musl libc on Linux. That's with a stripped environment (with `env -i`). It takes more like 318 microseconds with my usual 54 environment variables. The program only does about 17 system calls, though.
Additionally, https://github.com/c-blake/cligen makes decent CLI tools a real breeze. If you like some of Go's qualities but the language seems too limited, you might like Nim: https://nim-lang.org. I generally find getting good performance much less of a challenge with Nim, but Nim is undeniably less well known with a smaller ecosystem and less corporate backing.
Writing Small CLI Programs in Common Lisp (2021)
5 projects | news.ycombinator.com | 5 Sep 2023
If you find this article interesting and are curious about Nim then you would probably also be curious about https://github.com/c-blake/cligen
That allows adding just one line to a module to get a pretty complete CLI, and then a string per parameter to properly document options (assuming an existing API using keyword arguments).
It's also not hard to compile & link a static ELF binary with Nim. I do it with MUSL libc on Linux all the time. I just toss into my ~/.config/nim/nim.cfg:
```
    @if musl:  # make nim c -d:musl .. foo static-link `foo` with musl
```
GNU Parallel, where have you been all my life?
19 projects | news.ycombinator.com | 21 Aug 2023

Sure. No problem.
Even Windows has popen these days. There are some tiny popenr/popenw wrappers in https://github.com/c-blake/cligen/blob/master/cligen/osUt.ni...
Depending upon how balanced work is on either side of the pipe, you usually can even get parallel speed-up on multicore with almost no work. For example, there is no need to use quote-escaped CSV parsing libraries when you just read from a popen()d translator program producing an easier format: https://github.com/c-blake/nio/blob/main/utils/c2tsv.nim
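In Python the same "read from a popen()d translator" idea maps onto `subprocess.Popen`. The sketch below is not c2tsv or cligen's osUt wrappers: `TRANSLATOR` and `read_tsv_via_translator` are hypothetical names, and a tiny child Python process stands in for a real CSV-to-TSV translator so the example runs anywhere. It assumes no field contains a tab or newline.

```python
import subprocess
import sys

# Hypothetical translator standing in for a tool like c2tsv: a child Python
# process that parses quoted CSV on stdin and writes plain TSV on stdout.
# Assumes no field contains a tab or newline.
TRANSLATOR = [sys.executable, "-c",
              "import csv, sys\n"
              "for row in csv.reader(sys.stdin):\n"
              "    sys.stdout.write('\\t'.join(row) + '\\n')\n"]


def read_tsv_via_translator(csv_path, cmd=TRANSLATOR):
    """Yield each row as a list of str, splitting only on tabs.

    The translator runs in its own process, so its parsing can overlap with
    whatever the consumer does with each row (parallelism across the pipe).
    """
    with open(csv_path, "rb") as src, subprocess.Popen(
            cmd, stdin=src, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n").split("\t")
    # Leaving the with-block closes the pipe and waits, so returncode is set.
    if proc.returncode != 0:
        raise RuntimeError(f"translator exited with status {proc.returncode}")
```

The consumer never needs a quote-aware parser; swapping `cmd` for a real translator program leaves the reading loop unchanged.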
The Bipolar Lisp Programmer
3 projects | news.ycombinator.com | 11 Aug 2023

Nim is terse yet general and can be made even more so with effort. E.g., You can gin up a little framework that is even more terse than awk yet statically typed and trivially convertible to run much faster like https://github.com/c-blake/bu/blob/main/doc/rp.md
You can statically introspect code to then generate related/translated ASTs to create nearly frictionless helper facilities like https://github.com/c-blake/cligen .
You can do all of this without any real run-time speed sacrifices, depending upon the level of effort you put in / your expertise. Since it generates C/C++ or JavaScript, you get the abilities of the backend compilers almost out of the box, like profile-guided optimization or, for JS, JIT compilation.
Ask HN: Why did Nim not catch-on like wild fire as Rust did?
16 projects | news.ycombinator.com | 25 Jun 2023
It's more that those tools were what comes to mind when I specifically think of my exposure to the existence of Rust. It's perhaps not that the tools were there, but that they were well known (and known for being written in Rust).
Anecdatapoint - I've never heard of literally a single one of the utilities listed on the bu page.
Regarding cligen, right from the start clap wins on producing idiomatic output. Compare: https://github.com/c-blake/cligen#cligen-a-native-api-inferr...
```
    Usage:
```
Newbie looking at nim
1 project | /r/nim | 10 Apr 2023

A cool example would be this, which is a CLI generation library. It lets you describe command-line interfaces simply using function signatures.
Zig and Rust
6 projects | news.ycombinator.com | 27 Mar 2023

>Does nim have anything as polished and performant as clap and serde?
"Polished" and "high quality" are more subjective/implicitly about adoption, IMO. "Performant" has many dimensions. I just tested the Nim https://github.com/c-blake/cligen vs clap: cligen used 5X less object file space (with all size optimization tweaks enabled in both), 20% less run-time memory for large argument lists, and the same run-time per argument (with march=native equivalents on both, within statistical noise). cligen has many features ("did you mean?" suggestions, colorized generated help, and all that); I do not see any obvious feature in the clap docs that is missing in cligen. The Nim binary serde showing is likely not as good, but there are something like 10 JSON packages, and that seems to be maybe your primary concern.
More to add color to your point than to disagree (and to follow up on my "adoption"): your ideas about polish, quality, docs, etc. are part of the feedback loop(s) you mentioned. More users => users complain (What is confusing? What is missing? etc.) => things get fixed/cleaned up/improved => more users. Besides "performant" being multi-dimensional, the feedback loop is more of a "cyclic graph". :-) While I probably prefer Nim as much as or more than @netbioserror does, I am not too shocked by the mindshare capture. It seems to happen every 5..10 years or so in programming languages.
While many of your points are not invalid, tech is also a highly hype-driven & fad-driven realm. In my experience, the more experience someone has with this meta-feature, the more skeptical they are of the latest thing (more rounds of regret, etc.). Also, that feedback graph is not a pure good. Things can get too popular too quickly, with near-permanent consequences. ipv4 got popular so quickly that we are still mostly stuck on it 40 years later as ipv6 struggles for penetration. Whatever your favorite PL is, it may also grow features too fast.
Self Hosted SaaS Alternatives
17 projects | news.ycombinator.com | 5 Mar 2023

You are welcome. Thanks are too rarely offered. :-)
You may also be interested in word stemming (such as the Snowball stemmer used in https://github.com/c-blake/nimsearch ) or other NLP techniques. I don't know how internationalized/multilingual that stuff is, but conceptually you might want a "series of stemmed words" to be the content fragments of interest.
Similarity scores have many applications. Weights on graph of cancelled downloads ranked by size might be one. :)
Of course, for your specific "truncation" problem, you might also be able to just do an edit distance against the much smaller filenames and compare data prefixes in files or use a SHA256 of a content-based first slice. ( There are edit distance algos in Nim in https://github.com/c-blake/cligen/blob/master/cligen/textUt.... as well as in https://github.com/c-blake/suggest ).
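Levenshtein distance is the classic edit distance for this kind of filename comparison. The Python sketch below is the textbook two-row dynamic program, not the code in cligen's textUt or in suggest; `edit_distance` is a hypothetical name.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]
```

For the truncation problem one would compare pairs of the (much shorter) filenames and only look at file contents for pairs under some small threshold; the threshold itself is a judgment call.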
Or, you could do a little program like ndup/sh/ndup to create a "mirrored file tree" of such content-based slices then you could use any true duplicate-file finder (like https://github.com/c-blake/bu/blob/main/dups.nim) on the little signature system to identify duplicates and go from path suffixes in those clusters back to the main filesystem. Of course, a single KV store within one or two files would be more efficient than thousands of tiny files. There are many possibilities.
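A minimal in-memory version of the "content-based first slice" signature idea might look like the Python sketch below, with a dict standing in for the single KV store. It is not dups.nim or ndup; `prefix_signature` and `group_by_prefix` are hypothetical names, and the 64 KiB default slice size is an arbitrary assumption.

```python
import hashlib
from collections import defaultdict


def prefix_signature(path, nbytes=65536):
    """SHA-256 hex digest of the first nbytes of a file (a "first slice")."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(nbytes)).hexdigest()


def group_by_prefix(paths, nbytes=65536):
    """Map each prefix signature to the paths sharing it.

    Only groups with two or more members are returned: those are candidate
    duplicates, or a complete file plus truncated copies of it, provided every
    copy is at least nbytes long.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[prefix_signature(path, nbytes)].append(path)
    return {sig: members for sig, members in groups.items() if len(members) > 1}
```

A truncated copy shorter than `nbytes` hashes differently from the full file, so the slice size has to be smaller than the shortest partial download one wants to catch.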

bioawk

Posts with mentions or reviews of bioawk. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-11.

Bioawk: Awk Modified for Biological Data
1 project | news.ycombinator.com | 31 Mar 2024
Any links to R-scripts for common NGS pipelines?
2 projects | /r/bioinformatics | 11 May 2023

Data wrangling is actually what awk excels at, and it's generally much more concise than R for that sort of thing. I'm aware that a lot of awk one liners look like gibberish to the uninitiated, but it actually makes a lot of sense when you understand the pattern-action structure of awk programs. It is also installed on any *nix system, there's no need to worry about installing dependencies or setting up virtual environments. And it's several times faster than R. Also Bioawk is glorious.
Is BioAwk frequently used, or even useful?
2 projects | /r/bioinformatics | 5 May 2023

A few months ago, I learned about this utility known as bioawk, written by Heng Li of samtools fame. Apparently, it is essentially a tweaked version of awk, with some extra goodies added for parsing and processing of bioinformatics file formats. While the functionality seems cool, I was wondering whether it is worth installing on my server, and incorporating into our workflows, because it seems so niche. I have not seen many references to it. Or is it better if we stick to Python scripts for this sort of work? Are there any computational speed advantages, etc. that bioawk offers over regular Python scripts for processing of, let's say, BED files or VCF files?
What are the most useful cutting edge tools I should learn for bioinformatics?
3 projects | /r/bioinformatics | 26 Apr 2023
My boss is considering letting me take a programming course if I have some good reasons why.
2 projects | /r/labrats | 13 Apr 2023

Besides that, their core lectures for non-computer scientists are public (survey), and Software Carpentry workshops move around the globe. Maybe your intent to seed hands-on knowledge is in a similar tune before heading for biopython, bioperl, or bioawk. It doesn't hurt to tap into resources initially written for non-labrats either, e.g. the lessons about regular expressions by the Programming Historian.
What are strictly data analysis jobs?
3 projects | /r/labrats | 22 Feb 2023

On the other hand, some of the techniques that lay the groundwork for data analysis are equally valuable in other situations; take, for example, the two installments about regular expressions on the Programming Historian, "Understanding Regular Expressions" and "Cleaning OCR’d text with Regular Expressions". They have no relevance to handling chemicals in the lab, yet since reading them I find myself working with data files more efficiently than before, thanks to grep, a Linux utility for searching across data files. Or AWK, which also uses these "regexes" and which I have found generally useful since Benjamin Porter's "Hack the planet's text" (presentation video and exercise video), with its link back to chem/bio, e.g., to bioawk (by the way, there is also biopython).
Help they’re turning me into a programmer
3 projects | /r/labrats | 13 Feb 2023

Well, what language do you want to learn? What is your background so far? Assuming it is more on the biology side, Software Carpentry's Python lessons may eventually lead to biopython. Though there is equally a chance for AWK (Hack the planet's text!) and bioawk...
Awk: The Power and Promise of a 40-Year-Old Language
4 projects | news.ycombinator.com | 7 Sep 2021

There's even a version of awk specifically designed for bioinformatics that natively knows how to handle fasta, fastq, and bam files, among other formats.
https://github.com/lh3/bioawk
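For comparison with a general-purpose scripting approach, here is a minimal multi-line FASTA reader in Python. It is a sketch under stated assumptions (FASTA only, no FASTQ, no gzip, name taken as the first word of the header), not bioawk's parser or readfq; `read_fasta` is a hypothetical name.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from multi-line FASTA text.

    The name is the first whitespace-delimited word after '>'; sequence lines
    are concatenated. Blank lines and text before the first header are ignored.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:  # emit the record that just ended
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:  # emit the final record
        yield name, "".join(chunks)
```

A loop like `for name, seq in read_fasta(open("reads.fa")): print(name, len(seq))` covers roughly what a short bioawk one-liner does for FASTA; bioawk's format-aware input parsing goes further and handles FASTQ and other formats too.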

What are some alternatives?

When comparing cligen and bioawk you can also consider the following projects:

httpbeast - A highly performant, multi-threaded HTTP 1.1 server written in Nim.

csvquote - Enables common unix utilities like cut, awk, wc, head to work correctly with csv data containing delimiters and newlines

nimforum - Lightweight alternative to Discourse written in Nim

orange - 🍊 📊 💡 Orange: Interactive data analysis

loggedfs - LoggedFS - Filesystem monitoring with Fuse

zarp - The Zavolab Automated RNA-seq Pipeline

lobster - The Lobster Programming Language

MethylDackel - A (mostly) universal methylation extractor for BS-seq experiments.

walkdir - Rust library for walking directories recursively.

readfq - Fast multi-line FASTA/Q reader in several programming languages

clap-rs - A full featured, fast Command Line Argument Parser for Rust

Biopython - Official git repository for Biopython (originally converted from CVS)
