RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control. (by nf-core)

    First, let's recognize that the framework presented has new features that don't exist in the previous DSLs you mention. Many developers value these additions highly, and they alone could justify a new stab at a workflow language; for many they represent a tradeoff:

* Interface generation
* Declarative cloud resource provisioning
* Static typing
* Native Python support

This workflow has a similar level of complexity to nf-core/rnaseq (not the same, but similar in the number of constituent tasks for the purpose of counting transcript abundance). It ingests raw sequencing reads, runs QC + trimming, does pseudo-alignment, recovers counts from abundance estimates, and aggregates counts over many samples for direct use by diff-exp tools. (It is not 'running salmon'. I think that is a reductionist take.) It does this in addition to dynamically building React.js interfaces, adding static type validation to input parameters, and deploying cloud infrastructure in a simpler way.

As for the lines-of-code comparison, I think it is a weird way to compare software, as the number of lines of code is no proxy for legibility, ease of development, or likelihood of long-term maintenance (many more people know Python than Nextflow). Nonetheless, nf-core/rnaseq has nearly 1000 lines in its workflow entrypoint alone - https://github.com/nf-core/rnaseq/blob/master/workflows/rnaseq.nf . With imported modules + subworkflows, its LOC actually reaches the many thousands. (Now, I understand it is more complex and mature, but I highlight this to show why I think the comparison is weird, and I wonder what you are even comparing to.) Whereas the entire logic of the presented pipeline is neatly encapsulated in 1200 lines of a single file.

Overall, this feels like a reaction that doesn't come from a place of rational discourse, but rather from group dislike for something new and different.
What I would like to do is discuss specific technical points (preferably over issues on GitHub) so conversations can be productive and improve the tools I am working on.
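
To make the "aggregates counts over many samples" step and the static typing point a bit more concrete, here is a minimal plain-Python sketch. It is not the framework's API and not the pipeline's actual code: `SampleCounts` and `aggregate_counts` are hypothetical names made up for illustration, and the sketch assumes per-sample counts are already available as simple gene-to-count mappings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SampleCounts:
    """Per-sample gene counts (hypothetical type, for illustration only)."""

    sample: str
    counts: Dict[str, float]


def aggregate_counts(samples: List[SampleCounts]) -> Dict[str, Dict[str, float]]:
    """Merge per-sample counts into one gene -> sample -> count table.

    Genes that are missing from a sample are filled with 0.0, so every
    row has an entry for every sample.
    """
    # Collect the union of gene names across all samples, in sorted order.
    genes = sorted({gene for s in samples for gene in s.counts})
    return {
        gene: {s.sample: s.counts.get(gene, 0.0) for s in samples}
        for gene in genes
    }


if __name__ == "__main__":
    table = aggregate_counts(
        [
            SampleCounts("ctrl", {"geneA": 10.0, "geneB": 3.0}),
            SampleCounts("treated", {"geneA": 7.0}),
        ]
    )
    for gene, row in table.items():
        print(gene, row)
```

Because the inputs and outputs carry type annotations, a checker such as mypy can flag a mismatched parameter before anything runs, which is roughly the kind of benefit meant by static typing of input parameters above.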
    Not really sure why it's a problem for you. I'm working on rnaseq, which takes a very large input dataset and also produces huge output datasets. It uses Docker, so you can deploy quickly on VMs.
