Why bad scientific code beats code following "best practices"


kubernetes

657 106,778 10.0 Go

Production-Grade Container Scheduling and Management

There are some things that should be in one long function (or method).
Consider dealing with the output of a (lexical) tokeniser. It is much easier to maintain a massive switch statement (or a bunch of ifs/elseifs) that handles each token, with calls to other functions to do the actual processing, so that each case is just a token and a function call. Grouping the cases in some way not required by the code is an illusory "gain": it hides the complexity of the actual function in a bunch of files you don't look at, when this is not a natural abstraction of the problem at all, and those files introduce extra layers of flow control where tricky bugs can hide.
Or see the "PLEASE DO NOT ATTEMPT TO SIMPLIFY THIS CODE" comment from the Kubernetes source[0]. A 300-line function that does one thing and cannot be usefully divided into smaller units is more maintainable than any alternative. Attempting to break it up will make it worse.
That being said, I agree that nearly all 300-line functions in the wild are not like this.
[0] https://github.com/kubernetes/kubernetes/blob/ec2e767e593953...
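To make the tokeniser point concrete, here is a minimal Python sketch of the shape being described: one flat dispatcher where every branch is a token kind plus a single function call. Everything in it is hypothetical and made up for illustration (the `(kind, text)` token shape, the token names, the handler functions, and the choice of a small postfix calculator); it is not taken from the Kubernetes source or from any real tokeniser.

```python
from typing import List, Tuple

# Hypothetical token shape for illustration: (kind, text).
Token = Tuple[str, str]


def push_number(stack: List[float], text: str) -> None:
    """Handler for NUMBER tokens: parse the text and push it."""
    stack.append(float(text))


def apply_add(stack: List[float]) -> None:
    """Handler for PLUS tokens: replace the top two values with their sum."""
    right, left = stack.pop(), stack.pop()
    stack.append(left + right)


def apply_sub(stack: List[float]) -> None:
    """Handler for MINUS tokens: replace the top two values with left - right."""
    right, left = stack.pop(), stack.pop()
    stack.append(left - right)


def apply_mul(stack: List[float]) -> None:
    """Handler for STAR tokens: replace the top two values with their product."""
    right, left = stack.pop(), stack.pop()
    stack.append(left * right)


def evaluate(tokens: List[Token]) -> float:
    """Evaluate a postfix token stream with one flat if/elif chain.

    Each branch is only a token kind and one call, so the whole set of
    handled tokens can be read in one place.
    """
    stack: List[float] = []
    for kind, text in tokens:
        if kind == "NUMBER":
            push_number(stack, text)
        elif kind == "PLUS":
            apply_add(stack)
        elif kind == "MINUS":
            apply_sub(stack)
        elif kind == "STAR":
            apply_mul(stack)
        else:
            raise ValueError(f"unexpected token kind: {kind!r}")
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


# Postfix form of (2 + 3) * 4.
tokens = [("NUMBER", "2"), ("NUMBER", "3"), ("PLUS", "+"),
          ("NUMBER", "4"), ("STAR", "*")]
result = evaluate(tokens)
```

With a real grammar the chain would grow to dozens of branches, but each one stays a single line of dispatch, which is the property the comment argues is worth keeping in one function.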

why-natural-selection

1 4 10.0 R

Code for the paper "Human capital mediates natural selection in contemporary humans"

Ulp. As a scientist who is a hobby "programmer", this struck close to home. I've got one project[1] with a huge mess of functions calling each other. It started out with good intentions, but then gradually descended as I wrote more and more hacks to add new analyses & robustness checks. I swear I meant well!
I think there's a genuine tension between writing good code and "shipping" a paper. At least, when I program "as a programmer" I think my code is mostly higher quality.
[1] https://github.com/hughjonesd/why-natural-selection/blob/mas...

dvc

109 13,116 9.7 Python

🦉 ML Experiments and Data Management with Git

What you're describing sounds like DVC (at a higher-ish level, as an 80% solution).
https://dvc.org/
See Pachyderm too.
