Why bad scientific code beats code following "best practices"

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • kubernetes

    Production-Grade Container Scheduling and Management

  • There are some things that should be in one long function (or method).

    Consider handling the output of a (lexical) tokeniser. A massive switch statement (or a chain of ifs/elseifs), in which each case is just a token and a call to the function that does the actual processing, is much easier to maintain than the alternatives. Grouping the cases in some way the code does not require is an illusory "gain": it hides the complexity of the function in a bunch of files you don't look at, the grouping is not a natural abstraction of the problem, and those files introduce extra layers of flow control where tricky bugs can hide.

    Or see the "PLEASE DO NOT ATTEMPT TO SIMPLIFY THIS CODE" comment from the Kubernetes source[0]. A 300-line function that does one thing and cannot be usefully divided into smaller units is more maintainable than any alternative. Attempting to break it up will make it worse.

    That being said, I agree that nearly all 300-line functions in the wild are not like this.

    [0] https://github.com/kubernetes/kubernetes/blob/ec2e767e593953...
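
    The tokeniser case described above might look roughly like the sketch below. It is only an illustration: the token kinds ("NUMBER", "IDENT", "OP", "WHITESPACE"), the (kind, text) token shape, and the handler names are all hypothetical and are not taken from the comment or from any real lexer. The point is the shape: one flat dispatch where every branch is a token kind plus a single function call.

```python
from typing import List, Tuple

# Hypothetical token shape for this sketch: (kind, text).
Token = Tuple[str, str]


def handle_number(text: str, out: List[str]) -> None:
    # The actual processing lives in small helpers like this one.
    out.append(f"push {int(text)}")


def handle_ident(text: str, out: List[str]) -> None:
    out.append(f"load {text}")


def handle_operator(text: str, out: List[str]) -> None:
    out.append(f"apply {text}")


def process_tokens(tokens: List[Token]) -> List[str]:
    """One long, flat dispatch: each branch is a token kind and one call."""
    out: List[str] = []
    for kind, text in tokens:
        if kind == "NUMBER":
            handle_number(text, out)
        elif kind == "IDENT":
            handle_ident(text, out)
        elif kind == "OP":
            handle_operator(text, out)
        elif kind == "WHITESPACE":
            pass  # deliberately ignored
        else:
            # Unknown kinds fail loudly instead of being silently dropped.
            raise ValueError(f"unexpected token kind: {kind!r}")
    return out


if __name__ == "__main__":
    demo = [("NUMBER", "1"), ("WHITESPACE", " "), ("OP", "+"), ("IDENT", "x")]
    for line in process_tokens(demo):
        print(line)
```

    In a real tokeniser the chain would have many more branches, but every supported token kind stays visible in one place, and adding a kind means adding one branch and one handler.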

  • why-natural-selection

    Code for the paper "Human capital mediates natural selection in contemporary humans"

  • Ulp. As a scientist who is a hobby "programmer", this struck close to home. I've got one project[1] with a huge mess of functions calling each other. It started out with good intentions, but then gradually descended as I wrote more and more hacks to add new analyses & robustness checks. I swear I meant well!

    I think there's a genuine tension between writing good code and "shipping" a paper. At least, when I program "as a programmer" I think my code is mostly higher quality.

    [1] https://github.com/hughjonesd/why-natural-selection/blob/mas...

  • dvc

    🦉 ML Experiments and Data Management with Git

  • What you’re describing sounds like DVC (at a higher-ish, 80%-solution level).

    https://dvc.org/

    See Pachyderm too.
