Ask HN: Should I publish my research code?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • minion

  • I've posted a huge amount of academic code (I've linked to a small number at the end). I think you should, but it won't help advance your career immediately. However, I still think it's better for science.

    What is useful is if you can produce code people can build on and do their own cool stuff with -- then they will cite you. However, getting something to a state where it is tested for all reasonable inputs, has some basic docs, etc. is a hard undertaking.

    https://github.com/minion/minion (C++ constraint solver)

    https://github.com/stacs-cp/demystify (Python puzzle solver)

    https://github.com/peal/vole (Rust group theory solver)

  • demystify

  • vole

    A GAP package for backtrack search in permutation groups with graphs (by peal)

  • 3d-reorganization-prostate-cancer

    Code, analysis, and results for Hawley, Zhou, et al., Cancer Research, 2021.

  • You're right, it is substantially more work to clean and organize the code for publishing. Being open about your work does make the attack surface much larger, and the work is more likely to be nitpicked, criticized, found to contain an error, etc.

    But it is more honest. Whatever you think about the effort required to do this, there's value in honesty.

    Here is an example of my own scientific work:

    - paper [0]

    - preprint [1]

    - GitHub [2]

    It certainly wasn't easy to get all of this done. But doing this can also be a guide for others. They get to see exactly what you've done so that they don't waste months on the exact implementation. They can see where maybe you've made some mistakes to avoid them. They can see so much of the implicit knowledge that is left out of your paper and learn from it. Your code isn't going to be perfect, but what paper is, either?

    Everyone will be a critic, anyway, so make it easy to pick up criticism of the stuff you feel the least confident in and do better next time. You won't get better if no one sees your code.

    [0]: https://cancerres.aacrjournals.org/content/81/23/5833

    [1]: https://www.biorxiv.org/content/10.1101/2021.01.05.425333v2

    [2]: https://github.com/LupienLab/3d-reorganization-prostate-canc...

  • cd4-histone-paper-code

    Public release of most of the data analysis code for Lamere et al. 2016

  • FWIW, this is how I've released the crappy barely-working "academic quality" code for a paper in the past:

    https://github.com/DarwinAwardWinner/cd4-histone-paper-code

    The main points are that I made only a minimal attempt to organize it, and I made the state of the code clear in the README. I don't recall anyone complaining about the code or even mentioning it during review. (I also don't recall whether I published the code before or after the paper was accepted.)

  • superconductor

    A tool to simulate superconducting circuits, comparable to SPICE. (by adewes)

  • I published some of my academic code, like a tool for simulating superconducting circuits [1] or a tool to manage lab instruments for quantum computing (or other) experiments [2]. It's super niche, but both tools have found users in other labs that even keep developing them (at least for [2]). And it's nice to look at your code after 10 years and realize how much you've grown as a programmer :)

    [1]: https://github.com/adewes/superconductor

  • pyview

    pyview contains all reusable and generic classes and functions that I used in my qubit data acquisition setup during my PhD thesis. (by adewes)

  • python-qubit-setup

    All scripts for controlling the instruments and acquiring data in our qubit setup.

  • crux

    Software toolkit for molecular phylogenetic inference (by canonware)

  • Been there, done that. I published my doctoral research code [1] so that others could inspect, verify, replicate, extend, etc. YMMV, but the feedback I received from other researchers ranged from neutral to surprisingly positive (e.g. people using it in ways that pleasantly surprised me). But let me expand on my own experiences while developing that software, trying to figure out how to replicate the then-current state of the art.

    At the time there were two widely used software packages for phylogenetic inference, PAUP* [2] and MrBayes [3]. The source code for MrBayes was available, and although at the time I had some pretty strong criticisms of the code structure, it was immensely valuable to my research, and I remain very grateful to its author for sharing the code. In contrast, the PAUP* source was not available, and I struggled immensely to replicate some of its algorithms. As a case in point, I needed to compute the natural log of the gamma function with similar precision, but there was no documentation for how PAUP* did this. I eventually discovered that the PAUP* author had shared some of the low-level code with another project. Based on comments in that code, I pulled the original references from the 1960s literature and solved, in a matter of days, problems that had plagued me for months. Now, from what I could see in that shared PAUP* code, I suspect that the PAUP* code is of very high quality. But the author significantly reduced his scientific impact by keeping the source to himself.

    [1]: https://github.com/canonware/crux
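
To make the log-gamma problem in the comment above concrete, here is a minimal sketch of computing ln Γ(x) in double precision. The comment does not say which algorithm PAUP* or crux uses, so this is an assumption for illustration only: it swaps in the Lanczos approximation (g = 7, nine coefficients), a commonly published variant. The function name `log_gamma` is hypothetical.

```python
import math

# Lanczos approximation parameters (g = 7, nine coefficients).
# Assumption: this is a commonly published coefficient set, NOT the
# method used by PAUP* or crux, which the comment does not specify.
_G = 7.0
_COEFFS = (
    0.99999999999980993,
    676.5203681218851,
    -1259.1392167224028,
    771.32342877765313,
    -176.61502916214059,
    12.507343278686905,
    -0.13857109526572012,
    9.9843695780195716e-6,
    1.5056327351493116e-7,
)


def log_gamma(x: float) -> float:
    """Return ln(Gamma(x)) for x > 0 (hypothetical helper name)."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    if x < 0.5:
        # Reflection formula: Gamma(x) * Gamma(1 - x) = pi / sin(pi * x).
        return math.log(math.pi / math.sin(math.pi * x)) - log_gamma(1.0 - x)
    x -= 1.0
    series = _COEFFS[0]
    for i in range(1, len(_COEFFS)):
        series += _COEFFS[i] / (x + i)
    t = x + _G + 0.5
    # Work in log space throughout so large arguments do not overflow.
    return (
        0.5 * math.log(2.0 * math.pi)
        + (x + 0.5) * math.log(t)
        - t
        + math.log(series)
    )


if __name__ == "__main__":
    # Compare against the standard library's implementation.
    for value in (0.1, 1.0, 3.5, 100.0):
        print(value, log_gamma(value), math.lgamma(value))
```

In practice Python's standard library already provides `math.lgamma`, so the sketch is mainly useful as a readable reference to check another implementation against, which is the kind of cross-check that is hard to do when the original source is unavailable.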
