Software engineers: consider working on genomics

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Hail

    Cloud-native genomic dataframes and batch computing

  • I don't have any funding to hire right now, but I'm always happy to chat about the industry and my experience building Hail (https://hail.is, https://github.com/hail-is/hail), a tool widely used by folks with large collections of human sequences.

    The other posters are not wrong about compensation. Total compensation is lower than industry pay by a factor of two to three.

    However, it is absolutely possible to work with a group of top-notch engineers on serious distributed systems and compilers in service of an excellent scientific-user experience. I know because I do. We are lucky to have a PI who respects and hires a diversity of expertise within his lab.

    I enjoy being deeply embedded with our users. I do not have to guess what they need or want because I help them do it every day.

    I also enjoy enmeshing engineering with statistics, mathematics, and biology. Work is more interesting when so many disciplines conspire towards the end of improved human health.

  • bioconda-recipes

    Conda recipes for the bioconda channel.

  • I contribute to Nextflow core (https://nf-co.re/). It's more of a collection of pipelines than traditional software, but there are users all around the world and a good community.

    Most of the packages on bioconda (https://bioconda.github.io/) are open source. But you probably want to find a sub-field that interests you most before finding a project.

    In grad school, we also had an ex-Google software engineer volunteer with us one day a week. It was very impactful for many members of the lab to learn good engineering practices, and it was nothing like the situation others in this thread describe, where engineers were treated as "janitors".

  • ptti

    Population-wide Testing, Tracing and Isolation Models

  • I recall once hearing from a VC why they hardly invest in biotech (or perhaps I read it somewhere; my memory is fuzzy). It boiled down to this: way too much non-replicable research, often with suspicions of fraud by the original labs. A biotech startup can easily burn through millions setting up a lab from scratch and attempting to replicate an academic paper it thought it could commercialize, only to discover that the effect doesn't really exist. This problem doesn't affect the software industry, so that's where the money goes.

    Why are there so few tooling companies? Is there actually a market for good software in science? For such a market to exist, most scientists would have to care about the correctness of their results, and care enough to spend grant money on improvements. They all claim to care, but observation of actual working practices points to the opposite too much of the time (of course there are some good apples!).

    In 2020 I got interested in research about COVID, so over the next couple of years I read a lot of papers and source code coming out of the health world. I also talked to some scientists and to a coder who had worked alongside scientists. He had worked on malaria research before deciding to change fields because it was so corrupt. He also told me about an attempt to recruit a coder with experience on climate models, who turned out to be quitting science entirely for the same reason. The same anti-patterns would crop up repeatedly:

    - Programs would turn out to contain serious bugs that totally altered their output when fixed, but it would be ignored because nobody wants to retract papers. Instead scientists would lie or BS about the nature of the errors e.g. claiming huge result changes were actually small and irrelevant.

    - Validation is often non-existent or based on circular reasoning. As a consequence there are either no tests or the tests are meaningless.

    - Code is often write-once, run-once. Journals happily accept papers that propose an entirely ad hoc, situation-specific hypothesis that doesn't generalize at all, so very similar code is constantly being written and then thrown away by hundreds of different isolated and competing groups.

    These issues will sooner or later cause honest programmers to doubt their role. What's the point in fixing bugs if nobody cares about incorrect results? How do you know your refactoring was correct if there are no unit tests and nobody can even tell you how to write them? How do you get people to use tools with better error checking if the only thing users care about is convenience of development? How do you create widely adopted abstractions beyond trivial data wrangling if the scientists are effectively being paid by LOC written?

    The validation issue is especially neuralgic. Scientists will check whether a program they wrote works by simply eyeballing the output and deciding that it looks right. How do they know it looks right? Based on their expertise; you wouldn't understand, it's far too complicated for a non-scientist. Where does that expertise come from? From reading papers with graphs in them. Where do those graphs come from? More unvalidated programs. What is missing in a disturbing number of cases is real-world data, or any acceptance that real data takes precedence over predicted data. Example from [1]: "we believe in checking models against each other, as it's the best way to understand which models work best in what circumstances". Another [2]: "There is agreement in the literature that comparing the results of different models provides important evidence of validity and increases model credibility".

    There are a bunch of people in this thread saying things like, oh, I'd love to help humanity but don't want to take the pay cut. To anyone thinking of going into science I'd strongly suggest you start by taking a few days to download papers from the lab you're thinking of joining and carefully checking them for mistakes, logical inconsistencies, absurd assumptions or assertions etc. Check the citations, ensure they actually support the claim being made. That sort of thing. If they have code on github go read it. Otherwise you might end up taking a huge pay cut only to discover that the lab or even whole field you've joined has simply become a self-reinforcing exercise in grant application, in which the software exists mostly for show.

    [1] https://github.com/ptti/ptti/blob/master/README.md

    [2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3001435/

  • poly

    A Go package for engineering organisms.

  • I write synthetic biology software for a living and maintain this open-source Go package for engineering DNA, which has high test coverage and a nice little dev community around it.

    https://github.com/TimothyStiles/poly

    A large part of my project's community are devs that want to get into the field but can't tolerate the ridiculously low pay, laughably bad management, disrespect, and what amounts to 40+ years of technical debt that's endemic to biotech software.

    I've had companies here in the Bay Area offer me 100K a year with a straight face. I've had companies tell me during interviews that they're looking for someone to help "set up GitHub". I've seen job listings for low-paid web dev positions that require applicants to have PhDs.

    The reality is that, except at a growing handful of places, management straight up won't know the difference between IT and software engineers. It's what I call the naive buyer problem.

    The demand for software engineers in biotech is generated by naive buyers that don't know what they need, why they need it, or how to get it.

    Benchling and Recursion Pharmaceuticals have reputations in the industry for paying "standard software salaries". So do the research divisions at places like DeepMind, Microsoft, and Google, but in my experience there are even new multi-billion-dollar institutes where senior management has never heard the term devops.

    Most places advertise "data scientist" positions, or some analog, instead of software engineer positions. This is mostly because upper management has never met an actual practicing software engineer in a professional setting. Many come from academia, where the culture and work requirements heavily disincentivize standard software engineering practices.

    It's also not uncommon for a biotech company either to have a very underqualified CTO, whose main programming experience is the ML-research-style coding they learned during their PhD, or not to have a CTO at all. Either situation has huge downstream consequences.

    This week a software engineer trying to make the switch to biotech actually DM'd me to ask why they were seeing a ton of data science / ML job positions but no software engineering / devops positions.

    They were worried that these companies were trying to save on costs by forcing their data scientists to create infrastructure but it's actually worse than that. Most of these companies aren't even aware that there's supposed to be infrastructure.

    Despite all of this, the future is looking better, and I'm starting to find new companies and positions that are, well... reasonable. I learned about this thread at a party last night from a friend who works at one of these companies. There's a small, strong new wave of companies and developers out there pushing biotech software forward. Hopefully some of them (including myself) make it big while pushing the idea that better tech equals better biotech.

  • serratus

    Ultra-deep search for novel viruses

  • Serratus (https://github.com/ababaian/serratus) is an OSS bioinformatics project created by a passionate group of volunteers. The short story is that we're re-analyzing all of the world's DNA/RNA sequencing data to find new viruses that other people have missed. It works surprisingly well, but there's a ton left to do.

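The validation complaint in the ptti comment above (models checked against each other rather than against real data) can be made concrete with a small example. This is a minimal Python sketch, not code from ptti or any other project listed here; the `rmse` and `validate` helpers, the choice of RMSE as the metric, the tolerance, and the case counts are all hypothetical illustrations. It shows that two models can agree closely with each other while both are far from the observations.

```python
"""Sketch: validate model output against observed data, not only against another model.

All names, numbers, and the tolerance are hypothetical illustrations.
"""
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def validate(predicted, observed, tolerance):
    """Return True if the predictions are within `tolerance` RMSE of the observations."""
    return rmse(predicted, observed) <= tolerance


# Hypothetical daily case counts (observed) and two hypothetical model outputs.
observed = [100, 120, 150, 170, 160]
model_a = [100, 140, 200, 260, 330]
model_b = [105, 145, 205, 255, 325]

# The two models agree closely with each other...
print(rmse(model_a, model_b))                     # 5.0
# ...but neither is close to the observed data.
print(validate(model_a, observed, tolerance=20.0))  # False
print(validate(model_b, observed, tolerance=20.0))  # False
```

Model-to-model agreement only shows that the models share assumptions; the comparison against `observed` is the step that can actually reject a model.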