-
perl-for-reysenbach-lab
These are Perl scripts I developed over many years as a Bioinformaticist for the Reysenbach Lab at PSU. The Reysenbach Lab studies microbial diversity in extreme environments. Lotta FASTA utilities here if you're into that sort of thing.
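For readers who haven't worked with FASTA, most utilities of this kind start by splitting a file into header/sequence records. Below is a minimal Python sketch of that step. It is not taken from the repo: the helper name `read_fasta` is hypothetical, and it assumes plain FASTA where `>` starts a header and sequences may wrap across lines.

```python
import io
from typing import Iterator, Tuple


def read_fasta(handle) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream (hypothetical helper)."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Wrapped sequence lines are joined; lines before the first header are ignored.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(example))
print(records)  # [('seq1 demo', 'ACGTTTGA'), ('seq2', 'GGCC')]
```

Using a generator keeps memory flat on large files, because only one record is held at a time.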
I saw Jennifer's post about rewriting her Perl scripts in Python and how she saw a 2.5-fold performance improvement.
-
faster-perl-for-reysenbach
Tracks the progress of making old Perl scripts faster and more maintainable. Working from Meneghin's perl-for-reysenbach-lab repository of bioinformatics scripts.
I have an interest in Perl and science, so time to roll up sleeves and learn me some profiling/benchmarking. What follows is my internal monologue and the notes I scribbled down during the learning process. For those who want to follow along, I've created a small repo.
-
There are no more obvious or easy gains here. Any further work is likely to yield small returns. Go outside, get a life, or at least consult the relevant chart.
-
Search "benchmarking tools for linux" and decide that hyperfine is good for what I'm doing. Run Jennifer's new Python script against my refactored Perl and find that the Python is 1.26 times faster for k=3 and 1.47 times faster for k=4. For the Covid-19 sequence, both runtimes are on the order of hundreds of milliseconds.
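The text doesn't say what `k` controls. A common reading for sequence scripts is k-mer length, so this sketch assumes the benchmarked task is counting overlapping k-mers. The function names `count_kmers` and `speedup` are hypothetical, and the timings are made-up placeholders, not the measured results. "N times faster" is read here as the slower mean runtime divided by the faster one, which is how hyperfine phrases its summary comparison.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count overlapping k-mers of length k (assumes k is the k-mer length)."""
    seq = sequence.upper()
    # A sequence of length L has L - k + 1 overlapping k-mers (none if k > L).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def speedup(slower_mean_s: float, faster_mean_s: float) -> float:
    """Ratio of mean runtimes: how many times faster the faster command is."""
    return slower_mean_s / faster_mean_s


counts = count_kmers("ACGTACGT", 3)
print(counts.most_common(2))  # [('ACG', 2), ('CGT', 2)]

# Hypothetical mean runtimes in seconds, for illustration only.
perl_mean, python_mean = 0.252, 0.200
print(f"{speedup(perl_mean, python_mean):.2f}x")  # 1.26x
```

Larger k means more distinct k-mers to hash and store (up to 4**k for nucleotides), which is one plausible reason the two implementations could diverge more at k=4 than at k=3.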