Faster tetranucleotide (k-mer) frequencies!

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • perl-for-reysenbach-lab

    These are Perl scripts I developed over many years as a Bioinformaticist for the Reysenbach Lab at PSU. The Reysenbach Lab studies microbial diversity in extreme environments. Lotta FASTA utilities here, if you are into that sort of thing.

  • I saw Jennifer's post about rewriting her Perl scripts in Python and how she saw a 2.5 times speedup.

  • faster-perl-for-reysenbach

    Tracks the progress of making old Perl scripts faster and more maintainable. Working from Meneghin's perl-for-reysenbach-lab repository of bioinformatics scripts.

  • I have an interest in Perl and science, so it's time to roll up my sleeves and learn me some profiling/benchmarking. What follows is my internal monologue and the notes I scribbled down during the learning process. For those who want to follow along, I've created a small repo.

  • dotfiles

    is it worth the time? (by eh8)

  • There are no more obvious or easy gains here. Any further work is likely to yield small returns. Go outside, have a life, or at the very least consult the relevant chart.

  • hyperfine

    A command-line benchmarking tool

  • Search "benchmarking tools for linux" and decide that hyperfine is good for what I'm doing. Run Jennifer's new Python script against my refactored Perl and find that the Python is 1.26 times faster for k=3 and 1.47 times faster for k=4. For the Covid-19 sequence, both run times are on the order of hundreds of milliseconds.
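Neither post's code is reproduced on this page, so as a rough illustration of what "tetranucleotide (k-mer) frequencies" means, here is a minimal Python sketch. It is not Jennifer's script or the refactored Perl. It assumes overlapping windows, counts only windows made entirely of A, C, G and T (anything else, such as N, is skipped), and normalizes by the number of valid windows. The function name `kmer_frequencies` is hypothetical.

```python
from collections import Counter
from itertools import product

VALID_BASES = set("ACGT")


def kmer_frequencies(seq: str, k: int = 4) -> dict:
    """Return the relative frequency of every possible k-mer over A, C, G, T.

    Assumptions (illustrative only): windows overlap by k - 1 bases, and any
    window containing a character other than A/C/G/T is skipped.
    """
    seq = seq.upper()
    # Slide a window of width k across the sequence, one base at a time.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID_BASES
    )
    total = sum(counts.values())
    # Enumerate all 4**k k-mers (256 tetranucleotides for k=4) so that
    # k-mers absent from the sequence are reported as 0.0.
    return {
        kmer: (counts[kmer] / total if total else 0.0)
        for kmer in ("".join(p) for p in product("ACGT", repeat=k))
    }


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTACGT", k=3)
    print(freqs["ACG"])  # ACG occurs in 2 of the 6 windows
```

Reporting all 4**k k-mers, rather than only those observed, keeps the output shape fixed for a given k, which makes frequency tables from different sequences directly comparable.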
