biofast
Benchmarking programming languages/implementations for common tasks in Bioinformatics (by lh3)
scikit-bio
scikit-bio: a community-driven Python library for bioinformatics, providing versatile data structures, algorithms and educational resources. (by scikit-bio)
Metric | biofast | scikit-bio
---|---|---
Mentions | 3 | 2
Stars | 175 | 838
Growth | - | 1.4%
Activity | 0.0 | 8.8
Last commit | over 2 years ago | 3 days ago
Language | C | Python
License | - | BSD 3-clause "New" or "Revised" License
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
biofast
Posts with mentions or reviews of biofast.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-09-23.
- Parsing huge files in Python
FYI: the python packages I mentioned earlier can all directly read gzip'd fastq files. See also this repo for examples.
- Does Rust Support Reading in FASTA files?
needletail is rated in the Heng Li benchmark (https://github.com/lh3/biofast/)
- Why I Use Nim instead of Python for Data Processing
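
The "Parsing huge files in Python" post above notes that gzip'd FASTQ files can be read directly. As a minimal sketch of what that looks like with only the standard library: the function name `read_fastq` is hypothetical, and the code assumes the common four-lines-per-record FASTQ layout (it does not handle multi-line records).

```python
import gzip


def read_fastq(path):
    """Yield (name, sequence, quality) tuples from a FASTQ file.

    Assumption: each record is exactly four lines (header, sequence,
    '+' separator, quality). Files ending in .gz are decompressed on the fly.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:  # end of file (or a blank line)
                break
            seq = handle.readline().rstrip("\n")
            handle.readline()  # skip the '+' separator line
            qual = handle.readline().rstrip("\n")
            yield header[1:], seq, qual  # drop the leading '@'
```

Because this is a generator, records are streamed one at a time, so memory use stays flat even for very large files.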
scikit-bio
Posts with mentions or reviews of scikit-bio.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-09-23.
- What are some of the bioinformatic projects I could do on python as a beginner?
- Why I Use Nim instead of Python for Data Processing
You make a fair point that using optimized numerical libraries instead of string methods will be ridiculously fast because they're compiled anyway. For example, scikit-bio does just this for their reverse complement operation [1]. However, they use an 8 bit representation since they need to be able to represent the extended IUPAC notation for ambiguous bases, which includes things like the character N for "aNy" nucleotide [2]. One could get creative with a 4 bit encoding and still end up saving space (assuming you don't care about the distinction between upper versus lowercase characters in your sequence [2]). Or, if you know in advance your sequence is unambiguous (unlikely in DNA sequencing-derived data) you could use the 2 bit encoding. When dealing with short nucleotide sequences, another approach is to encode the sequence as an integer. I would love to see a library—Python, Nim, or otherwise—that made using the most efficient encoding for a sequence transparent to the developer.
[1] https://github.com/biocore/scikit-bio/blob/b470a55a8dfd054ae...
[2] https://en.wikipedia.org/wiki/Nucleic_acid_notation
[3]
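
To make the 2-bit and "sequence as an integer" ideas from the post above concrete, here is a minimal sketch. It is not scikit-bio's implementation (which, per the post, uses an 8-bit representation); the helper names `encode_2bit`, `decode_2bit`, and `revcomp_2bit` are hypothetical, and the mapping A=0, C=1, G=2, T=3 is an assumption chosen so that the complement of a base is simply `3 - code`.

```python
# Hypothetical helpers for illustration; not part of scikit-bio or any library.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def encode_2bit(seq: str) -> int:
    """Pack an unambiguous DNA string into one integer, 2 bits per base."""
    value = 0
    for base in seq.upper():
        # Raises KeyError for N or any other ambiguous IUPAC code.
        value = (value << 2) | _CODE[base]
    return value


def decode_2bit(value: int, length: int) -> str:
    """Unpack `length` bases from an integer produced by encode_2bit."""
    bases = []
    for _ in range(length):
        bases.append(_BASE[value & 3])  # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


def revcomp_2bit(value: int, length: int) -> int:
    """Reverse complement on the packed form.

    Reading bases from the low end reverses the order; 3 - code
    complements each base (A<->T, C<->G) under the chosen mapping.
    """
    result = 0
    for _ in range(length):
        result = (result << 2) | (3 - (value & 3))
        value >>= 2
    return result
```

The length has to be stored alongside the integer, since leading `A` bases encode as zero bits and would otherwise be lost. For example, `decode_2bit(revcomp_2bit(encode_2bit("AACG"), 4), 4)` gives `"CGTT"`.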
What are some alternatives?
When comparing biofast and scikit-bio you can also consider the following projects:
nimtorch - PyTorch - Python + Nim
PrimesResult - The results of Dave Plummer's Primes Drag Race
readfq - Fast multi-line FASTA/Q reader in several programming languages
nimpylib - Some python standard library functions ported to Nim
viroiddb - A curated database of all available viroid-like RNA sequences
RecursiveFactorization.jl
benchmarks - Some benchmarks of different languages
Primes - Prime Number Projects in C#/C++/Python