biofast
readfq
biofast
- Parsing huge files in Python
FYI: the Python packages I mentioned earlier can all read gzip'd FASTQ files directly. See also this repo for examples.
- Does Rust Support Reading in FASTA files?
needletail is included in Heng Li's biofast benchmark (https://github.com/lh3/biofast/).
- Why I Use Nim instead of Python for Data Processing
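As a rough illustration of the gzip'd FASTQ reading mentioned in the first post above, here is a minimal sketch that uses only Python's standard library rather than any of the packages from that thread. The function name `read_fastq` is hypothetical, and the sketch assumes strict four-line FASTQ records with no wrapped sequence lines.

```python
import gzip
from typing import Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a plain or gzip'd FASTQ file.

    Assumption: every record is exactly four lines (no line wrapping).
    """
    # Choose the opener from the file extension; gzip.open in "rt" mode
    # decompresses on the fly and yields text lines.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\r\n")
            if not header:
                break  # end of file (or a blank line)
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # the "+" separator line, ignored
            qual = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or len(seq) != len(qual):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            yield header[1:], seq, qual
```

Because the function is a generator, records are streamed one at a time, so the whole file never has to fit in memory.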
readfq
- Training resources for Biopython?
Heng Li has a FASTQ/FASTA reader that I generally cut and paste into my code rather than use Biopython. Biopython has a very rich model for sequence data but you generally don't need 90% of it and it comes at a significant performance cost.
- Extract sequences given FASTA + list of starts and ends?
Just slap Heng Li's FASTQ/A parsing function in, load in your read, then loop through your coordinates and slice the sequence.
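The approach described in that answer could look roughly like the sketch below. It does not use Heng Li's readfq itself: `read_fasta` is a simplified, hypothetical stand-in, and `extract` is likewise an illustrative name. The sketch assumes coordinates are 1-based and inclusive (the post does not say) and that the sequences fit in memory.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {name: sequence}; a simplified stand-in for readfq."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Join wrapped sequence lines into one string per record.
    return {n: "".join(parts) for n, parts in records.items()}


def extract(seq: str, coords: Iterable[Tuple[int, int]]) -> List[str]:
    """Slice seq at each (start, end) pair, assumed 1-based and inclusive."""
    out = []
    for start, end in coords:
        if not (1 <= start <= end <= len(seq)):
            raise ValueError(f"coordinates out of range: {start}-{end}")
        # Convert to Python's 0-based, end-exclusive slicing.
        out.append(seq[start - 1:end])
    return out
```

If the coordinates are 0-based and half-open instead (as in BED files), the slice simplifies to `seq[start:end]`.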
What are some alternatives?
nimtorch - PyTorch - Python + Nim
fastp - An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
scikit-bio - scikit-bio: a community-driven Python library for bioinformatics, providing versatile data structures, algorithms and educational resources.
bioawk - BWK awk modified for biological data
PrimesResult - The results of Dave Plummer's Primes Drag Race
fasql - DuckDB Extension for reading and writing FASTA and FASTQ Files
viroiddb - A curated database of all available viroid-like RNA sequences
biomisc - collection of miscellaneous command line bioinformatic scripts
RecursiveFactorization.jl
minimap2 - A versatile pairwise aligner for genomic and spliced nucleotide sequences
benchmarks - Some benchmarks of different languages
Primes - Prime Number Projects in C#/C++/Python