readfq
fastp
readfq
-
Training resources for Biopython?
Heng Li has a FASTQ/FASTA reader that I generally cut and paste into my code rather than use Biopython. Biopython has a very rich model for sequence data but you generally don't need 90% of it and it comes at a significant performance cost.
-
Extract sequences given FASTA + list of starts and ends?
Just slap Heng Li's FASTQ/A parsing function in, load in your read, then loop through your coordinates and slice the sequence.
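The slice-by-coordinates approach described above can be sketched in a few lines of Python. This is not Heng Li's readfq, just a minimal stand-in FASTA reader. The names `read_fasta` and `extract_regions` are made up for illustration, and the coordinates are assumed to be 1-based and inclusive; adjust the slice if yours are 0-based.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines.

    Minimal illustrative reader (hypothetical helper, not readfq).
    """
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # Keep only the first word of the header as the record name.
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_regions(seq: str, coords: Iterable[Tuple[int, int]]) -> List[str]:
    """Slice seq for each (start, end) pair.

    Assumption: coordinates are 1-based and inclusive on both ends.
    """
    return [seq[start - 1:end] for start, end in coords]


if __name__ == "__main__":
    fasta = [">chr1 example record", "ACGTAC", "GTTT", ">chr2", "GGGG"]
    records = dict(read_fasta(fasta))
    print(extract_regions(records["chr1"], [(1, 4), (5, 10)]))
```

For a real file, pass an open file handle to `read_fasta` instead of the list of strings.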
fastp
-
R pipelines for bulk RNA-seq analyses
fastp + MultiQC + Salmon + DESeq2, all in some Nextflow workflow. It is a good exercise (not complicated) to create the pipeline from scratch the first time to properly understand each tool.
-
NHI Genome Studies: Mexico Govt Sept 12 Congressional hearing
1) QC the data with fastp. This'll trim out adapters and toss reads that are poor quality.
- Illumina adapters and quality trimming
-
Low-complexity sequence filtering tool
fastp has an adjustable low complexity filter option.
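To give a feel for what such a filter measures: as far as I understand the fastp documentation, complexity is the percentage of bases that differ from the next base, and reads below a threshold (30% by default, as I recall) are dropped. The sketch below only illustrates that idea and is not fastp's code. The function names are hypothetical, and dividing by `len(seq) - 1` is my assumption.

```python
def complexity_percent(seq: str) -> float:
    """Percentage of positions where a base differs from the next base.

    Illustrative only; assumes the denominator is the number of
    adjacent base pairs, i.e. len(seq) - 1.
    """
    if len(seq) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return 100.0 * changes / (len(seq) - 1)


def passes_complexity_filter(seq: str, threshold: float = 30.0) -> bool:
    """Keep a read if its complexity reaches the threshold (assumed 30%)."""
    return complexity_percent(seq) >= threshold


if __name__ == "__main__":
    for read in ("AAAAAAAAAAAAAAAA", "ACGTACGTACGTACGT", "AAAAAAAACCCCCCCC"):
        print(read, complexity_percent(read), passes_complexity_filter(read))
```

Homopolymer runs score near 0 and get filtered, while mixed sequence scores high and is kept.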
-
Can you evaluate my pipeline?
- in terms of preprocessing and QC, I prefer fastp (https://github.com/OpenGene/fastp)
-
Current QC tools for short read and long read sequencing
I generally use fastp as an all-in-one tool for short reads: https://github.com/OpenGene/fastp
- Question about automating trimming process
-
What methods (conda installable only please) can you use to determine the complexity of a fastq file? (e.g., kmer analysis)
I don't know if this fits exactly what you need, but I'm using fastp to check my fastq.gz files lately: https://github.com/OpenGene/fastp. You can install it via conda.
-
A tool to count basepair in fastq file
If you also need some other basic statistics or want to filter the reads you can try fastp (https://github.com/OpenGene/fastp). If only the basepair count is needed, awk might be the fastest solution as suggested before.
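If you would rather stay in Python than use awk, the same count can be sketched as follows. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function names are made up for illustration, and the `.gz` check is just a filename convention.

```python
import gzip
from typing import Iterable, Tuple


def count_fastq(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (read_count, base_count) for four-line FASTQ records.

    Assumption: every record is exactly 4 lines, so the sequence is
    always the second line of each group.
    """
    reads = 0
    bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:
            reads += 1
            bases += len(line.strip())
    return reads, bases


def count_fastq_file(path: str) -> Tuple[int, int]:
    """Count reads and bases in a FASTQ file, gzip-compressed or plain."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq(handle)


if __name__ == "__main__":
    example = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "AC\n", "+\n", "II\n"]
    print(count_fastq(example))
```

This will be slower than awk or a compiled tool on large files, but it is easy to extend with extra statistics.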
What are some alternatives?
biofast - Benchmarking programming languages/implementations for common tasks in Bioinformatics
galaxy - Data intensive science for everyone.
bioawk - BWK awk modified for biological data
readfq - A simple tool to calculate the number of reads and total base count in a FASTQ file
fasql - DuckDB Extension for reading and writing FASTA and FASTQ Files
glslSmartDeNoise - Fast glsl deNoise spatial filter, with circular gaussian kernel, full configurable
biomisc - collection of miscellaneous command line bioinformatic scripts
nextclade - Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
minimap2 - A versatile pairwise aligner for genomic and spliced nucleotide sequences
seqtk - Toolkit for processing sequences in FASTA/Q formats
bowtie2 - A fast and sensitive gapped read aligner