| | fastp | kraken2 |
|---|---|---|
| Mentions | 9 | 7 |
| Stars | 1,775 | 658 |
| Growth | 2.3% | - |
| Activity | 4.7 | 5.1 |
| Latest commit | 27 days ago | about 1 month ago |
| Language | C++ | C++ |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
fastp
- R pipelines for bulk RNA-seq analyses
  fastp + MultiQC + Salmon + DESeq2, all in some Nextflow workflow. It is a good (and not complicated) exercise to create the pipeline from scratch the first time, to properly understand each tool.
- NHI Genome Studies: Mexico Govt Sept 12 Congressional hearing
  1) QC the data with fastp. This'll trim out adapters and toss reads that are poor quality.
- Illumina adapters and quality trimming
- Low-complexity sequence filtering tool
  fastp has an adjustable low complexity filter option.
- Can you evaluate my pipeline?
  In terms of preprocessing and QC, I prefer fastp (https://github.com/OpenGene/fastp)
- Current QC tools for short read and long read sequencing
  I generally use fastp as an all-in-one tool for short reads: https://github.com/OpenGene/fastp
- Question about automating trimming process
- What methods (conda installable only please) can you use to determine the complexity of a fastq file? (e.g., kmer analysis)
  I don't know if this fits exactly what you need, but I'm using fastp to check my fastq.gz files lately: https://github.com/OpenGene/fastp. You can install it via conda.
- A tool to count basepair in fastq file
  If you also need some other basic statistics or want to filter the reads, you can try fastp (https://github.com/OpenGene/fastp). If only the basepair count is needed, awk might be the fastest solution, as suggested before.
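The fastp uses mentioned above (adapter and quality trimming, the adjustable low-complexity filter, and basic statistics such as base counts) can be sketched as one small Python helper. This is a minimal sketch, not a recommended pipeline: the function names `fastp_command` and `total_bases`, the file names, and the default threshold of 30 are illustrative assumptions. The fastp options used (`--in1`, `--in2`, `--out1`, `--out2`, `--detect_adapter_for_pe`, `--low_complexity_filter`, `--complexity_threshold`, `--json`, `--html`) are taken from the fastp README; check them against your installed version.

```python
import shlex


def fastp_command(in1, in2, out1, out2, json_report="fastp.json",
                  html_report="fastp.html", low_complexity=True,
                  complexity_threshold=30):
    """Build a paired-end fastp command as an argument list (hypothetical helper).

    Adapter trimming and quality filtering are on by default in fastp;
    --detect_adapter_for_pe additionally enables sequence-based adapter
    detection for paired-end data.
    """
    cmd = [
        "fastp",
        "--in1", in1, "--in2", in2,
        "--out1", out1, "--out2", out2,
        "--detect_adapter_for_pe",
        "--json", json_report,
        "--html", html_report,
    ]
    if low_complexity:
        # The threshold is a percentage (0-100); 30 is assumed here as the default.
        cmd += ["--low_complexity_filter",
                "--complexity_threshold", str(complexity_threshold)]
    return cmd


def total_bases(report, stage="before_filtering"):
    """Return the base count from a parsed fastp JSON report.

    Assumes the report layout summary -> before_filtering/after_filtering
    -> total_bases.
    """
    return report["summary"][stage]["total_bases"]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where fastp is not installed.
    print(shlex.join(fastp_command("s_R1.fq.gz", "s_R2.fq.gz",
                                   "trim_R1.fq.gz", "trim_R2.fq.gz")))
```

To actually run the step, pass the list to `subprocess.run(cmd, check=True)`, then load the JSON report with `json.load` and call `total_bases` on it. Building the command as a list rather than a shell string avoids quoting problems with unusual file names.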
kraken2
- NHI Genome Studies: Mexico Govt Sept 12 Congressional hearing
  3) Use Kraken2 to classify remaining reads. I'd start with the standard database.
- Refseq bacterial genomes to clean reads?
  See more information in: kraken2 manual
- Fastest way to check E. coli contamination levels in eukaryotic NGS libraries?
  If you've got a fast solid state drive with >200G of space, then kraken2 + bracken works really well. First, use kraken2 to map reads to taxa in memory-mapped mode (to reduce system memory consumption):
- Inferring bacterial population sizes from metagenomic data
  Yes, that can be done. Estimating bacterial proportions is pretty much what programs like Kraken2 and Centrifuge do.
- Command line tool for species identification from Fasta files
  Or Kraken2
- How can I generate a list of short (75-150bp) sequences from a bacterial genome and find out if any of those sequences are unique to that organism?
  For bacterial metagenomic stuff you can quickly reduce the amount of sequences you need to BLAST by using Kraken2.
- Show HN: An API for running computationally intensive tools
  While implementing and scaling data analysis pipelines at a biotech startup, I spent most of my time getting new tools running efficiently and scaling them. Implementing something like Kraken2 for genomic analysis (https://github.com/DerrickWood/kraken2) on our infrastructure took weeks and was hard to scale. I expected a library for running these tools on managed infrastructure via an API to exist (like Twilio for sending text messages or Stripe for processing payments), but I couldn't find any.
  Toolchest is an API for running data analysis tools easily (i.e. copy and paste a few lines of code), without managing the infrastructure. We're starting with computational genomics tools, but tools in other spaces can be added. Please drop me a message if you have a use case in mind! For example, I've thought about making hashcat powered by Tesla V100 GPUs accessible via our API.
  All feedback is welcome! If you're curious about how it works, feel free to check out our docs: https://toolchest-python-client.readthedocs.io/en/latest/use...
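Several of the kraken2 mentions above describe the same workflow: classify reads with Kraken2 (optionally in memory-mapped mode), refine abundances with Bracken, and read off bacterial proportions. The sketch below shows one way this might look from Python. The helper names, file names, thread count, and read length are illustrative assumptions, and the report parser assumes the standard six-column, tab-separated Kraken2 report (percentage, clade reads, direct reads, rank code, taxid, indented name). The flags used (`--db`, `--threads`, `--paired`, `--report`, `--output`, `--memory-mapping` for kraken2; `-d`, `-i`, `-o`, `-r`, `-l` for bracken) come from the tools' manuals; verify them against your installed versions.

```python
def kraken2_command(db, reads1, reads2, report="sample.kreport",
                    output="sample.kraken", threads=4, memory_mapping=True):
    """Build a paired-end kraken2 command as an argument list (hypothetical helper)."""
    cmd = ["kraken2", "--db", db, "--threads", str(threads),
           "--paired", "--report", report, "--output", output]
    if memory_mapping:
        # Read the database from disk instead of loading it into RAM.
        cmd.append("--memory-mapping")
    cmd += [reads1, reads2]
    return cmd


def bracken_command(db, report, output="sample.bracken",
                    read_length=150, level="S"):
    """Build a bracken command that re-estimates abundance at one rank."""
    return ["bracken", "-d", db, "-i", report, "-o", output,
            "-r", str(read_length), "-l", level]


def species_proportions(report_text):
    """Return {species name: fraction of all species-level clade reads}.

    Uses raw Kraken2 clade counts from rows with rank code "S"; Bracken's
    re-estimated counts would normally be preferred for abundance.
    """
    counts = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rank, name = fields[3], fields[5].strip()
        if rank == "S":
            counts[name] = counts.get(name, 0) + int(fields[1])
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Run kraken2 first, then bracken on the resulting report, then parse whichever report you trust more. Note that `species_proportions` normalizes over species-level reads only, so unclassified reads and reads assigned above species rank are excluded from the denominator.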
What are some alternatives?
galaxy - Data intensive science for everyone.
bowtie2 - A fast and sensitive gapped read aligner
readfq - A simple tool to calculate reads number and total base count in FASTQ file
glslSmartDeNoise - Fast glsl deNoise spatial filter, with circular gaussian kernel, full configurable
nextclade - Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
readfq - Fast multi-line FASTA/Q reader in several programming languages
seqtk - Toolkit for processing sequences in FASTA/Q formats
fasql - DuckDB Extension for reading and writing FASTA and FASTQ Files
Sniffles - Structural variation caller using third generation sequencing
CHM13 - The complete sequence of a human genome
TPMCalculator - TPMCalculator quantifies mRNA abundance directly from the alignments by parsing BAM files