spades
sage
Metric | spades | sage
---|---|---
Mentions | 4 | 5
Stars | 664 | 188
Growth | 1.7% | -
Activity | 9.3 | 7.7
Last commit | 4 days ago | 19 days ago
Language | C++ | Rust
License | GNU General Public License v3.0 or later | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spades
- my friend showed me his code, these are all functions
- What are some good examples of well-engineered bioinformatics pipelines?
- Genome analysis cost
  If you do DNA sequencing and receive the results as fastq files (the normal output from sequencing), then use spades to assemble the genome and put the assembly through PROKKA to annotate it. Here's a beginner's guide; the most difficult part is downloading the programs onto your laptop.
- Is it possible to assemble a complete bacterial genome using short reads?
  MetaSpades has a cool hybrid option that builds contigs from both short and long reads, so you could pair short-read data with long-read data (PacBio/ONT) to get the best hybrid assembly: high-throughput short reads plus long reads that help resolve the assembly. https://github.com/ablab/spades
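The assemble-then-annotate workflow and the hybrid option described in the posts above can be sketched as a pair of command builders. This is a minimal sketch, not an official recipe: the file and directory names are hypothetical, the flags (`-1`, `-2`, `-o`, `--meta`, `--nanopore` for `spades.py`; `--outdir`, `--prefix` for `prokka`) are taken from the tools' documentation as best understood here, and it assumes SPAdes writes its contigs to `contigs.fasta` inside the output directory. The code only prints the commands; it does not run them.

```python
from typing import List, Optional


def spades_command(reads_1: str, reads_2: str, out_dir: str,
                   nanopore: Optional[str] = None,
                   meta: bool = False) -> List[str]:
    """Build a SPAdes command line for one paired-end short-read library."""
    cmd = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]
    if meta:
        cmd.append("--meta")  # metaSPAdes mode
    if nanopore:
        # Adding long reads turns this into a hybrid assembly.
        cmd += ["--nanopore", nanopore]
    return cmd


def prokka_command(assembly_dir: str, annot_dir: str, prefix: str) -> List[str]:
    """Build a Prokka command that annotates the contigs from an assembly."""
    # Assumption: the assembler wrote its contigs to <assembly_dir>/contigs.fasta.
    contigs = f"{assembly_dir}/contigs.fasta"
    return ["prokka", "--outdir", annot_dir, "--prefix", prefix, contigs]


# Hypothetical input files, used only for illustration.
steps = [
    spades_command("sample_R1.fastq", "sample_R2.fastq", "assembly"),
    spades_command("sample_R1.fastq", "sample_R2.fastq", "hybrid_assembly",
                   nanopore="sample_ont.fastq", meta=True),
    prokka_command("assembly", "annotation", "sample"),
]
for step in steps:
    print(" ".join(step))
```

To execute a step for real, each list could be passed to `subprocess.run(step, check=True)` once both tools are installed and on the `PATH`.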
sage
- Does anyone know a great guide/documentation explaining how to implement Percolator?
  If you want to implement LDA from scratch, you could check out how Sage does it.
- What are some good examples of well-engineered bioinformatics pipelines?
  You could check out https://github.com/lazear/sage - it's a near-comprehensive program/pipeline for analyzing DDA/shotgun proteomics data. Most proteomics pipelines consist of running multiple, separate tools in sequence (search, spectrum rescoring, retention time prediction, quantification), but sage performs all of these. This cuts down on the need for disk space for storing intermediate results (none required), the need for IO (files are read once), and results in a proteomics pipeline that is 10-1000x faster than anything else, including commercial solutions.
- Proteomics search engine written in Rust
  You can also check out the intro blog post if you're interested in learning more about the algorithm behind Sage. Beyond being fast, it also includes integrated machine learning (linear discriminant analysis, KDE) for rescoring spectral matches.
- Opinions on AlphaPept
  You could try out Sage, if you're looking for speed - I don't think you'll find anything faster. https://github.com/lazear/sage
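For readers curious what "LDA from scratch" for rescoring might look like, here is a minimal sketch of a two-class Fisher linear discriminant in plain Python. It is not Sage's implementation: the two features, the toy values, and the target/decoy labels are hypothetical, and a real rescoring step would use more features and a general linear solver. The discriminant direction is w = Sw^-1 (mu_target - mu_decoy), where Sw is the pooled within-class scatter matrix.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def mean(points: Sequence[Vec]) -> Vec:
    """Component-wise mean of 2-D feature vectors."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points: Sequence[Vec], mu: Vec) -> Tuple[float, float, float]:
    """Entries (sxx, sxy, syy) of the symmetric 2x2 scatter matrix around mu."""
    sxx = sum((p[0] - mu[0]) ** 2 for p in points)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points)
    syy = sum((p[1] - mu[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit_lda(targets: Sequence[Vec], decoys: Sequence[Vec]) -> Vec:
    """Return the Fisher discriminant direction for two classes."""
    mu_t, mu_d = mean(targets), mean(decoys)
    a1, b1, c1 = scatter(targets, mu_t)
    a2, b2, c2 = scatter(decoys, mu_d)
    a, b, c = a1 + a2, b1 + b2, c1 + c2  # pooled within-class scatter
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mu_t[0] - mu_d[0], mu_t[1] - mu_d[1]
    # Inverse of [[a, b], [b, c]] is (1 / det) * [[c, -b], [-b, a]].
    return ((c * dx - b * dy) / det, (-b * dx + a * dy) / det)


def score(w: Vec, x: Vec) -> float:
    """Project a feature vector onto the discriminant direction."""
    return w[0] * x[0] + w[1] * x[1]


# Hypothetical features per spectrum match: (search score, matched-ion fraction).
targets: List[Vec] = [(3.0, 0.9), (2.5, 0.8), (2.8, 0.7), (3.2, 1.0)]
decoys: List[Vec] = [(1.0, 0.2), (1.2, 0.4), (0.8, 0.3), (1.1, 0.1)]

w = fit_lda(targets, decoys)
for label, group in (("target", targets), ("decoy", decoys)):
    for x in group:
        print(label, round(score(w, x), 3))
```

The projected score collapses several features into one number per match, which is what makes it usable as a single rescoring value.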
What are some alternatives?
prokka - Rapid prokaryotic genome annotation
rnaseq - RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
mag - Assembly and binning of metagenomes
seqkit - A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
fasten - Fasten toolkit, for streaming operations on fastq files
bwa - Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
mokapot - Fast and flexible semi-supervised learning for peptide detection in Python
trinityrnaseq - Trinity RNA-Seq de novo transcriptome assembly
juicer - A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
Rust-Bio - This library provides implementations of many algorithms and data structures that are useful for bioinformatics. All provided implementations are rigorously tested via continuous integration.