sage
Rust-Bio
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sage
-
Does anyone know a great guide/documentation explaining how to implement Percolator?
If you want to implement LDA from scratch, you could check out how Sage is doing it.
-
What are some good examples of well-engineered bioinformatics pipelines?
You could check out https://github.com/lazear/sage - it's a nearly comprehensive program/pipeline for analyzing DDA/shotgun proteomics data. Most proteomics pipelines consist of running multiple, separate tools in sequence (search, spectrum rescoring, retention time prediction, quantification), but sage performs all of these. This cuts down on the need for disk space for storing intermediate results (none required) and the need for I/O (files are read once), and results in a proteomics pipeline that is 10-1000x faster than anything else, including commercial solutions.
-
Proteomics search engine written in Rust
You can also check out the intro blog post if you're interested in learning more about the algorithm behind Sage. Beyond being fast, it also includes integrated machine learning (linear discriminant analysis, KDE) for rescoring spectral matches.
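For readers unfamiliar with the technique, here is a minimal sketch of two-class linear discriminant analysis used to separate target from decoy matches. It is not Sage's implementation: the function names `fit_lda` and `score`, the restriction to two features per match, and the sample data in the usage note are all assumptions made for illustration.

```rust
/// Fit a two-class Fisher LDA on 2-feature data (hypothetical helper,
/// not taken from Sage). Returns w = Sw^-1 * (mean_target - mean_decoy),
/// or None if a class is empty or the scatter matrix is singular.
fn fit_lda(targets: &[[f64; 2]], decoys: &[[f64; 2]]) -> Option<[f64; 2]> {
    if targets.is_empty() || decoys.is_empty() {
        return None;
    }

    // Per-class feature means.
    let mean = |xs: &[[f64; 2]]| {
        let n = xs.len() as f64;
        let mut m = [0.0; 2];
        for x in xs {
            m[0] += x[0];
            m[1] += x[1];
        }
        [m[0] / n, m[1] / n]
    };
    let mt = mean(targets);
    let md = mean(decoys);

    // Pooled within-class scatter matrix Sw (2x2).
    let mut s = [[0.0f64; 2]; 2];
    for (xs, m) in [(targets, mt), (decoys, md)] {
        for x in xs {
            let d = [x[0] - m[0], x[1] - m[1]];
            s[0][0] += d[0] * d[0];
            s[0][1] += d[0] * d[1];
            s[1][0] += d[1] * d[0];
            s[1][1] += d[1] * d[1];
        }
    }

    // Invert Sw in closed form and apply it to the mean difference.
    let det = s[0][0] * s[1][1] - s[0][1] * s[1][0];
    if det.abs() < 1e-12 {
        return None;
    }
    let diff = [mt[0] - md[0], mt[1] - md[1]];
    Some([
        (s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
        (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det,
    ])
}

/// Project one match onto the discriminant; higher means more target-like.
fn score(w: [f64; 2], x: [f64; 2]) -> f64 {
    w[0] * x[0] + w[1] * x[1]
}
```

Calling `fit_lda` on a few labelled feature vectors and then `score` on each match gives a single discriminant value per match that can be used to re-rank them. A real rescoring step would use many more features and a general matrix solver.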
-
Opinions on AlphaPept
You could try out Sage, if you're looking for speed - I don't think you'll find anything faster. https://github.com/lazear/sage
Rust-Bio
- Bioinformatics Data Structures in Rust
- Bioinformatics with Rust
-
bioinformatic libraries and zig?
Does anyone know of Zig-native libraries for bioinformatics (here is a Rust example: https://rust-bio.github.io/)? It seems as though one could pull in a lot of bioinformatics C libraries, as is done with https://github.com/brentp/hts-zig.
-
Proteomics search engine written in Rust
e.g. Rust-Bio
-
What are your top 3-5 programming languages and why?
I would start with the book and then the rust-bio library. Rust is a pretty low-level language compared to R/Python. It’s an especially good fit for writing efficient tools that make use of the kinds of algorithms / data structures that are implemented in rust-bio.
-
I have to admit. The free code camp course is a bit more sparing than I would have preferred. How did everyone learn Rust?
Absolutely! It already is, e.g., https://github.com/rust-bio/rust-bio. I'm moving from the academia/nonprofit world into industry bioinformatics, and I intend to use Rust as much as possible. I've already replaced as much of my Python as possible with Rust. I feel I'm able to create larger, more complex programs with Rust because I have the compiler to keep me from making common mistakes that are so easy to make in dynamically typed languages like Perl and Python. It might take longer to write a program initially, but I've started to create a library of functions I can paste together to do things like parse a positive integer, find a bunch of files with a certain file extension, search through data for a pattern, parse CSV files, etc. Writing my latest book has provided even more common patterns I keep finding I use over and over.
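As an illustration of the kind of small reusable helper mentioned above, here is a minimal sketch of parsing a positive integer. The function name `parse_positive_int` and its error message are assumptions for this example, not code from the commenter's library.

```rust
/// Parse a strictly positive integer from user input (hypothetical helper).
/// Surrounding whitespace is ignored; zero, negatives, and non-numeric
/// text are all rejected with a descriptive error.
fn parse_positive_int(input: &str) -> Result<usize, String> {
    match input.trim().parse::<usize>() {
        Ok(n) if n > 0 => Ok(n),
        _ => Err(format!("not a positive integer: {:?}", input)),
    }
}
```

Returning a `Result` makes the caller deal with bad input at compile time, which is the sort of mistake-catching the comment credits to the Rust compiler.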
-
Is learning Rust and systems programming through the books Rust in Action and Crafting Interpreters a good idea?
I think there is huge potential for Rust in bioinformatics, and there are already some great projects like https://rust-bio.github.io/. It seems industry is also hiring for these skills. This Nature article is a little old, but also covers why people in the field are looking for greater safety and performance. It's relatively easy to write a Python program to do bio stuff, but it's also very easy to get lots of things wrong or for the resulting program to be slow and/or impossible to extend and maintain. In the long run, I think it makes sense to write in Rust. Perl was king in biofx when I started, and I would not have predicted it being displaced by Python, so there's good reason to believe that Python may one day be eclipsed by Rust.
-
What's your favourite open source Rust project that needs more recognition?
Well, someone mentioned https://rust-bio.github.io/
-
How can one make Rust excel in the Sciences
So generally stuff in this maths/numerical space. The term is a bit deceptive because it rarely means domain-specific science libraries like rust-bio, even though that might be what you think of when you hear "scientific computing".
What are some alternatives?
rnaseq - RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
dash - Data Apps & Dashboards for Python. No JavaScript Required.
seqkit - A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
kanidm - Kanidm: A simple, secure and fast identity management platform
fasten - Fasten toolkit, for streaming operations on fastq files
clickhouse-rs - Asynchronous ClickHouse client library for Rust programming language.
mokapot - Fast and flexible semi-supervised learning for peptide detection in Python
GeoRust - Geospatial primitives and algorithms for Rust
juicer - A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
Rhai - Rhai - An embedded scripting language for Rust.
alphapept - A modular, python-based framework for mass spectrometry. Powered by nbdev.
cycle - Modern and safe symbolic mathematics