Top 15 C++ Bioinformatic Projects
-
fastp
An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
Project mention: Question about automating trimming process | reddit.com/r/bioinformatics | 2022-05-26
If you haven’t heard of it already, you may want to check out https://github.com/bwa-mem2/bwa-mem2, which is a faster version of bwa-mem. I’ve been using it for a while now and found it to be quite stable, with the same results as the original, and the speed improvement is nice.
-
edlib
Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
Project mention: What's an efficient way to find multiple subsequences in several FASTQs? | reddit.com/r/bioinformatics | 2022-02-08
I’ve got a similar situation. I was implementing the Smith-Waterman algorithm when I figured someone had to have already written a “fast” version of this. I found the edlib package (https://github.com/Martinsos/edlib), which does sequence alignment using Levenshtein distance. It's essentially the same DP algorithm as traditional NW or SW, only implemented in C++ with a Python wrapper. (I’m assuming you’re using Python; I could be wrong, though.) The pertinent parts of the output are the distance (dissimilarity) and the location (the indices where the alignment starts and ends). This tool may go some way toward helping your pipeline. You could also look to metagenomic papers for inspiration, as this is a problem (finding a substring in a huge amount of data) that the community contends with all the time. A k-mer-based approach may also be useful if you want to take the alignment-free path. Cheers.
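To make the "distance plus location" output concrete, here is a minimal C++ sketch of infix (semi-global) edit distance: the whole query must be aligned, but it may start and stop anywhere in the target, which is comparable to what edlib calls its HW mode. This is the plain O(mn) dynamic program, not edlib's implementation (edlib builds on Myers' bit-vector algorithm), and `infixEditDistance` and `InfixHit` are hypothetical names. It reports only the end position; recovering the start would need a traceback or a second pass.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Result of an infix search: smallest edit distance and the 0-based index
// in the target where the best-matching substring ends (-1 if none).
struct InfixHit {
    int distance;
    int end;
};

// Semi-global ("infix") edit distance: every query character must be aligned,
// but skipping target characters before and after the match costs nothing.
InfixHit infixEditDistance(const std::string& query, const std::string& target) {
    const std::size_t m = query.size();
    const std::size_t n = target.size();
    // prev holds row i-1 of the DP table; row 0 is all zeros because the
    // match may start at any target position for free.
    std::vector<int> prev(n + 1, 0), curr(n + 1, 0);
    for (std::size_t i = 1; i <= m; ++i) {
        curr[0] = static_cast<int>(i);  // query[0..i) against an empty target
        for (std::size_t j = 1; j <= n; ++j) {
            int sub = prev[j - 1] + (query[i - 1] == target[j - 1] ? 0 : 1);
            int del = prev[j] + 1;      // query character left unmatched
            int ins = curr[j - 1] + 1;  // extra target character inside the match
            curr[j] = std::min({sub, del, ins});
        }
        std::swap(prev, curr);
    }
    // prev now holds the last row; the match may end at any column, so take
    // the leftmost minimum.
    InfixHit best{static_cast<int>(m), -1};
    for (std::size_t j = 1; j <= n; ++j) {
        if (prev[j] < best.distance) {
            best.distance = prev[j];
            best.end = static_cast<int>(j) - 1;
        }
    }
    return best;
}
```

For example, `infixEditDistance("ACGT", "TTACGTTT")` gives distance 0 ending at index 5, and against `"TTACTTT"` it gives distance 1 ending at index 4 (the G is deleted).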
-
I have been tasked with benchmarking a variant-calling pipeline using hap.py as part of my bioinformatics MSc project.
-
seqan3
The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
-
Check out octopus https://github.com/luntergroup/octopus
-
Project mention: sqlite-zstd: Transparent dictionary-based row-level compression for SQLite - An SQLite extension written in Rust to reduce the database size without losing functionality | reddit.com/r/rust | 2022-07-31
Yes, that is indeed an obviously missing part. I knew about ZIPVFS, but somehow forgot to investigate it more closely. Probably because I started this project before GenomicSQLite was a thing (that seems like the best alternative).
-
Project mention: Tools for strand direction detection RNA-Seq | reddit.com/r/bioinformatics | 2022-10-08
I like to use RNA-SeQC (https://github.com/getzlab/rnaseqc). It shows the percentage of forward/reverse reads that aligned to either the sense or antisense strand. It is also compatible with MultiQC, which is a big plus.
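As an illustration of how such sense/antisense percentages are usually read, here is a minimal C++ sketch. It assumes you already have counts of reads assigned to the sense and antisense strand of annotated genes (for paired-end data these are conventionally counted from the first mate). The names and the 80%/20%/40–60% cut-offs are hypothetical, not RNA-SeQC's own logic or thresholds.

```cpp
#include <string>

// Reads overlapping annotated genes, split by whether the read maps to the
// same strand as the gene (sense) or the opposite strand (antisense).
struct StrandCounts {
    long long sense;
    long long antisense;
};

// Fraction of reads on the sense strand; 0.5 means no strand bias.
// Returns 0.5 when there are no reads at all.
double senseFraction(const StrandCounts& c) {
    const long long total = c.sense + c.antisense;
    return total == 0 ? 0.5 : static_cast<double>(c.sense) / static_cast<double>(total);
}

// Hypothetical cut-offs: >= 80% on one strand is called stranded,
// 40-60% is called unstranded, anything else is left ambiguous.
std::string classifyLibrary(const StrandCounts& c) {
    const double f = senseFraction(c);
    if (f >= 0.8) return "forward-stranded";
    if (f <= 0.2) return "reverse-stranded";
    if (f >= 0.4 && f <= 0.6) return "unstranded";
    return "ambiguous";
}
```

A dUTP-style library, where most first-mate reads land on the antisense strand, would come out as `"reverse-stranded"` under these cut-offs.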
-
Project mention: Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2 | reddit.com/r/bioinformatics | 2022-09-08
The paper describing a new tool from our lab has just been published in Genome Biology (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02743-6). Cuttlefish 2 is a tool for efficiently computing the compacted de Bruijn graph (or a spectrum-preserving string set) from either raw sequencing reads or reference genomes. It is quite fast and very memory-efficient: for example, we were able to construct the compacted de Bruijn graph on a set of 661K bacterial genomes in 16 hours and 30 minutes using only 48.7 GB of RAM. Construction of the compacted de Bruijn graph is an important initial processing step in, e.g., genome assembly, and is also important in several other areas, such as comparative genomics, and as a critical step in building certain types of indices (e.g., [sshash](https://github.com/jermp/sshash)). You can find the Cuttlefish 2 software on GitHub [here](https://github.com/COMBINE-lab/cuttlefish), and it can also be installed via Bioconda. We'd be happy to have your feedback!
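To show what "compacted de Bruijn graph" means, here is a deliberately naive in-memory C++ sketch: distinct k-mers are the nodes, two k-mers are linked when they overlap by k-1 characters, and each maximal non-branching path is merged into one string (a unitig). This is not Cuttlefish 2's algorithm, which is far more memory-efficient. The sketch assumes the DNA alphabet ACGT and k >= 1, ignores reverse complements (real tools work with canonical k-mers), and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

using KmerSet = std::unordered_set<std::string>;

// Collect every distinct k-mer from the input sequences (forward strand only).
KmerSet collectKmers(const std::vector<std::string>& seqs, std::size_t k) {
    KmerSet kmers;
    for (const auto& s : seqs)
        for (std::size_t i = 0; i + k <= s.size(); ++i) kmers.insert(s.substr(i, k));
    return kmers;
}

// k-mers in the set that can follow `kmer` with a (k-1)-character overlap.
std::vector<std::string> successors(const std::string& kmer, const KmerSet& kmers) {
    std::vector<std::string> out;
    for (char c : std::string("ACGT")) {
        std::string next = kmer.substr(1) + c;
        if (kmers.count(next)) out.push_back(next);
    }
    return out;
}

// k-mers in the set that can precede `kmer` with a (k-1)-character overlap.
std::vector<std::string> predecessors(const std::string& kmer, const KmerSet& kmers) {
    std::vector<std::string> out;
    for (char c : std::string("ACGT")) {
        std::string prev = c + kmer.substr(0, kmer.size() - 1);
        if (kmers.count(prev)) out.push_back(prev);
    }
    return out;
}

// A k-mer starts a new unitig unless it has exactly one predecessor and that
// predecessor has exactly one successor (a non-branching edge).
bool startsUnitig(const std::string& kmer, const KmerSet& kmers) {
    auto preds = predecessors(kmer, kmers);
    if (preds.size() != 1) return true;
    return successors(preds[0], kmers).size() != 1;
}

// Spell every maximal non-branching path as one string, sorted so the output
// does not depend on hash-set iteration order.
std::vector<std::string> compactUnitigs(const KmerSet& kmers) {
    std::vector<std::string> unitigs;
    std::unordered_set<std::string> visited;
    auto walk = [&](const std::string& start) {
        std::string unitig = start;
        std::string cur = start;
        visited.insert(cur);
        while (true) {
            auto succ = successors(cur, kmers);
            if (succ.size() != 1) break;
            const std::string next = succ[0];
            if (predecessors(next, kmers).size() != 1 || visited.count(next)) break;
            unitig += next.back();  // consecutive k-mers differ by one new base
            visited.insert(next);
            cur = next;
        }
        unitigs.push_back(unitig);
    };
    for (const auto& km : kmers)
        if (startsUnitig(km, kmers)) walk(km);
    // Anything still unvisited lies on an isolated cycle; break it anywhere.
    for (const auto& km : kmers)
        if (!visited.count(km)) walk(km);
    std::sort(unitigs.begin(), unitigs.end());
    return unitigs;
}
```

With k = 3, the reads `"AACGT"` and `"AACGA"` share the prefix AACG and then branch, so the sketch returns the three unitigs `AACG`, `CGA`, and `CGT`.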
-
TileDB-VCF
Efficient variant-call data storage and retrieval library using the TileDB storage library.
Project mention: Has anyone stored/queried VCFs and their variant records in a relational database? | reddit.com/r/bioinformatics | 2022-11-12
Perhaps of interest: https://github.com/TileDB-Inc/TileDB-VCF
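For the relational route the question asks about, a table usually starts from the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The C++ sketch below parses one tab-separated data line into such a row. It is unrelated to TileDB-VCF's API, uses hypothetical names, and assumes header lines (starting with `#`) were already skipped; FORMAT and per-sample genotype columns and INFO key/value parsing are left out.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One row per VCF data line, holding the eight fixed columns.
struct VariantRecord {
    std::string chrom;
    long long pos = 0;               // 1-based position
    std::string id;
    std::string ref;
    std::vector<std::string> alts;   // ALT may list several comma-separated alleles
    std::string qual;                // kept as text because it may be "."
    std::string filter;
    std::string info;                // raw INFO string, e.g. "DP=10;AF=0.5"
};

// Split a string on a single delimiter character.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::string field;
    std::istringstream in(s);
    while (std::getline(in, field, delim)) parts.push_back(field);
    return parts;
}

// Parse one tab-separated VCF data line; throws if a fixed column is missing.
VariantRecord parseVcfLine(const std::string& line) {
    const auto cols = split(line, '\t');
    if (cols.size() < 8) throw std::runtime_error("VCF data line needs at least 8 columns");
    VariantRecord r;
    r.chrom = cols[0];
    r.pos = std::stoll(cols[1]);
    r.id = cols[2];
    r.ref = cols[3];
    r.alts = split(cols[4], ',');
    r.qual = cols[5];
    r.filter = cols[6];
    r.info = cols[7];
    return r;
}
```

Keeping ALT as a list makes it easy to emit one relational row per alternate allele if the schema is normalized that way.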
-
C++ Bioinformatics related posts
- Help running hap.py
- Anyone use DRAGEN-GATK?
- Tools for strand direction detection RNA-Seq
- Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2
- Ask HN: Should I publish my research code?
- [TileDB webinar] Population genomics is a data management problem
- Bioinformatics programming language
-
www.saashub.com | 4 Feb 2023
Index
What are some of the best open-source Bioinformatic projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | fastp | 1,423 |
2 | bwa-mem2 | 588 |
3 | bowtie2 | 500 |
4 | megahit | 438 |
5 | edlib | 406 |
6 | hap.py | 332 |
7 | seqan3 | 318 |
8 | octopus | 271 |
9 | bowtie | 240 |
10 | GenomicSQLite | 133 |
11 | rnaseqc | 110 |
12 | SnakeStrike | 70 |
13 | sshash | 67 |
14 | TileDB-VCF | 59 |
15 | kmer-signatures | 0 |