C++ Bioinformatics

Open-source C++ projects categorized as Bioinformatics

Top 15 C++ Bioinformatics Projects

  • fastp

    An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)

    Project mention: Qurstion about automating trimming process | reddit.com/r/bioinformatics | 2022-05-26
  • bwa-mem2

    The next version of bwa-mem

    Project mention: Anyone use DRAGEN-GATK? | reddit.com/r/bioinformatics | 2022-10-12

    If you haven’t heard of it already you may want to check out https://github.com/bwa-mem2/bwa-mem2 which is a faster version of bwa-mem. I’ve been using it for a while now and found it to be quite stable, same results as the original and the speed improvement is nice.

  • bowtie2

    A fast and sensitive gapped read aligner

  • megahit

    Ultra-fast and memory-efficient (meta-)genome assembler

  • edlib

    Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.

    Project mention: What's an efficient way to find multiple subsequences in several FASTQs? | reddit.com/r/bioinformatics | 2022-02-08

    I’ve got a similar situation. I was implementing the Smith-Waterman algorithm when I figured someone had to have already written a “fast” version of this. I found the edlib package (https://github.com/Martinsos/edlib), which does sequence alignment using Levenshtein distance. It is essentially the same DP algorithm as your traditional NW or SW, only this is a C++ implementation with a Python wrapper. (I’m assuming you’re using Python; I could be wrong, though.) The pertinent parts of the function's output are the distance (dissimilarity) and the location (the indices where the alignment starts and ends). This tool may go some way toward helping your pipeline. You could also look to metagenomics papers for inspiration, as this is a problem (finding a substring in a huge amount of data) that the community contends with all the time. A k-mer-based approach may also be useful if you want to attempt the alignment-free path. Cheers.

  • hap.py

    Haplotype VCF comparison tools

    Project mention: Help running hap.py | reddit.com/r/bioinformatics | 2022-11-22

    I have been tasked with benchmarking a variant calling pipeline running hap.py as part of my bioinformatics MSc project.

  • seqan3

    The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.

  • octopus

    Bayesian haplotype-based mutation calling (by luntergroup)

    Project mention: genotyping tool | reddit.com/r/bioinformatics | 2022-07-13

    Check out octopus https://github.com/luntergroup/octopus

  • bowtie

    An ultrafast memory-efficient short read aligner

    Project mention: Burrows–Wheeler Transform | news.ycombinator.com | 2022-09-24
  • GenomicSQLite

    Genomics Extension for SQLite

    Project mention: sqlite-zstd: Transparent dictionary-based row-level compression for SQLite - An SQLite extension written in Rust to reduce the database size without losing functionality | reddit.com/r/rust | 2022-07-31

    Yes, that is indeed an obviously missing part. I knew about ZIPVFS, but somehow forgot to investigate more closely. Probably because I started this project before GenomicSQLite was a thing (that seems like the best alternative).

  • rnaseqc

    Fast, efficient RNA-Seq metrics for quality control and process optimization

    Project mention: Tools for strand direction detection RNA-Seq | reddit.com/r/bioinformatics | 2022-10-08

    I like to use RNA-SeQC (https://github.com/getzlab/rnaseqc). It shows the percentage of forward/reverse reads that aligned to either the sense or antisense strand. It is also compatible with MultiQC, which is a big plus.

  • SnakeStrike

    A Low-cost Open-source High-speed Multi-camera Motion Capture System.

  • sshash

    A compressed, associative, exact, and weighted dictionary for k-mers.

    Project mention: Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2 | reddit.com/r/bioinformatics | 2022-09-08

    The paper describing a new tool from our lab has just been published in Genome Biology (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02743-6). Cuttlefish 2 is a tool for efficiently computing the compacted de Bruijn graph (or a spectrum preserving string set) from either raw sequencing reads or from reference genomes. It is quite fast and very memory efficient — for example, we were able to construct the compacted de Bruijn graph on a set of 661K bacterial genomes in 16 hours and 30 minutes using only 48.7GB of RAM. Construction of the compacted de Bruijn graph is an important initial processing step in e.g. genome assembly, and is also important in several other areas such as comparative genomics and as a critical step in building certain types of indices (e.g. [sshash](https://github.com/jermp/sshash)). You can find the cuttlefish 2 software on GitHub [here](https://github.com/COMBINE-lab/cuttlefish), and it can also be installed via Bioconda. We'd be happy to have your feedback!

  • TileDB-VCF

    Efficient variant-call data storage and retrieval library using the TileDB storage library.

    Project mention: Has anyone stored/queried VCFs and their variant records in a relational database? | reddit.com/r/bioinformatics | 2022-11-12

    Perhaps of interest https://github.com/TileDB-Inc/TileDB-VCF

  • kmer-signatures

    High-performance kmer-signatures

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-11-22.
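
The comment quoted under edlib describes finding a short sequence inside a longer one by edit (Levenshtein) distance and reading back the distance and the alignment location. A minimal sketch of that idea in plain Python follows. It is not edlib's implementation or API: the function name `infix_edit_distance` is hypothetical, it uses the textbook dynamic-programming recurrence rather than the faster method a tuned library would use, and it reports only the end positions of the best alignments, not the start positions.

```python
def infix_edit_distance(query, target):
    """Find the best edit-distance match of `query` anywhere inside `target`.

    Returns (best_distance, end_positions), where end_positions are the
    0-based inclusive indices in `target` at which a best alignment ends.
    Hypothetical helper for illustration; not part of any library.
    """
    m = len(query)
    if not target:
        # Nothing to align against: every query character is unmatched.
        return m, []

    # prev[i] = cost of aligning query[:i] against the empty target prefix.
    prev = list(range(m + 1))
    best = None
    ends = []

    for j, t_char in enumerate(target):
        # curr[0] stays 0 so an alignment may start at any target position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == t_char else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in target
                curr[i - 1] + 1,     # extra character in query
            )
        # curr[m] = best cost of the whole query ending at target index j.
        if best is None or curr[m] < best:
            best, ends = curr[m], [j]
        elif curr[m] == best:
            ends.append(j)
        prev = curr

    return best, ends


print(infix_edit_distance("ACGT", "TTACGTTT"))  # prints (0, [5])
```

This sketch runs in time proportional to len(query) × len(target), so for many reads across several FASTQ files a dedicated library such as edlib, or a k-mer-based prefilter, is likely the more practical choice.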

What are some of the best open-source bioinformatics projects in C++? This list will help you:

 #  Project          Stars
 1  fastp            1,423
 2  bwa-mem2           588
 3  bowtie2            500
 4  megahit            438
 5  edlib              406
 6  hap.py             332
 7  seqan3             318
 8  octopus            271
 9  bowtie             240
10  GenomicSQLite      133
11  rnaseqc            110
12  SnakeStrike         70
13  sshash              67
14  TileDB-VCF          59
15  kmer-signatures      0