Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2

This page summarizes the projects mentioned and recommended in the original post on /r/bioinformatics

  • sshash

    A compressed, associative, exact, and weighted dictionary for k-mers.

  • The paper describing a new tool from our lab has just been published in Genome Biology (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02743-6). Cuttlefish 2 is a tool for efficiently computing the compacted de Bruijn graph (or a spectrum-preserving string set) from either raw sequencing reads or reference genomes. It is fast and memory efficient: for example, we constructed the compacted de Bruijn graph on a set of 661K bacterial genomes in 16 hours and 30 minutes using only 48.7 GB of RAM. Constructing the compacted de Bruijn graph is an important initial processing step in, e.g., genome assembly. It also matters in several other areas, such as comparative genomics, and is a critical step in building certain types of indices (e.g. [sshash](https://github.com/jermp/sshash)). You can find the Cuttlefish 2 software on GitHub [here](https://github.com/COMBINE-lab/cuttlefish), and it can also be installed via Bioconda. We'd be happy to have your feedback!

  • cuttlefish

    Building the compacted de Bruijn graph efficiently from references or reads.

  • rspec

    (Rust) Rspec - a BDD test harness for stable Rust (by rust-rspec)

  • However, it looks like there are rspec-like testing frameworks for Rust as well. For example, this.
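
For readers unfamiliar with the structure that the Cuttlefish 2 post refers to, here is a minimal Python sketch of what "compacting" a de Bruijn graph means: every maximal non-branching chain of k-mers is merged into a single string (a unitig). This is a toy illustration, not the Cuttlefish 2 algorithm. It assumes a node-centric graph over forward-strand k-mers only (reverse complements are ignored), holds every k-mer in memory, and uses hypothetical function names (`kmers_of`, `compact`) that do not come from the tool.

```python
from typing import Iterable, List, Set

ALPHABET = "ACGT"


def kmers_of(seqs: Iterable[str], k: int) -> Set[str]:
    """Collect every distinct k-mer occurring in the input sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def successors(kmer: str, kmers: Set[str]) -> List[str]:
    """k-mers in the set that overlap this one by k-1 characters on the right."""
    return [kmer[1:] + c for c in ALPHABET if kmer[1:] + c in kmers]


def predecessors(kmer: str, kmers: Set[str]) -> List[str]:
    """k-mers in the set that overlap this one by k-1 characters on the left."""
    return [c + kmer[:-1] for c in ALPHABET if c + kmer[:-1] in kmers]


def compact(seqs: Iterable[str], k: int) -> List[str]:
    """Return the unitigs (maximal non-branching paths) as sorted strings.

    Simplifying assumption: forward strand only, no reverse complements.
    """
    kmers = kmers_of(seqs, k)
    visited: Set[str] = set()
    unitigs: List[str] = []

    def is_start(u: str) -> bool:
        # A unitig starts at u unless u has exactly one predecessor
        # and that predecessor has exactly one successor.
        preds = predecessors(u, kmers)
        return len(preds) != 1 or len(successors(preds[0], kmers)) != 1

    def walk(start: str) -> str:
        # Extend to the right while the path does not branch.
        path, cur = start, start
        visited.add(start)
        while True:
            succs = successors(cur, kmers)
            if len(succs) != 1:
                break
            nxt = succs[0]
            if nxt in visited or len(predecessors(nxt, kmers)) != 1:
                break
            path += nxt[-1]
            visited.add(nxt)
            cur = nxt
        return path

    for u in sorted(kmers):
        if is_start(u):
            unitigs.append(walk(u))
    # Anything left unvisited lies on an isolated cycle; break it arbitrarily.
    for u in sorted(kmers):
        if u not in visited:
            unitigs.append(walk(u))
    return sorted(unitigs)


if __name__ == "__main__":
    # Two toy "reads" sharing the prefix ACGT, then diverging.
    print(compact(["ACGTT", "ACGTA"], 3))
```

In this toy input, the shared k-mers `ACG` and `CGT` collapse into one unitig, and the branch at `CGT` leaves `GTT` and `GTA` as separate unitigs. The unitigs together contain exactly the k-mers of the input, which is the sense in which such a set is "spectrum preserving". A production tool must additionally handle reverse complements and inputs far too large for an in-memory set.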
