genozip
BEETL
Metric | genozip | BEETL |
---|---|---|
Mentions | 2 | 1 |
Stars | 149 | 94 |
Growth | - | - |
Activity | 9.1 | 1.2 |
Last commit | 14 days ago | about 1 year ago |
Language | C | C++ |
License | GNU General Public License v3.0 or later | BSD 2-clause "Simplified" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
genozip
- We're wasting money by only supporting gzip for raw DNA files
You might want to check out Genozip - a modern compressor for FASTQ, BAM/CRAM, VCF etc.
- Introducing Genozip - a file compressor for BAM, FASTQ, VCF etc
BEETL
- We're wasting money by only supporting gzip for raw DNA files
If you have re-sequencing data of model species (which applies to >80% of generated sequencing data), the storage issue is often solved using CRAM/BAM formats. The FASTQ can be reconstructed if unmapped reads are stored in the file.
More general (pre-alignment) sequence compression methods never really took off (e.g. https://github.com/BEETL/BEETL), probably because it helps so much to have a common format that most workflows can start with. Here, replacing gzip with zstd would be lower-hanging fruit to start with.
What are some alternatives?
htslib - C library for high-throughput sequencing data formats
zstd - Zstandard - Fast real-time compression algorithm
snappy - Helps you browse through and interpret your genotype data
Hail - Cloud-native genomic dataframes and batch computing
sambamba - Tools for working with SAM/BAM data
community - An open community with an interest in developing and using new technologies for tensor data storage.
tamtools - Create and manage hybrid reference assemblies to consolidate two original DNA alignments against different reference assemblies.