genozip vs tamtools
Metric | genozip | tamtools
---|---|---
Mentions | 2 | 1
Stars | 149 | 3
Growth | - | -
Activity | 9.1 | 10.0
Last commit | 15 days ago | over 7 years ago
Language | C | Python
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
genozip
- We're wasting money by only supporting gzip for raw DNA files
You might want to check out Genozip - a modern compressor for FASTQ, BAM/CRAM, VCF etc.
- Introducing Genozip - a file compressor for BAM, FASTQ, VCF etc
tamtools
- We're wasting money by only supporting gzip for raw DNA files
A few years ago I built some tools https://github.com/tf318/tamtools to store alignments against two different reference assemblies in an efficient way (taking advantage of the fact that the majority of each alignment to different assemblies would in fact be the same, just shifted in position).
The intent was to enhance this to store alignments against multiple references as new references are published, and probably to rewrite in Rust or C rather than the initial Python effort.
In retrospect I would be interested to know whether this domain-specific compression effort, with zstd applied to the resulting "hybrid" alignment, would be more efficient than just letting zstd do its own thing with a full set of individual alignments against the different references.
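The question raised in this comment can be explored with a small experiment. The sketch below is only an illustration under stated assumptions: the record layout, function names, and toy data are hypothetical and are not tamtools' actual format, and `zlib` stands in for zstd because it ships with Python. It encodes the same reads either as two independent alignment sets or as one "hybrid" record per read (position on reference A plus the offset to reference B), then prints raw and compressed sizes for both.

```python
import random
import zlib


def make_toy_alignments(n_reads=200, read_len=50, shift=1000, seed=0):
    """Hypothetical toy data: every read aligns to references A and B."""
    rng = random.Random(seed)
    reads = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        pos_a = rng.randrange(1, 1_000_000)
        pos_b = pos_a + shift  # same alignment, just shifted in position
        reads.append((f"read{i}", seq, pos_a, pos_b))
    return reads


def encode_separate(reads):
    """One full record per read per reference (two independent sets)."""
    lines = []
    for ref, idx in (("A", 2), ("B", 3)):
        for r in reads:
            lines.append(f"{r[0]}\t{ref}\t{r[idx]}\t{r[1]}")
    return "\n".join(lines).encode()


def encode_hybrid(reads):
    """One record per read: position on A plus the offset to B."""
    lines = [
        f"{name}\t{pos_a}\t{pos_b - pos_a}\t{seq}"
        for name, seq, pos_a, pos_b in reads
    ]
    return "\n".join(lines).encode()


def decode_hybrid(blob):
    """Recover (name, seq, pos_a, pos_b) tuples from the hybrid encoding."""
    out = []
    for line in blob.decode().split("\n"):
        name, pos_a, delta, seq = line.split("\t")
        out.append((name, seq, int(pos_a), int(pos_a) + int(delta)))
    return out


reads = make_toy_alignments()
separate, hybrid = encode_separate(reads), encode_hybrid(reads)
for label, blob in (("separate", separate), ("hybrid", hybrid)):
    # zlib is a stand-in here; swap in a zstd binding to test the real question
    print(label, len(blob), len(zlib.compress(blob, 9)))
```

Real alignments differ between assemblies in more than position, so numbers from this toy data say nothing about real BAM files; the sketch only shows how such a comparison could be set up.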
What are some alternatives?
htslib - C library for high-throughput sequencing data formats
zstd - Zstandard - Fast real-time compression algorithm
BEETL - BEETL
community - An open community with an interest in developing and using new technologies for tensor data storage.
snappy - Helps you browse through and interpret your genotype data
sambamba - Tools for working with SAM/BAM data
Hail - Cloud-native genomic dataframes and batch computing