| | adam | cramino |
|---|---|---|
| Mentions | 3 | 1 |
| Stars | 967 | 111 |
| Growth | 0.2% | - |
| Activity | 6.1 | 7.7 |
| Last commit | about 1 month ago | 2 months ago |
| Language | Scala | Rust |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
adam
-
biobear -- python package with minimal dependencies for bioinformatic file parsing and querying using rust and polars as the backend
FYI: ADAM seems to do that
-
Advanced Scientific Data Format
We presented using Parquet formats for bioinformatics 2012/13-ish at the Bioinformatics Open Source Conference (BOSC) and got laughed out of the place.
While using Apache Spark for bioinformatics [0] never really took off, I still think Parquet formats for bioinformatics [1] are a good idea, especially with DuckDB, Apache Arrow, etc. supporting Parquet out of the box.
0 - https://github.com/bigdatagenomics/adam
1 - https://github.com/bigdatagenomics/bdg-formats
-
Seq: A programming language for high-performance computational genomics
We're here, still plugging along.
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
https://github.com/bigdatagenomics/adam
cramino
-
biobear -- python package with minimal dependencies for bioinformatic file parsing and querying using rust and polars as the backend
What do you see as the use case for this, specifically as it relates to the BAM reading? I've used pysam to read and iterate over BAM files to generate custom summary reports, but this can be very slow with large files that contain many records. I know there are some tools written in Rust that show significant speed improvements (for example, nanostat, a tool I used, was partially rewritten as cramino and purports to be much faster).
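As a rough illustration of the kind of custom summary report the post describes, here is a minimal Python sketch that computes a few read-length statistics, including N50. The function names `n50` and `summarize_read_lengths` and the choice of statistics are assumptions made for illustration; this is not how cramino or nanostat is implemented.

```python
from statistics import mean, median
from typing import Dict, Iterable, List


def n50(lengths: List[int]) -> int:
    """Return the length L such that reads of length >= L
    together hold at least half of all bases (0 for empty input)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize_read_lengths(lengths: Iterable[int]) -> Dict[str, float]:
    """Hypothetical summary report over an iterable of read lengths."""
    values = list(lengths)
    if not values:
        return {"reads": 0, "yield_bases": 0, "mean": 0.0, "median": 0.0, "n50": 0}
    return {
        "reads": len(values),
        "yield_bases": sum(values),
        "mean": mean(values),
        "median": median(values),
        "n50": n50(values),
    }


if __name__ == "__main__":
    # Toy read lengths; with pysam, the lengths could instead come from
    # iterating a pysam.AlignmentFile and taking read.query_length.
    report = summarize_read_lengths([10, 20, 30, 40])
    for key, value in report.items():
        print(f"{key}: {value}")
```

The function takes a plain iterable of lengths, so the same code works whichever parser supplies the reads. With large files, the iteration over records is likely to dominate the runtime rather than the statistics themselves, which fits the slowdown described in the post.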
What are some alternatives?
seq - A high-performance, Pythonic language for bioinformatics
biobear - Work with bioinformatic files using Arrow, Polars, and/or DuckDB
bioconda-recipes - Conda recipes for the bioconda channel.
nimconf2021 - Slides for Nimconf21
asdf - ASDF (Advanced Scientific Data Format) is a next generation interchange format for scientific data
uvfs - Microscopic C++20 archive format
sito - sito: A serialization suite
mleap - MLeap: Deploy ML Pipelines to Production
asdf - Extendable version manager with support for Ruby, Node.js, Elixir, Erlang & more
Nim - Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
Biopython - Official git repository for Biopython (originally converted from CVS)
bdg-formats - Open source formats for scalable genomic processing systems using Avro. Apache 2 licensed.