scikit-bio
nimtorch
Metric | scikit-bio | nimtorch
---|---|---
Mentions | 2 | 3
Stars | 833 | 452
Growth | 0.8% | -
Activity | 8.8 | 10.0
Last commit | 7 days ago | almost 5 years ago
Language | Python | Nim
License | BSD 3-clause "New" or "Revised" License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
scikit-bio
- What are some of the bioinformatic projects I could do on python as a beginner?
- Why I Use Nim instead of Python for Data Processing
You make a fair point that using optimized numerical libraries instead of string methods will be ridiculously fast because they're compiled anyway. For example, scikit-bio does just this for their reverse complement operation [1]. However, they use an 8-bit representation since they need to be able to represent the extended IUPAC notation for ambiguous bases, which includes things like the character N for "aNy" nucleotide [2]. One could get creative with a 4-bit encoding and still end up saving space (assuming you don't care about the distinction between uppercase and lowercase characters in your sequence [2]). Or, if you know in advance that your sequence is unambiguous (unlikely in DNA sequencing-derived data), you could use a 2-bit encoding. When dealing with short nucleotide sequences, another approach is to encode the sequence as an integer. I would love to see a library (Python, Nim, or otherwise) that made using the most efficient encoding for a sequence transparent to the developer.
[1] https://github.com/biocore/scikit-bio/blob/b470a55a8dfd054ae...
[2] https://en.wikipedia.org/wiki/Nucleic_acid_notation
[3]
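The two approaches mentioned in the comment above can be sketched roughly as follows. This is a minimal illustration, not scikit-bio's actual implementation: the names `reverse_complement`, `pack_2bit`, and `unpack_2bit` are hypothetical, and the sketch assumes ASCII input. The first function uses an 8-bit NumPy lookup table that covers the IUPAC ambiguity codes; the second pair packs an unambiguous sequence into a single integer at 2 bits per base.

```python
import numpy as np

# 256-entry lookup table: each ASCII byte maps to its complement.
# Bytes without an explicit entry map to themselves (an assumption of this sketch).
_COMPLEMENT = np.arange(256, dtype=np.uint8)
for base, comp in zip(b"ACGTRYSWKMBDHVNacgtryswkmbdhvn",
                      b"TGCAYRSWMKVHDBNtgcayrswmkvhdbn"):
    _COMPLEMENT[base] = comp


def reverse_complement(seq: str) -> str:
    """Reverse complement via vectorized table lookup (hypothetical helper)."""
    codes = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)
    # Index the table with the whole array at once, then reverse the result.
    return _COMPLEMENT[codes][::-1].tobytes().decode("ascii")


# 2-bit codes for unambiguous bases only.
_TWO_BIT = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASES = "ACGT"


def pack_2bit(seq: str) -> int:
    """Pack an unambiguous uppercase sequence into one integer, 2 bits per base."""
    value = 0
    for base in seq:
        # Raises KeyError for ambiguous or lowercase bases such as "N" or "a".
        value = (value << 2) | _TWO_BIT[base]
    return value


def unpack_2bit(value: int, length: int) -> str:
    """Inverse of pack_2bit; the length is required because leading A's encode as zeros."""
    bases = []
    for _ in range(length):
        bases.append(_BASES[value & 3])
        value >>= 2
    return "".join(reversed(bases))
```

For example, `reverse_complement("ACGTN")` keeps the ambiguous `N` intact, while `pack_2bit("ACGTN")` raises a `KeyError`, which is the trade-off the comment describes: the compact encoding only works when the sequence is known to be unambiguous.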
nimtorch
- NimTorch: PyTorch front end by generating C++ native ATen code
- Nim Front End for PyTorch
- Why I Use Nim instead of Python for Data Processing
There’s been a tremendous amount of work optimizing BLAS _and_ ensuring it’s numerically stable. Julia made a good choice to use BLAS first, though it’s good to see new native implementations.
For Nim, there’s also NimTorch, which is interesting in that it builds on Nim’s C++ target to generate native PyTorch code. Even Python is technically a second-class citizen for the C++ code. Most ML libraries are C++ all the way down.
https://github.com/sinkingsugar/nimtorch
What are some alternatives?
PrimesResult - The results of Dave Plummer's Primes Drag Race
Arraymancer - A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
nimpylib - Some python standard library functions ported to Nim
viroiddb - A curated database of all available viroid-like RNA sequences
biofast - Benchmarking programming languages/implementations for common tasks in Bioinformatics
RecursiveFactorization.jl
nimpy - Nim - Python bridge
Primes - Prime Number Projects in C#/C++/Python