Why I Use Nim instead of Python for Data Processing

viroiddb

2 5 0.0 Vue

A curated database of all available viroid-like RNA sequences

I use TypeScript/JavaScript a lot for this reason, especially for biological algorithms that I want to run in the browser. The developer tooling is also as good as you can hope for, especially in VS Code. I recently wrote a circular RNA sequence deduplication algorithm in it [1].
With respect to identifier resolution in Nim, it strikes me as more a matter of preference. Given Nim's uniform function call syntax, at least it's consistent. For example, Nim treats "ATGCA".lowerCase() the same as lowercase("ATGCA"). I do appreciate that you can use a chaining syntax instead of a nesting one when making multiple function calls, but this is also a matter of style more than substance.
[1] https://github.com/Benjamin-Lee/viroiddb/blob/main/scripts/c...
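To make the deduplication idea concrete, here is a minimal sketch in Python. It is not the linked script's method, only one common way to approach the problem: reduce every circular sequence to a canonical key (its lexicographically smallest rotation) and keep the first sequence seen for each key. The function names are hypothetical, and treating a sequence and its reverse complement as the same molecule is an assumption that may not suit every dataset.

```python
# Hypothetical helpers for illustration; not taken from the linked script.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous, uppercase RNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_rotation(seq: str) -> str:
    """Return the lexicographically smallest rotation of seq.

    This is the simple O(n^2) version; faster algorithms exist.
    """
    if not seq:
        return seq
    doubled = seq + seq
    n = len(seq)
    return min(doubled[i:i + n] for i in range(n))


def canonical_circular(seq: str) -> str:
    """Canonical key for a circular sequence.

    Assumption: both strands count as the same molecule, so the key is
    the smaller of the two strands' canonical rotations.
    """
    seq = seq.upper()
    return min(canonical_rotation(seq),
               canonical_rotation(reverse_complement(seq)))


def deduplicate(seqs):
    """Keep the first sequence seen for each canonical key."""
    seen = set()
    unique = []
    for s in seqs:
        key = canonical_circular(s)
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique
```

For example, "GUAC" is a rotation of "ACGU", so `deduplicate(["ACGU", "GUAC", "AAAA"])` keeps only the first and the last entry.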

scikit-bio

2 833 8.8 Python

scikit-bio: a community-driven Python library for bioinformatics, providing versatile data structures, algorithms and educational resources.

You make a fair point that using optimized numerical libraries instead of string methods will be ridiculously fast, since they're compiled anyway. For example, scikit-bio does just this for its reverse complement operation [1]. However, it uses an 8-bit representation because it needs to represent the extended IUPAC notation for ambiguous bases, which includes things like the character N for "aNy" nucleotide [2]. One could get creative with a 4-bit encoding and still save space (assuming you don't care about the distinction between uppercase and lowercase characters in your sequence [2]). Or, if you know in advance that your sequence is unambiguous (unlikely in DNA sequencing-derived data), you could use a 2-bit encoding. When dealing with short nucleotide sequences, another approach is to encode the sequence as an integer. I would love to see a library (Python, Nim, or otherwise) that made using the most efficient encoding for a sequence transparent to the developer.
[1] https://github.com/biocore/scikit-bio/blob/b470a55a8dfd054ae...
[2] https://en.wikipedia.org/wiki/Nucleic_acid_notation
[3]
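As a rough illustration of the 2-bit and integer encodings mentioned above, the sketch below packs an unambiguous DNA sequence into a Python integer, two bits per base. This is not how scikit-bio stores sequences; the function names and the particular code assignment (A=0, C=1, G=2, T=3) are assumptions chosen so that complementing a base is simply `3 - code`.

```python
# 2-bit codes chosen so that complement(code) == 3 - code (A<->T, C<->G).
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def encode_2bit(seq: str) -> int:
    """Pack an unambiguous DNA sequence into an int, 2 bits per base."""
    value = 0
    for base in seq.upper():
        # Raises KeyError for ambiguous bases such as N.
        value = (value << 2) | _CODE[base]
    return value


def decode_2bit(value: int, length: int) -> str:
    """Unpack `length` bases from an int produced by encode_2bit."""
    bases = []
    for _ in range(length):
        bases.append(_BASE[value & 3])  # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


def reverse_complement_2bit(value: int, length: int) -> int:
    """Reverse complement computed directly on the packed integer."""
    result = 0
    for _ in range(length):
        # Take bases from the end, complement them, append to the front.
        result = (result << 2) | (3 - (value & 3))
        value >>= 2
    return result
```

The length has to be stored alongside the integer, because leading A bases encode as zero bits and would otherwise be lost. With this scheme, `encode_2bit("ATGCA")` is 228, and decoding the reverse complement of that value gives "TGCAT".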

biofast

3 175 0.0 C

Benchmarking programming languages/implementations for common tasks in Bioinformatics

benchmarks

40 2,741 7.2 Makefile

Some benchmarks of different languages

If anyone is interested to see how Nim fares against some other programming languages, here are some benchmarks: https://github.com/kostya/benchmarks

nimpylib

1 182 3.4 Nim

Some python standard library functions ported to Nim

Another Nim and Python thread that has not been mentioned here yet:
https://news.ycombinator.com/item?id=28506531 - the project makes it pretty easy to create Pythonic bindings for your Nim libraries, which can be useful if you still want to write most of your top-level code in Python but leverage Nim's speed when it matters.
If you want to make your Nim code even more "Pythonic", there is https://github.com/Yardanico/nimpylib, and for calling Python code from Nim there is https://github.com/yglukhov/nimpy

nimpy

38 1,416 5.8 Nim

Nim - Python bridge

PrimesResult

6 28 0.0

The results of Dave Plummer's Primes Drag Race

The thing with Python is that it's usually pretty easy to optimise quite impressively.
E.g. a random example:
Sprinkle some cdefs in your Python and suddenly you're faster than C++
https://github.com/luizsol/PrimesResult
https://github.com/PlummersSoftwareLLC/Primes/blob/drag-race...
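For context, the Primes drag race benchmarks a sieve of Eratosthenes. The sketch below is a plain-Python baseline of that algorithm, not any of the linked solutions; the function name is hypothetical. The nested loops are the kind of hot path where, in Cython, one would add `cdef` type declarations for the loop variables and the buffer.

```python
def count_primes(limit: int) -> int:
    """Count the primes <= limit with a simple sieve of Eratosthenes."""
    if limit < 2:
        return 0
    # One byte per candidate: 1 means "still assumed prime".
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    factor = 2
    while factor * factor <= limit:
        if is_prime[factor]:
            # Cross out multiples, starting at factor squared.
            for multiple in range(factor * factor, limit + 1, factor):
                is_prime[multiple] = 0
        factor += 1
    return sum(is_prime)
```

Calling `count_primes(1_000_000)` should report the 78,498 primes below one million.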

Primes

45 2,357 7.0 C#

Prime Number Projects in C#/C++/Python

Arraymancer

21 1,304 8.2 Nim

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends

Apparently there's also a data processing library for Nim called Arraymancer [0] that's inspired by NumPy and PyTorch. It claims to be faster than both.
[0] https://mratsim.github.io/Arraymancer/

nimtorch

3 452 10.0 Nim

PyTorch - Python + Nim

There’s been a tremendous amount of work optimizing BLAS _and_ ensuring it’s numerically stable. Julia made a good choice to use BLAS first, though it’s good to see new native implementations.
For Nim, there’s also NimTorch, which is interesting in that it builds on Nim’s C++ target to generate native PyTorch code. Even Python is technically a second-class citizen relative to the C++ code. Most ML libraries are C++ all the way down.
https://github.com/sinkingsugar/nimtorch

RecursiveFactorization.jl

8 74 6.1 Julia

Not necessarily true with Julia. Many libraries like DifferentialEquations.jl are Julia all the way down because the pure Julia BLAS tools outperform OpenBLAS and MKL in certain areas. For example, see:
https://github.com/YingboMa/RecursiveFactorization.jl/pull/2...
So a stiff ODE solve is pure Julia, LU-factorizations and all.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
