Why I Use Nim instead of Python for Data Processing

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • viroiddb

    A curated database of all available viroid-like RNA sequences

  • I actually use TypeScript/JavaScript a lot for this reason, especially for biological algorithms that I want to run in the browser. The developer tooling is also as good as you can hope for, especially when using VS Code. I actually wrote a circular RNA sequence deduplication algorithm in it just recently [1].

    With respect to identifier resolution in Nim, it strikes me as more a matter of preference. Given Nim's uniform function call syntax, at least it's consistent: Nim treats "ATGCA".lowerCase() the same as lowercase("ATGCA"). I do appreciate that you can chain calls instead of nesting them when making multiple function calls, but this is also a matter of style more than substance.

    [1] https://github.com/Benjamin-Lee/viroiddb/blob/main/scripts/c...
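As an illustration of what circular sequence deduplication involves, here is a minimal Python sketch. It is not taken from the linked script, whose approach may differ. It assumes two circular sequences are duplicates when one is a rotation of the other, and it ignores reverse complements. The function names `canonical_rotation` and `deduplicate_circular` are hypothetical.

```python
def canonical_rotation(seq: str) -> str:
    """Return the lexicographically smallest rotation of a circular sequence.

    Naive O(n^2) approach: every rotation of seq is a length-n window
    of seq + seq, so we simply take the minimum of those windows.
    """
    if not seq:
        return seq
    doubled = seq + seq
    return min(doubled[i:i + len(seq)] for i in range(len(seq)))


def deduplicate_circular(seqs):
    """Keep the first occurrence of each circular sequence, ignoring rotation and case."""
    seen = set()
    unique = []
    for seq in seqs:
        key = canonical_rotation(seq.upper())
        if key not in seen:
            seen.add(key)
            unique.append(seq)
    return unique
```

For long sequences, a linear-time minimal-rotation algorithm (such as Booth's) would be a better fit than the quadratic scan used here.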

  • scikit-bio

    scikit-bio: a community-driven Python library for bioinformatics, providing versatile data structures, algorithms and educational resources.

  • You make a fair point that using optimized numerical libraries instead of string methods will be ridiculously fast because they're compiled anyway. For example, scikit-bio does just this for their reverse complement operation [1]. However, they use an 8-bit representation since they need to be able to represent the extended IUPAC notation for ambiguous bases, which includes things like the character N for "aNy" nucleotide [2]. One could get creative with a 4-bit encoding and still end up saving space (assuming you don't care about the distinction between uppercase and lowercase characters in your sequence [2]). Or, if you know in advance that your sequence is unambiguous (unlikely in DNA sequencing-derived data), you could use a 2-bit encoding. When dealing with short nucleotide sequences, another approach is to encode the sequence as an integer. I would love to see a library (Python, Nim, or otherwise) that made using the most efficient encoding for a sequence transparent to the developer.

    [1] https://github.com/biocore/scikit-bio/blob/b470a55a8dfd054ae...

    [2] https://en.wikipedia.org/wiki/Nucleic_acid_notation

    [3]
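The 2-bit and integer encodings mentioned above can be sketched in a few lines of Python. This is only an illustration, not how scikit-bio or any other library does it. The code table (A=0, C=1, G=2, T=3) and the function names are assumptions chosen so that the complement of a code is `3 - code`.

```python
# Hypothetical 2-bit code table: A=0, C=1, G=2, T=3.
# With this ordering, the complement of a base code is 3 - code.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_2bit(seq: str) -> int:
    """Pack an unambiguous DNA sequence into one integer, 2 bits per base."""
    value = 0
    for base in seq.upper():
        if base not in CODES:
            raise ValueError(f"ambiguous or invalid base: {base!r}")
        value = (value << 2) | CODES[base]
    return value


def decode_2bit(value: int, length: int) -> str:
    """Unpack `length` bases. The length is stored separately because
    leading A's encode as zero bits and would otherwise be lost."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 3])
        value >>= 2
    return "".join(reversed(bases))


def reverse_complement_2bit(value: int, length: int) -> int:
    """Reverse-complement a packed sequence without decoding it to text."""
    out = 0
    for _ in range(length):
        out = (out << 2) | (3 - (value & 3))
        value >>= 2
    return out
```

The trade-off is that any ambiguous base, such as N, raises an error, which is why this encoding only suits sequences known to be unambiguous.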

  • biofast

    Benchmarking programming languages/implementations for common tasks in Bioinformatics

  • benchmarks

    Some benchmarks of different languages

  • If anyone is interested to see how Nim fares against some other programming languages, here are some benchmarks: https://github.com/kostya/benchmarks

  • nimpylib

    Some python standard library functions ported to Nim

  • Another Nim and Python thread that has not been mentioned here yet:

    https://news.ycombinator.com/item?id=28506531 - this project allows creating Pythonic bindings for your Nim libraries pretty easily, which can be useful if you still want to write most of your top-level code in Python but leverage Nim's speed when it matters.

    If you want to make your Nim code even more "Pythonic", there is https://github.com/Yardanico/nimpylib, and for calling Python code from Nim there is https://github.com/yglukhov/nimpy

  • nimpy

    Nim - Python bridge

  • PrimesResult

    The results of the Dave Plummer's Primes Drag Race

  • The thing with Python is that it's usually pretty easy to optimise quite impressively.

    E.g. a random example:

    Sprinkle some cdefs in your Python and suddenly you're faster than C++

    https://github.com/luizsol/PrimesResult

    https://github.com/PlummersSoftwareLLC/Primes/blob/drag-race...
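For context, the Primes drag race benchmarks prime sieves. The following is a minimal pure-Python sieve of Eratosthenes, written as a sketch of the kind of hot loop where Cython's `cdef` type declarations would typically be added. It is not the code from the linked repositories, and the function name `sieve` is hypothetical.

```python
import math


def sieve(limit: int) -> list[int]:
    """Return all primes <= limit using a sieve of Eratosthenes."""
    if limit < 2:
        return []
    # One byte per candidate: 1 means "still assumed prime".
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Clear every multiple of i starting at i*i in a single slice assignment.
            count = len(range(i * i, limit + 1, i))
            is_prime[i * i::i] = bytearray(count)
    return [n for n, flag in enumerate(is_prime) if flag]
```

In a Cython version, the loop index and the flag buffer are the usual candidates for static typing, since that lets the inner loop compile down to plain C arithmetic.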

  • Primes

    Prime Number Projects in C#/C++/Python

  • Arraymancer

    A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends

  • Apparently there's also a data processing library for Nim called Arraymancer [0] that's inspired by NumPy and PyTorch. It claims to be faster than both.

    [0] https://mratsim.github.io/Arraymancer/

  • nimtorch

    PyTorch - Python + Nim

  • There’s been a tremendous amount of work optimizing BLAS _and_ ensuring it’s numerically stable. Julia made a good choice to use BLAS first, though it’s good to see new native implementations.

    For Nim, there’s also NimTorch which is interesting in that it builds on Nim’s C++ target to generate native PyTorch code. Even Python is technically a second class citizen for the C++ code. Most ML libraries are C++ all the way down.

    https://github.com/sinkingsugar/nimtorch

  • Not necessarily true with Julia. Many libraries like DifferentialEquations.jl are Julia all the way down because the pure Julia BLAS tools outperform OpenBLAS and MKL in certain areas. For example, see:

    https://github.com/YingboMa/RecursiveFactorization.jl/pull/2...

    So a stiff ODE solve is pure Julia, LU-factorizations and all.

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
