Show HN: Dataframes in Elixir Backed by Rust

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • explorer

    Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir

  • nx

    Multi-dimensional arrays (tensors) and numerical definitions for Elixir (by elixir-nx)

  • With the advent of Nx [1] and Axon [2] for Elixir, I felt like the missing piece was the data munging en route to the machine learning. With Polars [3] making strides and Rustler [4] available to make a safe bridge for Elixir NIFs, I saw a path to get dataframes in Elixir that are _fast_.

    I'm really pleased with how it turned out. Inspired by the Nx architecture, which uses pluggable backends, I built a thin user-facing API that calls into pluggable dataframe backends (pluggable in theory, as Polars is the only extant backend). What really excites me about this approach is that it permits something similar to what we see in dplyr [5], where you can manipulate in-memory data frames using the same API as remote databases or Spark dataframes.
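
A minimal sketch of the pluggable-backend idea described above, written in Rust purely for illustration. Explorer itself is an Elixir library, and none of these names come from it: `Backend`, `DataFrame`, `InMemory`, `filter_gt`, and `pull` are all hypothetical. The point is only the shape: a thin user-facing type that forwards every call to whichever backend holds the data.

```rust
use std::collections::BTreeMap;

/// Hypothetical contract a dataframe backend must fulfil.
/// (Illustrative only; this is not Explorer's actual interface.)
pub trait Backend {
    /// Keep rows where `column` is greater than `threshold`, as a new frame.
    fn filter_gt(&self, column: &str, threshold: i64) -> Box<dyn Backend>;
    /// Return a copy of one column, if it exists.
    fn column(&self, name: &str) -> Option<Vec<i64>>;
    /// Number of rows in the frame.
    fn n_rows(&self) -> usize;
}

/// Thin user-facing API: it holds a backend and only delegates.
pub struct DataFrame {
    data: Box<dyn Backend>,
}

impl DataFrame {
    pub fn new(backend: Box<dyn Backend>) -> Self {
        DataFrame { data: backend }
    }

    /// Returns a new frame; the receiver is left untouched (immutable style).
    pub fn filter_gt(&self, column: &str, threshold: i64) -> DataFrame {
        DataFrame { data: self.data.filter_gt(column, threshold) }
    }

    pub fn pull(&self, name: &str) -> Option<Vec<i64>> {
        self.data.column(name)
    }

    pub fn n_rows(&self) -> usize {
        self.data.n_rows()
    }
}

/// A toy in-memory backend with integer columns of equal length.
pub struct InMemory {
    columns: BTreeMap<String, Vec<i64>>,
}

impl InMemory {
    pub fn new(cols: Vec<(&str, Vec<i64>)>) -> Self {
        InMemory {
            columns: cols.into_iter().map(|(k, v)| (k.to_string(), v)).collect(),
        }
    }
}

impl Backend for InMemory {
    fn filter_gt(&self, column: &str, threshold: i64) -> Box<dyn Backend> {
        // Build a row mask from the filter column; an unknown column keeps no rows.
        let mask: Vec<bool> = match self.columns.get(column) {
            Some(values) => values.iter().map(|v| *v > threshold).collect(),
            None => vec![false; self.n_rows()],
        };
        // Apply the mask to every column, copying the surviving values.
        let columns: BTreeMap<String, Vec<i64>> = self
            .columns
            .iter()
            .map(|(name, values)| {
                let kept: Vec<i64> = values
                    .iter()
                    .zip(&mask)
                    .filter(|(_, keep)| **keep)
                    .map(|(v, _)| *v)
                    .collect();
                (name.clone(), kept)
            })
            .collect();
        Box::new(InMemory { columns })
    }

    fn column(&self, name: &str) -> Option<Vec<i64>> {
        self.columns.get(name).cloned()
    }

    fn n_rows(&self) -> usize {
        self.columns.values().next().map_or(0, |v| v.len())
    }
}
```

Under these assumptions, a second backend (say, one that translates `filter_gt` into a database query) would only need to implement `Backend`; code written against `DataFrame` would not change.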

    Next up is to move to lazy-by-default. To be as unsurprising as possible, Explorer dataframes are (for all intents and purposes) immutable. This has required a fair amount of copying when using the eager API from Polars, and because Rustler NIF resources use atomic reference counting and the GC only sweeps intermittently, those copies can linger and memory usage can get pretty bad. Fortunately, Polars also has a lazy API. The plan is to use that with 'peeking' for display.
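
To make the eager-versus-lazy trade-off above concrete, here is a toy sketch in Rust. It is not Polars' lazy API (which is far richer and includes a query optimizer), and `LazySeries`, `Step`, `collect`, and `peek` are hypothetical names. It assumes 'peeking' means evaluating only as many rows as are needed for display. Each operation records a step and shares the source buffer through an `Arc` instead of copying it; work happens only when results are requested.

```rust
use std::sync::Arc;

/// One recorded operation in the plan (hypothetical, for illustration).
#[derive(Clone, Debug)]
pub enum Step {
    /// Keep values strictly greater than the threshold.
    FilterGt(i64),
    /// Add a constant to every value.
    Add(i64),
}

/// An immutable series plus a list of operations not yet executed.
#[derive(Clone, Debug)]
pub struct LazySeries {
    source: Arc<Vec<i64>>, // shared, reference-counted; never copied by a step
    plan: Vec<Step>,
}

impl LazySeries {
    pub fn new(values: Vec<i64>) -> Self {
        LazySeries { source: Arc::new(values), plan: Vec::new() }
    }

    pub fn filter_gt(&self, threshold: i64) -> Self {
        self.with(Step::FilterGt(threshold))
    }

    pub fn add(&self, n: i64) -> Self {
        self.with(Step::Add(n))
    }

    /// Extend the plan; only the small plan vector is cloned, not the data.
    fn with(&self, step: Step) -> Self {
        let mut plan = self.plan.clone();
        plan.push(step);
        LazySeries { source: Arc::clone(&self.source), plan }
    }

    /// Stream each source value through the plan, in order, on demand.
    fn run(&self) -> impl Iterator<Item = i64> + '_ {
        self.source.iter().filter_map(move |&v| {
            let mut current = v;
            for step in &self.plan {
                match step {
                    Step::FilterGt(t) => {
                        if current <= *t {
                            return None; // row dropped by the filter
                        }
                    }
                    Step::Add(n) => current += *n,
                }
            }
            Some(current)
        })
    }

    /// Execute the whole plan and materialize the result.
    pub fn collect(&self) -> Vec<i64> {
        self.run().collect()
    }

    /// Execute only until `n` results exist, e.g. for printing a preview.
    pub fn peek(&self, n: usize) -> Vec<i64> {
        self.run().take(n).collect()
    }
}
```

In this sketch, `LazySeries::new(vec![1, 5, 10, 15, 20]).filter_gt(4).add(100)` allocates no intermediate vectors; `peek(2)` stops scanning the source as soon as two rows have passed the filter, while `collect()` processes everything.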

    After that, I'd like to move into additional backends. I'm particularly keen on Ecto (database) and Apache Arrow/Ballista for distributed and OLAP work. There is also work underway for a pure Elixir backend so the library can ship without a Rust dependency. Speaking of which, there's work on prebuilt binaries underway as well.

    I'd love feedback on the API! I aimed for a dplyr-ish API as I think it melds better with a functional language than pandas. Generally I find dplyr more intuitive than pandas. The philosophy here is to get from brain to data as simply and intuitively as possible.

    Finally, contributions and any other feedback are super, super welcome. It's early days and I'm also a startup founder so I haven't been able to dedicate as much time as I'd like, but I try to get some work done and add features at least once a week.

    Thanks for looking!

    [1] https://github.com/elixir-nx/nx

  • dplyr

    dplyr: A grammar of data manipulation

  • axon

    Nx-powered Neural Networks

  • polars

    Dataframes powered by a multithreaded, vectorized query engine, written in Rust

  • Rustler

    Safe Rust bridge for creating Erlang NIF functions
