explorer
Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
With the advent of Nx [1] and Axon [2] for Elixir, I felt like the missing piece was the data munging en route to the machine learning. With Polars [3] making strides and Rustler [4] available to make a safe bridge for Elixir NIFs, I saw a path to get dataframes in Elixir that are _fast_.
I'm really pleased with how it turned out. Inspired by the Nx architecture, which uses pluggable backends, I built a thin user-facing API that calls into pluggable dataframe backends (pluggable in theory, as Polars is the only extant backend). What really excites me about this approach is that it permits something similar to what we see in dplyr [5], where you can manipulate in-memory data frames using the same API as remote databases or Spark dataframes.
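To make the pluggable-backend idea concrete, here is a minimal sketch in plain Elixir. It is not Explorer's actual code: the `Sketch.*` module names, the callbacks, and the toy map-based backend are all made up for illustration. The only point is the shape, where the struct remembers which backend module owns its data and the user-facing functions just forward to it.

```elixir
defmodule Sketch.Backend do
  # Hypothetical behaviour: the contract every dataframe backend must fulfil.
  @callback from_columns(map()) :: term()
  @callback select(term(), [String.t()]) :: term()
  @callback names(term()) :: [String.t()]
end

defmodule Sketch.MapBackend do
  # Toy in-memory backend that keeps columns in a plain map (illustration only).
  @behaviour Sketch.Backend

  @impl true
  def from_columns(columns), do: columns

  @impl true
  def select(data, names), do: Map.take(data, names)

  @impl true
  def names(data), do: data |> Map.keys() |> Enum.sort()
end

defmodule Sketch.DataFrame do
  # Thin user-facing API: every call is forwarded to the backend stored in the struct.
  defstruct [:backend, :data]

  def new(columns, backend \\ Sketch.MapBackend) do
    %__MODULE__{backend: backend, data: backend.from_columns(columns)}
  end

  def select(%__MODULE__{backend: backend, data: data} = df, names) do
    %{df | data: backend.select(data, names)}
  end

  def names(%__MODULE__{backend: backend, data: data}), do: backend.names(data)
end

df = Sketch.DataFrame.new(%{"a" => [1, 2], "b" => [3, 4]})

df
|> Sketch.DataFrame.select(["a"])
|> Sketch.DataFrame.names()
|> IO.inspect()
# prints ["a"]
```

Swapping in a database-backed or Arrow-backed module would only mean passing a different backend to `new/2`; the calling code stays the same.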
Next up is to move to lazy-by-default. To be as unsurprising as possible, Explorer dataframes are (for all intents and purposes) immutable. This has required a fair amount of copying when using the eager API from Polars, and because Rustler NIF resources use atomic reference counting and the GC only sweeps intermittently, memory usage can get pretty bad. Fortunately, Polars also has a lazy API. The plan is to use that with 'peeking' for display.
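As a rough illustration of what lazy with 'peeking' could look like, here is a toy version in plain Elixir. It does not use Polars or Explorer; `Sketch.LazyFrame` and its functions are hypothetical names, and rows are plain maps. Each operation is only recorded, so no intermediate copies are made, and `peek/2` pulls just enough rows through the pipeline to display a few.

```elixir
defmodule Sketch.LazyFrame do
  # Hypothetical lazy frame: source rows plus a list of pending operations.
  defstruct rows: [], ops: []

  def new(rows), do: %__MODULE__{rows: rows}

  # Recording an operation returns a new struct; nothing is computed yet.
  def filter(%__MODULE__{ops: ops} = lf, fun), do: %{lf | ops: [{:filter, fun} | ops]}
  def mutate(%__MODULE__{ops: ops} = lf, fun), do: %{lf | ops: [{:map, fun} | ops]}

  # Run the whole pipeline and materialise every row.
  def collect(lf), do: lf |> plan() |> Enum.to_list()

  # Run the pipeline only until `n` result rows exist, e.g. for display.
  def peek(lf, n \\ 5), do: lf |> plan() |> Enum.take(n)

  # Turn the recorded operations into a lazy Stream, in the order they were added.
  defp plan(%__MODULE__{rows: rows, ops: ops}) do
    ops
    |> Enum.reverse()
    |> Enum.reduce(rows, fn
      {:filter, fun}, acc -> Stream.filter(acc, fun)
      {:map, fun}, acc -> Stream.map(acc, fun)
    end)
  end
end

rows = for i <- 1..1_000, do: %{id: i, value: i * 2}

lf =
  rows
  |> Sketch.LazyFrame.new()
  |> Sketch.LazyFrame.filter(&(rem(&1.id, 2) == 0))
  |> Sketch.LazyFrame.mutate(&Map.put(&1, :double, &1.value * 2))

# Only the first few source rows are touched to produce two result rows.
IO.inspect(Sketch.LazyFrame.peek(lf, 2))
```

A real lazy backend would hand the recorded plan to a query optimizer rather than a `Stream`, but the user-facing behaviour is the same: build up operations cheaply, then materialise on `collect` or peek at a handful of rows.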
After that, I'd like to move into additional backends. I'm particularly keen on Ecto (database) and Apache Arrow/Ballista for distributed and OLAP work. There is also work underway for a pure Elixir backend so the library can ship without a Rust dependency. Speaking of which, there's work on prebuilt binaries underway as well.
I'd love feedback on the API! I aimed for a dplyr-ish API as I think it melds better with a functional language than pandas. Generally I find dplyr more intuitive than pandas. The philosophy here is to get from brain to data as simply and intuitively as possible.
Finally, contributions and any other feedback are super, super welcome. It's early days and I'm also a startup founder so I haven't been able to dedicate as much time as I'd like, but I try to get some work done and add features at least once a week.
Thanks for looking!
[1] https://github.com/elixir-nx/nx