I wrote one of the fastest DataFrame libraries

gsir-te

1 230 0.0 R

Getting Started in R -- Tinyverse Edition

I dropped dplyr in favor of data.table and never looked back.
https://github.com/eddelbuettel/gsir-te

ballista

20 2,238 9.3 Rust

Discontinued. Distributed compute platform implemented in Rust and powered by Apache Arrow.

I'm guessing Polars and Ballista (https://github.com/ballista-compute/ballista) have different goals, but I don't know enough about either to say what those might be. Does anyone know enough about either to explain the differences?

TypedTables.jl

2 143 5.2 Julia

Simple, fast, column-based storage for data analysis in Julia

Not that I am a heavy DataFrame user, but I have felt more at home with the comparatively lightweight TypedTables [1]. My understanding is that the rather complicated DataFrame ecosystem in Julia [2] mostly stems from disagreement over whether tables should be immutable and/or typed. As far as I am aware, there has not yet been any major push at the compiler level to speed up untyped code, although there should be plenty of room for improvement, which I suspect would benefit DataFrames greatly.
[1]: https://github.com/JuliaData/TypedTables.jl
[2]: https://typedtables.juliadata.org/stable/man/table/#datafram...

rust-dataframe

1 287 0.8 Rust

Discontinued. A Rust DataFrame implementation, built on Apache Arrow.

>Rust DataFrame implementation, built on Apache Arrow
https://github.com/nevi-me/rust-dataframe
A bit less mature/feature-complete than polars last time I looked. Does not seem to do anything with on-disk spillover from what I can see. But if you wanted to use Arrow to do that, nevi-me's crate may be a good place to start.

vaex

7 8,171 5.4 Python

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

data.table

16 3,478 9.4 R

R's data.table package extends data.frame:

data.table is basically a highly optimized C library
https://github.com/Rdatatable/data.table



This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Dataframe Rust Python Arrow Bigdata
Post date: 13 Mar 2021
