I wrote one of the fastest DataFrame libraries

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • gsir-te

    Getting Started in R -- Tinyverse Edition

  • I dropped dplyr in favor of data.table and never looked back.

    https://github.com/eddelbuettel/gsir-te

  • ballista

    Discontinued. Distributed compute platform implemented in Rust and powered by Apache Arrow.

  • I'm guessing Polars and Ballista (https://github.com/ballista-compute/ballista) have different goals, but I don't know enough about either to say what those might be. Does anyone know enough about either to explain the differences?

  • TypedTables.jl

    Simple, fast, column-based storage for data analysis in Julia

  • Not that I am a heavy DataFrame user, but I have felt more at home with the comparatively lightweight TypedTables [1]. My understanding is that the rather complicated DataFrame ecosystem in Julia [2] mostly stems from disagreement over whether tables should be immutable and/or typed. As far as I am aware, there has not yet been any major push at the compiler level to speed up untyped code, although there should be plenty of room for improvement, which I suspect would benefit DataFrames greatly.

    [1]: https://github.com/JuliaData/TypedTables.jl

    [2]: https://typedtables.juliadata.org/stable/man/table/#datafram...

  • rust-dataframe

    Discontinued. A Rust DataFrame implementation, built on Apache Arrow.

  • >Rust DataFrame implementation, built on Apache Arrow

    https://github.com/nevi-me/rust-dataframe

    A bit less mature/feature-complete than polars last time I looked. Does not seem to do anything with on-disk spillover from what I can see. But if you wanted to use Arrow to do that, nevi-me's crate may be a good place to start.

  • vaex

    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

  • data.table

    R's data.table package extends data.frame.

  • data.table is basically a highly optimized C library

    https://github.com/Rdatatable/data.table

