Rust is showing a lot of promise in the DataFrame / tabular data space

This page summarizes the projects mentioned and recommended in the original post on /r/rust.

  • polars

    Dataframes powered by a multithreaded, vectorized query engine, written in Rust

  • [Polars](https://github.com/pola-rs/polars) is a blazing fast DataFrame library with a beautiful user interface and an awesome getting started guide. The impressive h2o benchmark results have gotten Polars a lot of users.

  • db-benchmark

    reproducible benchmark of database-like ops

  • arrow-datafusion

    Apache DataFusion SQL Query Engine

  • [arrow-datafusion](https://github.com/apache/arrow-datafusion) is another great DataFrame library, especially if you like running SQL queries. It's so easy to query a Parquet / CSV dataset with SQL using DataFusion. I've run local benchmarks and it's super fast. The DataFusion docs are a bit lacking, which is a shame for such a well-developed and amazing library. I hope to make them better and help spread the word about how truly amazing this lib is.

  • arrow2

    Discontinued. Transmute-free Rust library to work with the Arrow format

  • [arrow2](https://github.com/jorgecarleitao/arrow2) and [parquet2](https://github.com/jorgecarleitao/parquet2) are great foundational libraries for DataFrame libs in Rust.

  • parquet2

    Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

  • delta-rs

    A native Rust library for Delta Lake, with bindings into Python

  • I'm working on [delta-rs](https://github.com/delta-io/delta-rs), which brings the power of Delta Lake to the Rust community. CSV / Parquet lakes are limited, and Delta Lakes offer a ton of advantages (versioned data, time travel, ACID transactions, schema enforcement, etc.). We're working to bring full Polars and DataFusion support to delta-rs; see the roadmap.

  • influxdb_iox

    Discontinued. Pronounced "influxdb eye-ox", short for iron oxide. This is the new core of InfluxDB written in Rust on top of Apache Arrow.

  • Already is: https://github.com/influxdata/influxdb_iox. It's just still a work in progress.

  • PyO3

    Rust bindings for the Python interpreter

  • If you’re interested in Python bindings, take a look at https://pyo3.rs/

  • kafka-delta-ingest

    A highly efficient daemon for streaming data from Kafka into Delta Lake

  • kafka-delta-ingest is a good project to get streaming data into a Delta Lake. Here's a great talk on the topic.
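
The "versioned data" and "time travel" advantages listed in the delta-rs entry above can be pictured as an append-only commit log that is replayed up to a chosen version. The sketch below is a toy model in plain Rust, not the delta-rs API: `ToyLog`, `Action`, `commit`, and `snapshot` are hypothetical names made up for illustration, and the file names are placeholders. It only assumes the general idea that each commit records which data files were added to or removed from the table.

```rust
use std::collections::BTreeSet;

/// One action in a commit: a data file is added to or removed from the table.
/// (Hypothetical type for illustration; not taken from delta-rs.)
#[derive(Debug, Clone)]
enum Action {
    Add(String),
    Remove(String),
}

/// A toy transaction log: the commit at index `i` creates table version `i`.
#[derive(Debug, Default)]
struct ToyLog {
    commits: Vec<Vec<Action>>,
}

impl ToyLog {
    /// Appends a commit and returns the version number it created.
    fn commit(&mut self, actions: Vec<Action>) -> usize {
        self.commits.push(actions);
        self.commits.len() - 1
    }

    /// "Time travel": replays commits 0..=version and returns the set of
    /// files that are live at that version, or None if it does not exist.
    fn snapshot(&self, version: usize) -> Option<BTreeSet<String>> {
        if version >= self.commits.len() {
            return None;
        }
        let mut files = BTreeSet::new();
        for commit in &self.commits[..=version] {
            for action in commit {
                match action {
                    Action::Add(f) => {
                        files.insert(f.clone());
                    }
                    Action::Remove(f) => {
                        files.remove(f);
                    }
                }
            }
        }
        Some(files)
    }
}

fn main() {
    let mut log = ToyLog::default();
    let v0 = log.commit(vec![Action::Add("part-0.parquet".into())]);
    let v1 = log.commit(vec![Action::Add("part-1.parquet".into())]);
    // A rewrite: replace the two small files with one compacted file.
    let v2 = log.commit(vec![
        Action::Remove("part-0.parquet".into()),
        Action::Remove("part-1.parquet".into()),
        Action::Add("part-2.parquet".into()),
    ]);

    // Older versions stay readable because the log is never rewritten.
    for v in [v0, v1, v2] {
        println!("version {v}: {:?}", log.snapshot(v).unwrap());
    }
}
```

Because commits are only ever appended, reading an old version is just a shorter replay. A plain directory of CSV / Parquet files has no such log, which is one way to read the "limited" remark in the quoted comment.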
