Polars

polars

144 26,043 10.0 Rust

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

- handling of categoricals in polars seemed a little underbaked, though my main complaint, that categories cannot be pre-defined, seems to have been recently addressed: https://github.com/pola-rs/polars/issues/10705

prql

106 9,427 9.9 Rust

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement

I am very curious to know how you feel about PRQL (prql-lang.org). IMHO it gives you the ergonomics and DX of Polars or Pandas with the power and universality of SQL, because you can still execute your queries on any SQL-compatible query execution engine of your choice, including Polars and Pandas but also DuckDB, ClickHouse, BigQuery, Redshift, Postgres, Trino/Presto, SQLite, ... to name just a few popular ones.
The join syntax and semantics are among the trickiest parts and are under discussion again. Joins are a key part of any data transformation platform and are foundational to Relational Algebra, being right there in the "Relational" part and also the R in PRQL. Most of the PRQL built-in primitive transforms are just simple list manipulations like map, filter or reduce, but joins require care to preserve monadic composition (see for example the design of SelectMany in LINQ or flatmap in the List Monad). See this comment for some of my thoughts on this: https://github.com/PRQL/prql/issues/3782#issuecomment-181131...
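
To make the SelectMany/flatmap remark concrete, here is a minimal sketch in plain Python of an inner join written as a single flat_map over lists of dicts. It is not PRQL's implementation or syntax; the helper names `flat_map` and `inner_join` and the sample rows are made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

Row = Dict[str, object]


def flat_map(f: Callable[[A], Iterable[B]], xs: Iterable[A]) -> List[B]:
    """Apply f to each element and concatenate the results (list-monad bind)."""
    return [y for x in xs for y in f(x)]


def inner_join(
    left: List[Row], right: List[Row], on: Callable[[Row, Row], bool]
) -> List[Row]:
    """Hypothetical helper: an inner join expressed as one flat_map.

    Each left row maps to the list of merged rows it matches on the right;
    a left row with no match maps to an empty list and so disappears.
    """
    return flat_map(
        lambda l: [{**l, **r} for r in right if on(l, r)],
        left,
    )


# Illustrative data, not taken from any real dataset.
employees = [
    {"emp": "ana", "dept_id": 1},
    {"emp": "bo", "dept_id": 2},
    {"emp": "cy", "dept_id": 3},  # no matching department
]
departments = [
    {"dept_id": 1, "dept": "data"},
    {"dept_id": 2, "dept": "infra"},
]

joined = inner_join(
    employees, departments, lambda l, r: l["dept_id"] == r["dept_id"]
)
```

Because the join returns an ordinary list, its result can be fed straight into another map, filter or join, which is the composition property the comment is concerned with. This nested-loop form is quadratic and only meant to show the shape, not how a query engine would execute it.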

db-benchmark

91 319 0.0 R

reproducible benchmark of database-like ops

Real-world performance is complicated since data science covers a lot of use cases.
If you're just reading a small CSV to do analysis on it, then there will be no human-perceptible difference between Polars and Pandas. If you're reading a larger CSV with 100k rows, there still won't be much of a perceptible difference.
Per this (old) benchmark, there are differences once you get into 500MB+ territory: https://h2oai.github.io/db-benchmark/

explorer

20 976 9.4 Elixir

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir

The Explorer library [0] in Elixir uses Polars underneath it.
[0] https://github.com/elixir-explorer/explorer

db-benchmark

11 120 8.0 R

reproducible benchmark of database-like ops (by duckdblabs)

DuckDB maintains a benchmark of open source database-like tools, including Polars and Pandas
https://duckdblabs.github.io/db-benchmark/

scikit-learn

81 58,046 9.9 Python

scikit-learn: machine learning in Python

sklearn is adding support through the dataframe interchange protocol (https://github.com/scikit-learn/scikit-learn/issues/25896). scipy, as far as I know, doesn't explicitly support dataframes (it just happens to work when you wrap a Series in `np.array` or `np.asarray`). I don't know about PyTorch but in general you can convert to numpy.

quivr

2 20 9.1 Python

Python library for working with Arrow data in tabular form (by B612-Asteroid-Institute)

Polars is cool, but man, I really have come to think that dataframes are disastrous for software. The mess of internal state and the confusion of writing functions that take “df” and manipulate it: it's all so hard to clean up once you’re deep in the mess.
Quivr (https://github.com/spenczar/quivr) is an alternative approach that has been working for me. Maybe types are good!

datafusion-ballista

12 1,275 8.4 Rust

Apache Arrow Ballista Distributed Query Engine

Not super on topic because this is all immature and not integrated with one another yet, but there is a scaled-out Rust data-frames-on-Arrow implementation called Ballista that could perhaps form the backend of a Polars scale-out approach: https://github.com/apache/arrow-ballista

r-polars

5 387 9.8 R

Bring polars to R

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
