Pandas v2.0 Released

Pandas

Mentions: 393 | Stars: 41,863 | Activity: 10.0 | Language: Python

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Your link is broken for me, but going to their website and clicking the 2.0 "What's new" link takes me to the same URL. They might be updating it. The closest I found was the Sphinx docs source for that page: https://github.com/pandas-dev/pandas/blob/main/doc/source/wh...

jupysql

Mentions: 8 | Stars: 591 | Activity: 9.3 | Language: Python

Better SQL in Jupyter. 📊

How are people managing the existence of data frame APIs like pandas/polars with SQL engines like BigQuery, Snowflake, and DuckDB?
Most of my notebooks are a mix of SQL and Python: I use SQL for most of the processing, dump the results into a pandas dataframe (via https://github.com/ploomber/jupysql), and then use Python for operations that are difficult to express in SQL (or that I don't know how to do in SQL). I end up with about 80% SQL and 20% Python.
Unsure if this is the best workflow but it's the most efficient one I've come up with.
Disclaimer: my team develops JupySQL.
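
A minimal sketch of that SQL-first workflow, not the commenter's actual setup. It assumes an in-memory SQLite database in place of a warehouse engine and uses plain `pandas.read_sql_query` rather than JupySQL's notebook magics; the `orders` table and its columns are hypothetical sample data.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the SQL engine (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('a', 10.0), ('a', 30.0), ('b', 5.0), ('c', 20.0);
    """
)

# Step 1: do the bulk of the processing (the aggregation) in SQL
# and load the result into a pandas DataFrame.
totals = pd.read_sql_query(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer",
    conn,
)
conn.close()

# Step 2: use Python for the parts that are more awkward in SQL,
# here each customer's share of the grand total and a rank by total.
totals["share"] = totals["total"] / totals["total"].sum()
totals["rank"] = totals["total"].rank(ascending=False).astype(int)

print(totals)
```

The split mirrors the comment: the aggregation stays in SQL, and only the small result set is handed to pandas.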

tpch

Mentions: 5 | Stars: 56 | Activity: 7.5 | Language: C

Polars author here. I have run the TPC-H benchmark against polars and pandas 2.0 backed by arrow types.
https://github.com/pola-rs/tpch/pull/36
Pandas having arrow as backend is great and will make interop with the arrow community (and polars) much better.
However, if you need performance, polars remains orders of magnitude faster on whole queries. Changing to the arrow memory format does not change that.
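
For readers who have not tried the arrow-backed types mentioned above, here is a small sketch of opting into them. It assumes pandas 2.0 or later and that the optional `pyarrow` package is installed; the CSV content is made up for illustration. It only shows the resulting dtypes and says nothing about the performance comparison.

```python
import io

import pandas as pd

# Hypothetical sample data with one missing price.
CSV = "id,price\n1,9.5\n2,\n3,12.0\n"

# Default behaviour: NumPy-backed columns, missing values become NaN.
numpy_df = pd.read_csv(io.StringIO(CSV))

# pandas 2.0 option: request Arrow-backed columns. pyarrow is an optional
# dependency, so fall back gracefully when it is not installed.
try:
    arrow_df = pd.read_csv(io.StringIO(CSV), dtype_backend="pyarrow")
except ImportError:
    arrow_df = None

print(numpy_df.dtypes)
if arrow_df is not None:
    print(arrow_df.dtypes)
```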

db-benchmark

Mentions: 91 | Stars: 319 | Activity: 0.0 | Language: R

reproducible benchmark of database-like ops

If you're interested in benchmarks comparing different dataframe implementations, here is one:
https://h2oai.github.io/db-benchmark/

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.