Pandas v2.0 Released

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

  • Your link is broken for me, but going to their website and clicking on the 2.0 what's new link takes me to the same URL. They might be updating it... the closest I found was the Sphinx docs source for that: https://github.com/pandas-dev/pandas/blob/main/doc/source/wh...

  • jupysql

    Better SQL in Jupyter. 📊

  • How are people managing the coexistence of data frame APIs like pandas/polars with SQL engines like BigQuery, Snowflake, and DuckDB?

    Most of my notebooks are a mix of SQL and Python: SQL for most of the processing, then dump the results into a pandas dataframe (via https://github.com/ploomber/jupysql) and use Python for operations that are difficult to express in SQL (or that I don't know how to express in SQL), so I end up with 80% SQL, 20% Python.

    Unsure if this is the best workflow but it's the most efficient one I've come up with.

    Disclaimer: my team develops JupySQL.
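
A minimal sketch of the SQL-first workflow described in this comment, under stated assumptions: an in-memory SQLite database stands in for the SQL engine, and `pandas.read_sql_query` stands in for the step the commenter does with JupySQL. The `orders` table, its columns, and the derived `share` column are hypothetical and exist only for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical data: an in-memory SQLite table stands in for the SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("alice", 30.0), ("bob", 5.0), ("bob", 15.0), ("bob", 40.0)],
)

# Step 1: do the bulk of the processing (grouping, aggregation) in SQL
# and land the result in a pandas DataFrame.
totals = pd.read_sql_query(
    "SELECT customer, SUM(amount) AS total, COUNT(*) AS n_orders "
    "FROM orders GROUP BY customer ORDER BY customer",
    conn,
)
conn.close()

# Step 2: continue in Python for whatever is more convenient there.
# Here: each customer's share of the overall total.
totals["share"] = totals["total"] / totals["total"].sum()

print(totals)
```

The split between the two steps is a matter of taste; the point of the sketch is only that the DataFrame is the hand-off point between the SQL part and the Python part.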

  • tpch

  • Polars author here. I have run the TPC-H benchmark against polars and pandas 2.0 backed by arrow types.

    https://github.com/pola-rs/tpch/pull/36

    Pandas having arrow as backend is great and will make interop with the arrow community (and polars) much better.

    However, if you need performance, polars remains orders of magnitude faster on whole queries; changing to the arrow memory format does not change that.

  • db-benchmark

    reproducible benchmark of database-like ops

  • If interested in benchmarks comparing different dataframe implementations, here is one:

    https://h2oai.github.io/db-benchmark/
