-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Your link is broken for me, but going to their website and clicking the 2.0 "What's new" link takes me to the same URL. They might be updating it. The closest I found was the Sphinx docs source for that page: https://github.com/pandas-dev/pandas/blob/main/doc/source/wh...
-
How are people managing the existence of data frame APIs like pandas/polars with SQL engines like BigQuery, Snowflake, and DuckDB?
Most of my notebooks are a mix of SQL and Python: SQL for most of the processing, then I dump the results into a pandas dataframe (via https://github.com/ploomber/jupysql) and use Python for operations that are difficult to express in SQL (or that I don't know how to express in SQL). I end up with roughly 80% SQL and 20% Python.
I'm not sure this is the best workflow, but it's the most efficient one I've come up with.
Disclaimer: my team develops JupySQL.
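To make the workflow above concrete, here is a minimal sketch of the "SQL first, pandas for the rest" pattern. It is not JupySQL itself: it assumes an in-memory SQLite database as a stand-in for the real engine, and the `orders` table, its columns, and the "share of total" step are all hypothetical, chosen only for illustration.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for BigQuery/Snowflake/DuckDB here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('alice', 10.0), ('alice', 30.0),
        ('bob', 5.0), ('bob', 15.0),
        ('carol', 40.0);
    """
)

# Step 1: do the bulk of the processing (grouping, aggregation) in SQL.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY customer
"""
totals = pd.read_sql_query(query, conn)
conn.close()

# Step 2: continue in pandas for whatever is more convenient there.
# The share-of-total column is just a placeholder for that kind of step.
totals["share"] = totals["total"] / totals["total"].sum()

print(totals)
```

The point of the split is that only the small aggregated result crosses from the SQL engine into Python, so the dataframe stays cheap to hold in memory.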
-
Polars author here. I have run the TPC-H benchmark against polars and pandas 2.0 backed by arrow types.
https://github.com/pola-rs/tpch/pull/36
Pandas having arrow as backend is great and will make interop with the arrow community (and polars) much better.
However, if you need performance, polars remains orders of magnitude faster on whole queries. Changing to the arrow memory format does not change that.
-
If you are interested in benchmarks comparing different dataframe implementations, here is one:
https://h2oai.github.io/db-benchmark/