Modern Polars: an extensive side-by-side comparison of Polars and Pandas

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • redframes

    General Purpose Data Manipulation Library

    I'm not GP, but I find the pandas API incredibly inconsistent, and it's hard to remember how to do simple transformations. For example, it sometimes overloads operators because it doesn't use built-in language features like lambdas. There are reasons for the inconsistency, but using alternatives like R's tidyverse or Julia's DataFrames.jl is like night and day for me.

    I recently found RedFrames [1], which wraps Pandas dataframes in a more consistent interface. It's probably what I'd use if I had to write data transformations that needed to stay compatible with Pandas.

    [1] https://github.com/maxhumber/redframes
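
To make the operator-overloading point in the redframes comment concrete, here is a minimal pandas sketch. The columns `city` and `temp` are made up for illustration. The first filter builds a boolean mask with the overloaded `[]` and `>` operators on a named frame. The chained version instead passes lambdas to `.assign` and `.loc`, both of which accept callables, so each step receives the intermediate frame and nothing has to be named first.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Cairo"], "temp": [4, 19, 30]})

# Style 1: boolean mask built with overloaded operators on the named frame.
warm_mask = df[df["temp"] > 10]

# Style 2: callables receive the intermediate frame, so the steps chain
# without referring back to `df` by name.
warm_chain = (
    df
    .assign(temp_f=lambda d: d["temp"] * 9 / 5 + 32)  # new column via lambda
    .loc[lambda d: d["temp"] > 10]                    # row filter via lambda
)

print(warm_mask["city"].tolist())   # ['Lima', 'Cairo']
print(warm_chain["city"].tolist())  # ['Lima', 'Cairo']
```

Both styles select the same rows. Only the second can sit in the middle of a longer method chain.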

  • dplyr

    dplyr: A grammar of data manipulation

    It can't be said enough: pandas is a mess. It has far too much surface area and no common thread pulling it all together. This becomes obvious when you work with better dataframe libraries like dplyr [1] or DataFramesMeta [2]. I've worked on production systems with all of these libraries; this is not gratuitous bashing.

    [1] https://dplyr.tidyverse.org/
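
For readers coming from dplyr, the sketch below maps its core verbs (`filter`, `mutate`, `group_by` plus `summarise`, `arrange`) onto a single pandas method chain. The sales data and column names are hypothetical. This is one reasonable pandas spelling of each verb, not the only one, and it relies on pandas' named aggregation syntax in `groupby(...).agg(...)`.

```python
import pandas as pd

# Hypothetical sales data.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "units": [10, 3, 7, 12, 5],
    "price": [2.0, 5.0, 2.5, 4.0, 3.0],
})

summary = (
    sales
    .query("units >= 5")                                # dplyr: filter(units >= 5)
    .assign(revenue=lambda d: d["units"] * d["price"])  # dplyr: mutate(revenue = units * price)
    .groupby("region", as_index=False)                  # dplyr: group_by(region)
    .agg(total_units=("units", "sum"),                  # dplyr: summarise(total_units = sum(units),
         total_revenue=("revenue", "sum"))              #                  total_revenue = sum(revenue))
    .sort_values("total_revenue", ascending=False)      # dplyr: arrange(desc(total_revenue))
    .reset_index(drop=True)
)

print(summary)
```

Each dplyr verb maps to a different pandas construct (`query`, `assign`, `groupby`/`agg`, `sort_values`), and that mismatch is what the "no common thread" complaint is about.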

  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

    Yeah, I've tried Polars a couple of times, and the API seems worse than Pandas to me too. For example, the decision to support only autoincrementing integer indexes seems like it would make debugging "hmm, that answer is wrong, what exactly did I select?" bugs much more annoying. The Polars docs say "blazingly fast" all over, but I doubt that is a compelling point for people using single-node dataframe libraries. It isn't for me.

    Modin (https://github.com/modin-project/modin) seems more promising at this point, particularly since a migration path for existing Pandas code is highly desirable.
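
The index complaint in the Modin comment is about pandas' labelled row index. Here is a minimal pandas sketch of the difference, using made-up data. `.loc` selects by label, so the key you asked for is explicit in both the code and the result. `.iloc` selects by integer position, which is roughly what remains when a frame has no label index. Modin's README advertises switching over by replacing `import pandas as pd` with `import modin.pandas as pd`. The sketch uses plain pandas so it runs without Modin installed.

```python
import pandas as pd

# Hypothetical data keyed by a meaningful label.
people = pd.DataFrame(
    {"name": ["ada", "grace", "linus"], "age": [36, 45, 28]}
).set_index("name")

by_label = people.loc["grace"]   # select by label: the key is explicit
by_position = people.iloc[1]     # select by position: row 1 happens to be "grace"

print(by_label.name, by_label["age"])        # grace 45
print(by_position.name, by_position["age"])  # grace 45

# After a sort, the label still finds the same row, but position 1 does not.
sorted_people = people.sort_values("age")
print(sorted_people.loc["grace", "age"])  # 45
print(sorted_people.iloc[1].name)         # ada
```

This is the debugging scenario the commenter describes: after reordering, a positional lookup silently returns a different row, while a label lookup does not.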

  • pandoc

    Universal markup converter

    Not the author, but it seems the site was made with Quarto [1], which uses pandoc [2] behind the scenes to produce the final output. The pandoc website suggests EPUB output is possible.

    [1] https://quarto.org/docs/get-started/authoring/text-editor.ht...

    [2] https://pandoc.org/

  • tidypolars

    Tidy interface to polars

    There’s a tidypolars package that appears to be well maintained: https://github.com/markfairbanks/tidypolars