tidypolars uses the polars package as its backend, which might be the fastest data frame manipulation library out there. (It's faster even than R's data.table, which has been the king of speed for many years.)
I think having a basic understanding of pandas is beneficial, given how broadly it's used. That being said, polars seems to be matching or beating data.table in performance, so I think it'd be well worth picking up. Wes McKinney, the creator of pandas, has been quite vocal about the architectural flaws in pandas, which is why he's been working on the Arrow project. polars is based on Arrow, so in principle it's kinda like pandas 2.0 (adopting the changes that Wes proposed).
What's cool about this (and /u/GoodAboutHood's other package, tidytable) is that they bring the widely used Tidyverse syntax to high-performance backends without sacrificing speed or, as I think dtplyr does, making things too complicated.