polars
modin
Metric | polars | modin
---|---|---
Mentions | 144 | 11
Stars | 25,298 | 9,408
Growth | 5.7% | 1.4%
Activity | 10.0 | 9.6
Last commit | 5 days ago | 7 days ago
Language | Rust | Python
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
polars
- Polars
- handling of categoricals in polars seemed a little underbaked, though my main complaint, that categories cannot be pre-defined, seems to have been recently addressed: https://github.com/pola-rs/polars/issues/10705
- Stuff I Learned during Hanukkah of Data 2023
That turned out to be related to pola-rs/polars#11912, and this linked comment provided a deceptively simple solution - use PARSE_DECLTYPES when creating the connection:
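The snippet above stops before showing the connection code. A minimal sketch of what passing `PARSE_DECLTYPES` to the standard-library `sqlite3` module does, assuming an in-memory database; the `orders` table, the `read_order_dates` helper, and the explicitly registered `date` converter are hypothetical and not taken from the linked post:

```python
import datetime
import sqlite3

# Assumption: register our own converter for columns declared as DATE,
# rather than relying on sqlite3's built-in default converters.
sqlite3.register_converter(
    "date", lambda raw: datetime.date.fromisoformat(raw.decode())
)


def read_order_dates(detect_types=0):
    """Return the `ordered` column of a tiny, hypothetical in-memory table."""
    conn = sqlite3.connect(":memory:", detect_types=detect_types)
    try:
        conn.execute("CREATE TABLE orders (id INTEGER, ordered DATE)")
        conn.execute("INSERT INTO orders VALUES (1, '2023-12-07')")
        return [row[0] for row in conn.execute("SELECT ordered FROM orders")]
    finally:
        conn.close()


# Without the flag the value comes back as the stored text.
plain = read_order_dates()
# With the flag, sqlite3 looks up a converter by the declared column type.
typed = read_order_dates(sqlite3.PARSE_DECLTYPES)
```

Here `plain` holds strings while `typed` holds `datetime.date` objects, so a library reading through the second kind of connection receives real dates rather than text.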
- Second language
- Summing columns in remote Parquet files using DuckDB
Looks like somebody requested it after reading your TIL. https://github.com/pola-rs/polars/issues/12493#issuecomment-...
It will be in the next release. (later today?)
- What are you rewriting in rust?
I am a maintainer for a dataframe interface called polars
- [Crowdsourcing] Is there any code you really wished used named function arguments?
For example, with polars, the Python library extensively uses named arguments, but in Rust we have to use either a builder pattern or macros. The builder pattern tends to be much more verbose than the named-argument equivalent. There is currently a draft PR implementing Python-style named arguments for some of the most common functions.
- Polars cookbook (Jupyter)
- Working with Rust
Seeing a lot of great libraries coming out with Python bindings in the data world, e.g. delta-rs and Polars. I see it growing in this space as a C++ alternative.
modin
- The Distributed Tensor Algebra Compiler (2022)
- A Polars exploration into Kedro
The interesting thing about Polars is that it does not try to be a drop-in replacement for pandas, like Dask, cuDF, or Modin, and instead has its own expressive API. Despite being a young project, it quickly got popular thanks to its easy installation process and its “lightning fast” performance.
- Modern Polars: an extensive side-by-side comparison of Polars and Pandas
Yeah, tried Polars a couple of times: the API seems worse than Pandas to me too. E.g., the decision to support only autoincrementing integer indexes seems like it would make debugging "hmmm, that answer is wrong, what exactly did I select?" bugs much more annoying. Polars docs write "blazingly fast" all over them, but I doubt that is a compelling point for people using single-node dataframe libraries. It isn't for me.
Modin (https://github.com/modin-project/modin) seems more promising at this point, particularly since a migration path for standing Pandas code is highly desirable.
- Working with more than 10gb csv
Modin should fit. It implements Pandas APIs with e.g. Ray as backend. https://github.com/modin-project/modin
- Modern Python Performance Considerations
- How to Speed Up Pandas with 1 Line of Code
The pandas library provides easy-to-use data structures like pandas DataFrames as well as tools for data analysis. One issue with pandas is that it can be slow with large amounts of data. It wasn’t designed for analyzing 100 GB or 1 TB datasets. Fortunately, there is the Modin library, which lets you scale your pandas workflows by changing one line of code and integrates with the Python ecosystem and Ray clusters.
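The "one line" mentioned above is the import statement. A minimal sketch, assuming Modin and one of its engines may or may not be installed; the fallback chain and the `column_total` helper are hypothetical additions so the snippet runs in any environment:

```python
# Assumption: modin (plus an engine such as Ray) may or may not be installed,
# so this sketch falls back to plain pandas, and to None if neither exists.
try:
    import modin.pandas as pd  # the one-line change: was `import pandas as pd`
except ImportError:
    try:
        import pandas as pd
    except ImportError:
        pd = None


def column_total(records, column):
    """Sum one column of a list of dicts; the code is the same for either backend."""
    if pd is None:
        raise RuntimeError("neither modin nor pandas is installed")
    return pd.DataFrame(records)[column].sum()
```

Everything after the import keeps using the familiar `pd` name, which is why existing pandas code can often be migrated without further edits.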
What are some alternatives?
vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
arrow-datafusion - Apache Arrow DataFusion SQL Query Engine
DataFrames.jl - In-memory tabular data in Julia
datatable - A Python package for manipulating 2-dimensional tabular data structures
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
db-benchmark - reproducible benchmark of database-like ops
rust-numpy - PyO3-based Rust bindings of the NumPy C-API
hdf5-rust - HDF5 for Rust
tidypolars - Tidy interface to polars
swifter - A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
arrow2 - Transmute-free Rust library to work with the Arrow format
rust-csv - A CSV parser for Rust, with Serde support.