polars
arrow-rs
 | polars | arrow-rs
---|---|---
Mentions | 52 | 10
Stars | 5,869 | 877
Growth | 14.6% | 6.2%
Activity | 9.9 | 9.7
Last commit | 4 days ago | 1 day ago
Language | Rust | Rust
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
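As a rough illustration of how these figures relate, the sketch below works backwards from the star counts and growth percentages listed above. It assumes growth is computed as (current - previous) / previous, and that the activity score maps linearly onto a "top N%" share (so 9.0 corresponds to the top 10%). Both are assumptions made for illustration, not the site's documented formulas, and the function names are hypothetical.

```python
def previous_stars(current: int, growth_pct: float) -> float:
    """Star count one month earlier, assuming growth = (current - previous) / previous."""
    return current / (1 + growth_pct / 100)


def top_share(activity: float) -> float:
    """Assumed linear mapping from a 0-10 activity score to a 'top N%' share.

    Anchored on the one data point given in the text: 9.0 -> top 10%.
    """
    return (10 - activity) * 10


# Figures taken from the comparison above: (name, stars, growth %, activity).
projects = [
    ("polars", 5869, 14.6, 9.9),
    ("arrow-rs", 877, 6.2, 9.7),
]

for name, stars, growth, activity in projects:
    prev = round(previous_stars(stars, growth))
    share = round(top_share(activity), 1)
    print(f"{name}: about {prev} stars a month earlier, roughly top {share}% by activity")
```

Under these assumptions, polars would have had roughly 5,121 stars a month earlier and arrow-rs roughly 826.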
polars
- Is anyone around here playing with Rust (the language)?
-
Will pandas eventually become more intuitive?
polars might click better. Once you learn the expression API, you'll see that googling is much less needed as you can extrapolate that logic.
- Modern Python Performance Considerations
-
Ask HN: Have we screwed ourselves as software engineers?
The new data tools I've seen are complex under the hood, but offer elegant user experiences, giving the best of both worlds.
You referenced a 500-line Python script being refactored with Rust, which made me think of the Polars project: https://github.com/pola-rs/polars
Polars uses Rust to make DataFrame operations lightning fast. But you don't need to use Rust to use Polars. Just use the Polars Python API and you have an elegant way to scale on a single machine and perform analyses way faster.
I'm working on Dask and our end goal is the same. We want to provide users with syntax they're familiar with to scale their analyses locally & to clusters in the cloud. We also want to provide flexibility so users can provide highly custom analyses. Highly custom analyses are complex by nature, so these aren't "easy codebases" by any means, but Dask Futures / Dask Delayed makes the distributed cluster multiprocessing part a lot easier.
Anyways, I've just seen the data industry moving towards better & better tools. Delta Lake abstracting away the complications of maintaining plain vanilla Parquet lakes is another example of the amazing tooling. Now the analyses and models... those seem to be getting more complicated.
-
Modern Pandas (Part 2): Method Chaining
I'd recommend checking out polars as an alternative to pandas - https://github.com/pola-rs/polars
It has a rather different api, and is significantly faster. Highly recommend it.
-
Hi! We are Dr. Amanda Martin and JJ Brosnan, Developer and Python data scientist at Deephaven. Ask us anything about getting started in the data science industry, working with large data sets, and working with streaming data in Python.
Have you looked at Polars? It's a new dataframe library that has an api that makes a lot more sense than pandas, and on top of that is much, much faster.
-
Robyn - A Python web framework with a Rust runtime - crossed 200k installs on PyPi
Polars is almost at 500k downloads. It is great to see the Rust ecosystem connecting with Python.
- Polars - Fast multi-threaded DataFrame library in Rust | Python | Node.js
- Polars 0.20.0 release
-
How was the polars python API designed and written?
Are there any articles about how the authors of polars used pyo3 to expose a python API?
arrow-rs
- Arrow-Rs - Official Rust implementation of Apache Arrow
-
Apache Arrow Feature Parity Timeline?
That matrix doesn't seem up to date. For example looking at the rust crate it does seem to support things like map, float16, and IPC. The changelog shows an impressive development pace.
-
Apache Arrow Flight SQL: Accelerating Database Access
Oh, and for anyone interested in pitching in on the Rust implementation, there's an issue logged here along with some discussion: https://github.com/apache/arrow-rs/issues/1323
-
February 2022 Rust Apache Arrow and Parquet Highlights
There is more discussion about the decision here: https://github.com/apache/arrow-rs/issues/1120
-
Arrow2 0.9 has been released
Deeply nested parquet support would be nice. Even the official implementation lacks this. https://github.com/apache/arrow-rs/issues/993
I'm still not sure how this differs from https://github.com/apache/arrow-rs. What does transmute even mean?
-
A SQL Database
Cool! You may want to take a look at the Apache Arrow Rust project and DataFusion: https://github.com/apache/arrow-rs and https://github.com/apache/arrow-datafusion
-
Nushell 0.34 released - the first release with dataframe support
Congrats team and great work @elferherrera! Note that this is backed by Polars and Arrow, and is as fast as it gets. :)
- Apache Arrow 4.0.0 Release
- Official Rust Implementation of Apache Arrow
What are some alternatives?
vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
arrow-datafusion - Apache Arrow DataFusion and Ballista query engines
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
DataFrames.jl - In-memory tabular data in Julia
db-benchmark - reproducible benchmark of database-like ops
rust-csv - A CSV parser for Rust, with Serde support.
arrow2 - Unofficial transmute-free Rust library to work with the Arrow format
tidypolars - Tidy interface to polars
wgpu - Safe and portable GPU abstraction in Rust, implementing WebGPU API.
evcxr
databend - A modern Elasticity and Performance cloud data warehouse, activate your object storage for real-time analytics.