polars
vaex
Metric | polars | vaex
---|---|---
Mentions | 52 | 6
Stars | 5,869 | 7,067
Growth | 14.6% | 1.4%
Activity | 9.9 | 9.5
Latest commit | 3 days ago | 16 days ago
Language | Rust | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
polars
- Is anyone around here playing with Rust (the language)?
-
Will pandas eventually become more intuitive?
polars might click better. Once you learn the expression API, you'll see that googling is much less needed as you can extrapolate that logic.
- Modern Python Performance Considerations
-
Ask HN: Have we screwed ourselves as software engineers?
The new data tools I've seen are complex under the hood, but offer elegant user experiences, giving the best of both worlds.
You referenced a 500-line Python script being refactored with Rust, and it made me think of the Polars project: https://github.com/pola-rs/polars
Polars uses Rust to make DataFrame operations lightning fast. But you don't need to use Rust to use Polars. Just use the Polars Python API and you have an elegant way to scale on a single machine and perform analyses way faster.
I'm working on Dask and our end goal is the same. We want to provide users with syntax they're familiar with to scale their analyses locally & to clusters in the cloud. We also want to provide flexibility so users can provide highly custom analyses. Highly custom analyses are complex by nature, so these aren't "easy codebases" by any means, but Dask Futures / Dask Delayed makes the distributed cluster multiprocessing part a lot easier.
Anyways, I've just seen the data industry moving towards better & better tools. Delta Lake abstracting away the complications of maintaining plain vanilla Parquet lakes is another example of the amazing tooling. Now the analyses and models... those seem to be getting more complicated.
-
Modern Pandas (Part 2): Method Chaining
I'd recommend checking out polars as an alternative to pandas - https://github.com/pola-rs/polars
It has a rather different API, and is significantly faster. Highly recommend it.
-
Hi! We are Dr. Amanda Martin and JJ Brosnan, Developer and Python data scientist at Deephaven. Ask us anything about getting started in the data science industry, working with large data sets, and working with streaming data in Python.
Have you looked at Polars? It's a new dataframe library that has an API that makes a lot more sense than pandas, and on top of that is much, much faster.
-
Robyn - A Python web framework with a Rust runtime - crossed 200k installs on PyPI
Polars is almost at 500k downloads. It's great to see the Rust ecosystem connecting with Python.
- Polars - Fast multi-threaded DataFrame library in Rust | Python | Node.js
- Polars 0.20.0 release
-
How was the polars python API designed and written?
Are there any articles about how the authors of polars used pyo3 to expose a python API?
vaex
-
High performance (for the consumer) time series storage?
I'd recommend QuestDB. Worked with it multiple times for different algorithmic trading needs and it didn't disappoint. If you want to load data fast, I'd recommend this Python library.
-
Python Pandas vs Dask for csv file reading
How about vaex?
- Polars: Lightning-fast DataFrame library for Rust and Python
-
For stocks, what historical data do you store and how do you store it?
You might find vaex (https://github.com/vaexio/vaex) interesting if you work with HDF5.
- I wrote one of the fastest DataFrame libraries
-
A Hybrid Apache Arrow/Numpy DataFrame with Vaex Version 4.0
My guess is that should be possible, feel free to hop onto https://github.com/vaexio/vaex/discussions !
What are some alternatives?
arrow-datafusion - Apache Arrow DataFusion and Ballista query engines
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
DataFrames.jl - In-memory tabular data in Julia
data.table - R's data.table package extends data.frame:
db-benchmark - reproducible benchmark of database-like ops
rust-csv - A CSV parser for Rust, with Serde support.
arrow2 - Unofficial transmute-free Rust library to work with the Arrow format
arrow-rs - Official Rust implementation of Apache Arrow
tidypolars - Tidy interface to polars
wgpu - Safe and portable GPU abstraction in Rust, implementing WebGPU API.
visidata - A terminal spreadsheet multitool for discovering and arranging data
evcxr