|4 days ago||1 day ago|
|MIT License||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Is anyone around here playing with Rust (the language)?
4 projects | reddit.com/r/devpt | 9 May 2022
Will pandas eventually become more intuitive?
2 projects | reddit.com/r/Python | 9 May 2022
polars might click better. Once you learn the expression API, you'll see that googling is much less needed as you can extrapolate that logic.
Modern Python Performance Considerations
8 projects | news.ycombinator.com | 5 May 2022
Ask HN: Have we screwed ourselves as software engineers?
3 projects | news.ycombinator.com | 4 May 2022
The new data tools I've seen are complex under the hood, but offer elegant user experiences, giving the best of both worlds.
You referenced a 500-line Python script being refactored with Rust, which made me think of the Polars project: https://github.com/pola-rs/polars
Polars uses Rust to make DataFrame operations lightning fast. But you don't need to use Rust to use Polars. Just use the Polars Python API and you have an elegant way to scale on a single machine and perform analyses way faster.
I'm working on Dask and our end goal is the same. We want to provide users with syntax they're familiar with to scale their analyses locally & to clusters in the cloud. We also want to provide flexibility so users can provide highly custom analyses. Highly custom analyses are complex by nature, so these aren't "easy codebases" by any means, but Dask Futures / Dask Delayed makes the distributed cluster multiprocessing part a lot easier.
Anyways, I've just seen the data industry moving towards better & better tools. Delta Lake abstracting away all the complications of maintaining plain vanilla Parquet lakes is another example of the amazing tooling. Now the analyses and models... those seem to be getting more complicated.
Modern Pandas (Part 2): Method Chaining
5 projects | news.ycombinator.com | 1 May 2022
I'd recommend checking out polars as an alternative to pandas - https://github.com/pola-rs/polars
It has a rather different api, and is significantly faster. Highly recommend it.
Hi! We are Dr. Amanda Martin and JJ Brosnan, Developer and Python data scientist at Deephaven. Ask us anything about getting started in the data science industry, working with large data sets, and working with streaming data in Python.
8 projects | reddit.com/r/IAmA | 27 Apr 2022
Have you looked at Polars? It's a new dataframe library that has an api that makes a lot more sense than pandas, and on top of that is much, much faster.
Robyn - A Python web framework with a Rust runtime - crossed 200k installs on PyPi
3 projects | reddit.com/r/Python | 26 Apr 2022
Polars is almost at 500k downloads. It's great to see the Rust ecosystem connecting with Python.
Polars - Fast multi-threaded DataFrame library in Rust | Python | Node.js
1 project | reddit.com/r/github_trends | 22 Apr 2022
Polars 0.20.0 release
2 projects | reddit.com/r/rust | 14 Mar 2022
How was the polars python API designed and written?
1 project | reddit.com/r/rust | 19 Feb 2022
Are there any articles about how the authors of polars used pyo3 to expose a python API?
Arrow-Rs - Official Rust implementation of Apache Arrow
1 project | reddit.com/r/github_trends | 4 May 2022
Apache Arrow Feature Parity Timeline?
2 projects | reddit.com/r/rust | 21 Feb 2022
That matrix doesn't seem up to date. For example, looking at the Rust crate, it does seem to support things like map, float16, and IPC. The changelog shows an impressive development pace.
Apache Arrow Flight SQL: Accelerating Database Access
5 projects | news.ycombinator.com | 16 Feb 2022
Oh, and for anyone interested in pitching in on the Rust implementation, there's an issue logged here along with some discussion: https://github.com/apache/arrow-rs/issues/1323
February 2022 Rust Apache Arrow and Parquet Highlights
1 project | reddit.com/r/rust | 15 Feb 2022
There is more discussion about the decision here: https://github.com/apache/arrow-rs/issues/1120
Arrow2 0.9 has been released
6 projects | reddit.com/r/rust | 14 Jan 2022
Deeply nested parquet support would be nice. Even the official implementation lacks this. https://github.com/apache/arrow-rs/issues/993
6 projects | reddit.com/r/rust | 14 Jan 2022
I'm still not sure how this differs from https://github.com/apache/arrow-rs. What does transmute even mean?
A SQL Database
3 projects | reddit.com/r/rust | 31 Jul 2021
Cool! You may want to take a look at the Apache Arrow Rust project and DataFusion: https://github.com/apache/arrow-rs and https://github.com/apache/arrow-datafusion
Nushell 0.34 released - the first release with dataframe support
3 projects | reddit.com/r/rust | 14 Jul 2021
Congrats team and great work @elferherrera! Note that this is backed by Polars and Arrow, and is as fast as it gets. :)
Apache Arrow 4.0.0 Release
5 projects | news.ycombinator.com | 5 May 2021
Official Rust Implementation of Apache Arrow
1 project | news.ycombinator.com | 18 Apr 2021
What are some alternatives?
vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
arrow-datafusion - Apache Arrow DataFusion and Ballista query engines
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
DataFrames.jl - In-memory tabular data in Julia
db-benchmark - reproducible benchmark of database-like ops
rust-csv - A CSV parser for Rust, with Serde support.
arrow2 - Unofficial transmute-free Rust library to work with the Arrow format
tidypolars - Tidy interface to polars
wgpu - Safe and portable GPU abstraction in Rust, implementing WebGPU API.
databend - A modern Elasticity and Performance cloud data warehouse, activate your object storage for real-time analytics.