|3 days ago||7 days ago|
|MIT License||GNU General Public License v3.0 or later|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Is anyone around here playing with Rust (the language)?
4 projects | reddit.com/r/devpt | 9 May 2022
Will pandas eventually become more intuitive?
2 projects | reddit.com/r/Python | 9 May 2022
polars might click better. Once you learn the expression API, you'll see that googling is much less needed as you can extrapolate that logic.
Modern Python Performance Considerations
8 projects | news.ycombinator.com | 5 May 2022
Ask HN: Have we screwed ourselves as software engineers?
3 projects | news.ycombinator.com | 4 May 2022
The new data tools I've seen are complex under the hood, but offer elegant user experiences, giving the best of both worlds.
You referenced a 500-line Python script being refactored with Rust, which made me think of the Polars project: https://github.com/pola-rs/polars
Polars uses Rust to make DataFrame operations lightning fast. But you don't need to use Rust to use Polars. Just use the Polars Python API and you have an elegant way to scale on a single machine and perform analyses way faster.
I'm working on Dask and our end goal is the same. We want to provide users with syntax they're familiar with to scale their analyses locally & to clusters in the cloud. We also want to provide flexibility so users can provide highly custom analyses. Highly custom analyses are complex by nature, so these aren't "easy codebases" by any means, but Dask Futures / Dask Delayed makes the distributed cluster multiprocessing part a lot easier.
Anyway, I've just seen the data industry moving towards better and better tools. Delta Lake abstracting away all the complications of maintaining plain vanilla Parquet lakes is another example of the amazing tooling. Now the analyses and models... those seem to be getting more complicated.
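For readers unfamiliar with the Dask Delayed interface mentioned in the comment above, the following is a minimal sketch of the idea, assuming the `dask` package is installed. The functions `load` and `summarize` are hypothetical stand-ins for real work; the example runs on the default local scheduler rather than a cluster.

```python
from dask import delayed


@delayed
def load(part):
    # Hypothetical stand-in for reading one partition of a dataset.
    return list(range(part * 3, part * 3 + 3))


@delayed
def summarize(rows):
    # Hypothetical per-partition computation.
    return sum(rows)


# Calling delayed functions only builds a task graph; nothing runs yet.
partials = [summarize(load(p)) for p in range(4)]
total = delayed(sum)(partials)

# compute() executes the graph, running independent tasks in parallel
# where the scheduler allows it.
result = total.compute()
print(result)
```

The custom logic stays in ordinary Python functions, and the scheduling of independent tasks is left to Dask, which is roughly the division of labor the comment describes.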
Modern Pandas (Part 2): Method Chaining
5 projects | news.ycombinator.com | 1 May 2022
I'd recommend checking out polars as an alternative to pandas - https://github.com/pola-rs/polars
It has a rather different API and is significantly faster. Highly recommend it.
Hi! We are Dr. Amanda Martin and JJ Brosnan, Developer and Python data scientist at Deephaven. Ask us anything about getting started in the data science industry, working with large data sets, and working with streaming data in Python.
8 projects | reddit.com/r/IAmA | 27 Apr 2022
Have you looked at Polars? It's a new dataframe library with an API that makes a lot more sense than pandas', and on top of that it is much, much faster.
Robyn - A Python web framework with a Rust runtime - crossed 200k installs on PyPi
3 projects | reddit.com/r/Python | 26 Apr 2022
Polars is almost at 500k downloads. It's great to see the Rust ecosystem connecting with Python.
Polars - Fast multi-threaded DataFrame library in Rust | Python | Node.js
1 project | reddit.com/r/github_trends | 22 Apr 2022
Polars 0.20.0 release
2 projects | reddit.com/r/rust | 14 Mar 2022
How was the polars python API designed and written?
1 project | reddit.com/r/rust | 19 Feb 2022
Are there any articles about how the authors of polars used PyO3 to expose a Python API?
Automate the boring stuff with Julia?
3 projects | reddit.com/r/Julia | 27 Mar 2022
DataFrames.jl and XLSX.jl for JSON, CSV, and XLSX files
What would it take to recreate dplyr in Python?
3 projects | news.ycombinator.com | 17 Jan 2022
Dataframes.jl version 1.0: Tools for working with tabular data in Julia
1 project | news.ycombinator.com | 6 May 2021
3 projects | reddit.com/r/learnpython | 30 Apr 2021
Julia also has the CSV.jl library for reading and writing CSV files, the DataFrames.jl library for manipulating data like pandas, and Images.jl for image processing and analysis. However, since Julia is so much newer than Python, the Julia libraries are almost never as feature-rich as their Python counterparts.
Polars (Rust DataFrame library) join algorithm fastest in db-benchmark
2 projects | reddit.com/r/rust | 12 Mar 2021
Looks like it's single threaded according to this open issue: https://github.com/JuliaData/DataFrames.jl/issues/2626
What are some alternatives?
vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
arrow-datafusion - Apache Arrow DataFusion and Ballista query engines
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
db-benchmark - reproducible benchmark of database-like ops
rust-csv - A CSV parser for Rust, with Serde support.
arrow-rs - Official Rust implementation of Apache Arrow
arrow2 - Unofficial transmute-free Rust library to work with the Arrow format
tidypolars - Tidy interface to polars
Tables.jl - An interface for tables in Julia
wgpu - Safe and portable GPU abstraction in Rust, implementing WebGPU API.
databend - A modern Elasticity and Performance cloud data warehouse, activate your object storage for real-time analytics.