| | ragged-buffer | polars |
|---|---|---|
| Mentions | 2 | 144 |
| Stars | 19 | 26,378 |
| Growth | - | 2.9% |
| Activity | 3.8 | 10.0 |
| Latest commit | about 1 year ago | about 7 hours ago |
| Language | Rust | Rust |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ragged-buffer
- Entity Gym: A new entity based API for reinforcement learning environments
We are also releasing enn-trainer, a PPO implementation that takes full advantage of the Entity Gym interface. Variable-length observations are efficiently processed using ragged sample buffers and a general ragged batch transformer implementation that can be applied to any Entity Gym environment. With many performance optimizations still missing, enn-trainer can already reach a throughput of tens of thousands of samples per second per GPU when it is not bottlenecked by stepping the environment. More typically, environments implemented in Python reach thousands of samples per second, but can share a single GPU between multiple concurrent training runs.
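The idea of a ragged sample buffer mentioned in this excerpt can be sketched in a few lines of Python. This is only an illustration of the general technique (one flat list plus an offsets list, so variable-length sequences need no padding); the `RaggedBuffer` class and its methods are hypothetical names and are not the API of ragged-buffer or enn-trainer.

```python
from dataclasses import dataclass, field


@dataclass
class RaggedBuffer:
    """Hypothetical ragged buffer: variable-length sequences packed into one flat list."""

    # All items of all sequences, stored back to back.
    flat: list = field(default_factory=list)
    # Sequence i occupies flat[offsets[i]:offsets[i + 1]].
    offsets: list = field(default_factory=lambda: [0])

    def push(self, sequence):
        """Append one variable-length sequence without padding."""
        self.flat.extend(sequence)
        self.offsets.append(len(self.flat))

    def __len__(self):
        # Number of stored sequences, not number of items.
        return len(self.offsets) - 1

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError("sequence index out of range")
        return self.flat[self.offsets[i]:self.offsets[i + 1]]

    def lengths(self):
        """Length of every stored sequence."""
        return [end - start for start, end in zip(self.offsets, self.offsets[1:])]


buf = RaggedBuffer()
buf.push([1.0, 2.0, 3.0])  # a step that observed three entities
buf.push([])               # a step that observed none
buf.push([4.0])            # a step that observed one
```

Storing only the items that exist, plus one offset per sequence, is what lets batches with very different entity counts share memory without padding to the longest sequence.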
- Writing Rust libraries for the Python scientific computing ecosystem
One of Rust's many strengths is that it can be seamlessly integrated with Python and speed up critical code sections. I recently wrote a small library with an efficient ragged array datatype, and I figured it would make for a good example of how to set up a Rust Python package with PyO3 and maturin that interoperates with numpy. There are a lot of little details that took me quite a while to figure out:
polars
- Why Python's Integer Division Floors (2010)
This is because 0.1 is in actuality the floating point value 0.1000000000000000055511151231257827021181583404541015625, and thus 1 divided by it is ever so slightly smaller than 10. Nevertheless, fpround(1 / fpround(1 / 10)) = 10 exactly.
I found out about this recently because in Polars I defined a // b for floats to be (a / b).floor(), which does return 10 for this computation. Since Python's correctly-rounded division is rather expensive, I chose to stick to this (more context: https://github.com/pola-rs/polars/issues/14596#issuecomment-...).
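The difference described in this excerpt can be reproduced in plain Python. The sketch below compares Python's built-in float floor division with the "divide, then floor" definition; it uses only the standard library and does not call Polars, so it illustrates the arithmetic rather than Polars' actual implementation.

```python
import math
from decimal import Decimal

# The double nearest to 0.1 is slightly larger than one tenth.
exact_tenth = Decimal(0.1)

# Python floors the exact quotient of the two doubles, which is just below 10.
python_floor_div = 1 // 0.1

# Dividing first rounds the quotient to the nearest double (10.0), then floors it.
floor_of_rounded = math.floor(1 / 0.1)

print(exact_tenth)
print(python_floor_div, floor_of_rounded)  # prints "9.0 10"
```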
- Polars
https://github.com/pola-rs/polars/releases/tag/py-0.19.0
- Stuff I Learned during Hanukkah of Data 2023
That turned out to be related to pola-rs/polars#11912, and this linked comment provided a deceptively simple solution - use PARSE_DECLTYPES when creating the connection:
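As a rough illustration of what `PARSE_DECLTYPES` does, the sketch below assumes the connection in question is a standard-library `sqlite3` connection (the excerpt does not say so explicitly). The `orders` table, its `ordered` column, and the `read_first_date` helper are hypothetical, and the example does not reproduce the linked Polars issue; it only shows how the flag changes the Python type returned for a column declared as `date`.

```python
import datetime
import sqlite3

# Explicit conversions for columns whose declared type is "date".
sqlite3.register_adapter(datetime.date, lambda d: d.isoformat())
sqlite3.register_converter(
    "date", lambda raw: datetime.date.fromisoformat(raw.decode())
)


def read_first_date(detect_types):
    """Store one date in a hypothetical table and read it back."""
    conn = sqlite3.connect(":memory:", detect_types=detect_types)
    conn.execute("CREATE TABLE orders (ordered date)")
    conn.execute("INSERT INTO orders VALUES (?)", (datetime.date(2023, 12, 7),))
    (value,) = conn.execute("SELECT ordered FROM orders").fetchone()
    conn.close()
    return value


plain = read_first_date(0)                        # no type detection: a str
typed = read_first_date(sqlite3.PARSE_DECLTYPES)  # declared type is honored: a date
```

Without the flag the value comes back as the ISO string that was stored; with it, the registered converter for the declared column type runs and a `datetime.date` is returned.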
- Polars 0.20 Released
- Segunda linguagem (Second language)
- Polars: Dataframes powered by a multithreaded query engine, written in Rust
- Summing columns in remote Parquet files using DuckDB
- Polars 0.34 is released. (A query engine focussing on DataFrame front ends)
What are some alternatives?
maturin-action - GitHub Action to install and run a custom maturin command with built-in support for cross compilation
vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
rogue-net - Entity Gym compatible ragged batch transformer implementation.
modin - Modin: Scale your Pandas workflows by changing a single line of code
enn-trainer - Reinforcement learning training framework for entity-gym environments.
datafusion - Apache DataFusion SQL Query Engine
entity-gym - Standard interface for entity based reinforcement learning environments.
DataFrames.jl - In-memory tabular data in Julia
PyO3 - Rust bindings for the Python interpreter
datatable - A Python package for manipulating 2-dimensional tabular data structures
enn-zoo - Collection of entity-gym bindings for different reinforcement learning environments.
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing