parquet2 vs polars
| | parquet2 | polars |
|---|---|---|
| Mentions | 6 | 144 |
| Stars | 347 | 26,218 |
| Growth | - | 2.9% |
| Activity | 3.2 | 10.0 |
| Latest commit | 8 months ago | 6 days ago |
| Language | Rust | Rust |
| License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
parquet2
- Rust is showing a lot of promise in the DataFrame / tabular data space
[arrow2](https://github.com/jorgecarleitao/arrow2) and [parquet2](https://github.com/jorgecarleitao/parquet2) are great foundational libraries for DataFrame libs in Rust.
- ::lending-iterator - Lending/streaming Iterators on Stable Rust (and a pinch of HKT)
This is so freaking life-saving! - we have been using StreamingIterator and FallibleStreamingIterator in libraries (arrow2 and parquet2) and the existing landscape is quite confusing for new users!
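The appeal of a streaming (lending) iterator is that each item may borrow from the iterator itself, so a single internal buffer can be reused for every item instead of allocating a new one per item. The standard `Iterator` trait cannot express this. Below is a minimal sketch under stated assumptions: `StreamingIter` and `UpperChunks` are hypothetical names for illustration, loosely modeled on the `advance`/`get` shape of a streaming iterator. They are not the API of the `streaming-iterator` or `lending-iterator` crates, nor of parquet2.

```rust
/// Hypothetical streaming (lending) iterator trait: the item returned by `get`
/// borrows from `self`, so it is only valid until the next call to `advance`.
/// Not the API of the `streaming-iterator` or `lending-iterator` crates.
pub trait StreamingIter {
    type Item: ?Sized;

    /// Moves to the next element (possibly overwriting an internal buffer).
    fn advance(&mut self);

    /// Borrows the current element, or `None` once the iterator is exhausted.
    fn get(&self) -> Option<&Self::Item>;

    /// Advances, then borrows the new current element.
    fn next(&mut self) -> Option<&Self::Item> {
        self.advance();
        self.get()
    }
}

/// Hypothetical example: yields upper-cased copies of fixed-size chunks of
/// `data`, writing every chunk into the same reused `buf` instead of
/// allocating a new `Vec` per item.
pub struct UpperChunks<'a> {
    data: &'a [u8],
    chunk: usize,
    pos: usize,
    buf: Vec<u8>,
    valid: bool, // true while `buf` holds a current element
}

impl<'a> UpperChunks<'a> {
    pub fn new(data: &'a [u8], chunk: usize) -> Self {
        assert!(chunk > 0, "chunk size must be non-zero");
        UpperChunks { data, chunk, pos: 0, buf: Vec::with_capacity(chunk), valid: false }
    }
}

impl<'a> StreamingIter for UpperChunks<'a> {
    type Item = [u8];

    fn advance(&mut self) {
        if self.pos >= self.data.len() {
            self.valid = false;
            return;
        }
        let end = (self.pos + self.chunk).min(self.data.len());
        // Reuse the allocation: clear, then refill with the next chunk.
        self.buf.clear();
        self.buf
            .extend(self.data[self.pos..end].iter().map(|b| b.to_ascii_uppercase()));
        self.pos = end;
        self.valid = true;
    }

    fn get(&self) -> Option<&[u8]> {
        if self.valid { Some(self.buf.as_slice()) } else { None }
    }
}
```

A consumer drives it with `while let Some(chunk) = it.next() { ... }`. Each `chunk` borrows the iterator, so anything that must outlive the loop body has to be copied out (for example with `chunk.to_vec()`). That borrowing rule is exactly what lets the buffer be reused.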
- Is anyone around here playing with Rust (the language)?
- Parquet2 0.9 released (and a request for feedback)
Thanks a lot for your feedback. Based on it I am proposing the following change: https://github.com/jorgecarleitao/parquet2/pull/78
- parquet2 0.3.0, with native support to read async
Release on GitHub.
polars
- Why Python's Integer Division Floors (2010)
This is because 0.1 is in actuality the floating-point value 0.1000000000000000055511151231257827021181583404541015625, and thus 1 divided by it is ever so slightly smaller than 10. Nevertheless, fpround(1 / fpround(1 / 10)) = 10 exactly.
I found out about this recently because in Polars I defined a // b for floats to be (a / b).floor(), which does return 10 for this computation. Since Python's correctly-rounded division is rather expensive, I chose to stick to this (more context: https://github.com/pola-rs/polars/issues/14596#issuecomment-...).
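The two behaviors differ only in where rounding happens. `(a / b).floor()` floors a quotient that has already been rounded, and for `1 / 0.1` that rounded quotient is exactly 10. CPython's float `//` instead first takes the exact remainder with `fmod` and then divides the remainder-adjusted numerator. Below is a minimal sketch in Rust of both approaches. The function names are hypothetical, this is not Polars' implementation, the second function is a hand port of the core of CPython's float floor division, and division by zero, infinities and NaN are deliberately not handled.

```rust
/// Floor division as described for Polars floats: round the quotient, then floor it.
pub fn floordiv_floor_of_quotient(a: f64, b: f64) -> f64 {
    (a / b).floor()
}

/// Sketch of CPython-style float floor division (hand port of its core steps).
/// Assumes finite `a` and non-zero finite `b`; no zero/inf/NaN handling.
pub fn floordiv_python_style(a: f64, b: f64) -> f64 {
    // Rust's `%` on floats behaves like C's fmod: the exact remainder, sign of `a`.
    let rem = a % b;
    // (a - rem) is (up to rounding) a multiple of b, so this is near an integer.
    let mut div = (a - rem) / b;
    // The fmod-based quotient truncates toward zero; if rem and b have opposite
    // signs, step down by one so the result floors toward -infinity.
    if rem != 0.0 && (b < 0.0) != (rem < 0.0) {
        div -= 1.0;
    }
    if div != 0.0 {
        // Round `div` to the nearest integer (it should already be very close to one).
        let mut floordiv = div.floor();
        if div - floordiv > 0.5 {
            floordiv += 1.0;
        }
        floordiv
    } else {
        // Zero result: carry the sign of the true quotient.
        0.0_f64.copysign(a / b)
    }
}
```

With these definitions, `floordiv_floor_of_quotient(1.0, 0.1)` returns `10.0` and `floordiv_python_style(1.0, 0.1)` returns `9.0`, which matches the Polars and Python results discussed above. Formatting with `format!("{:.55}", 0.1_f64)` prints the exact binary value of 0.1 quoted earlier. The Python-style version adds a remainder and extra branching per element, which helps explain why a vectorized engine might settle for the simpler formula.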
- Polars
https://github.com/pola-rs/polars/releases/tag/py-0.19.0
- Stuff I Learned during Hanukkah of Data 2023
That turned out to be related to pola-rs/polars#11912, and this linked comment provided a deceptively simple solution - use PARSE_DECLTYPES when creating the connection:
- Polars 0.20 Released
- Second language
- Polars: Dataframes powered by a multithreaded query engine, written in Rust
- Summing columns in remote Parquet files using DuckDB
- Polars 0.34 is released. (A query engine focussing on DataFrame front ends)
What are some alternatives?
parquet-format-rs - Apache Parquet format for Rust, hosting the Thrift definition file and the generated .rs file
vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
rust-brotli - Brotli compressor and decompressor written in rust that optionally avoids the stdlib
modin - Modin: Scale your Pandas workflows by changing a single line of code
roapi - Create full-fledged APIs for slowly moving datasets without writing a single line of code.
datafusion - Apache DataFusion SQL Query Engine
arrow2 - Transmute-free Rust library to work with the Arrow format
DataFrames.jl - In-memory tabular data in Julia
inkwell - It's a New Kind of Wrapper for Exposing LLVM (Safely)
datatable - A Python package for manipulating 2-dimensional tabular data structures
pqrs - Command line tool for inspecting Parquet files
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing