Top 10 Rust Parquet Projects
-
parquet2
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow
-
odbc2parquet
A command line tool to query an ODBC data source and write the result into a parquet file.
Project mention: Full-fledged APIs for slowly moving datasets without writing code | news.ycombinator.com | 2023-10-25
Thanks for the detailed feedback @snidane!
As maintainer of qsv, here's my reply:
- Given qsv's rapid release cycle (173 releases over three years), the auto-update check is essential at the moment. Once we reach 1.0, I'll turn it off. For now, given your feedback, I've only made it check 10% of the time.
- Pivot is in the backlog and I'll be sure to add unpivot when I implement it. (https://github.com/jqnatividad/qsv/issues/799)
- I'll add a dedicated summing command with the group by (-by) and window by (-over) capability (https://github.com/jqnatividad/qsv/issues/1514). Do note that `stats` has basic sum as @ezequiel-garzon pointed out.
- With the `enum` command, qsv can achieve what you proposed with `laminate`. E.g. `qsv enum --new-column newcol --constant newconstant mydata.csv --output laminated-data.csv`
- With the `cat rowskey` command, qsv can already concatenate files with mismatched headers.
- As for other file formats: qsv supports parquet, csv, tsv, excel, ods, datapackage, sqlite and more (see https://github.com/jqnatividad/qsv/tree/master#file-formats). Fixed-format is not supported yet, though it is quite interesting, and I have added it to the backlog (https://github.com/jqnatividad/qsv/issues/1515)
- As to "enable embedding outputs of commands", qsv is composable by design, so you can use standard stdin/stdout redirection/piping techniques to have it work with other CLI tools like jq, awk, etc.
Finally, I just released v0.120.0, which already incorporates the less aggressive self-update check. https://github.com/jqnatividad/qsv/releases/tag/0.120.0
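For readers unfamiliar with the `laminate`-style operation discussed in the comment above, the following is a minimal Python sketch of the general idea: add one new column to a CSV and fill every row with the same constant. It uses only the standard library and is not qsv's implementation. The function name `add_constant_column` is hypothetical, and placing the new column last is an assumption made for illustration.

```python
import csv
import io


def add_constant_column(csv_text, new_column, constant):
    """Return csv_text with one extra column holding the same value in every row.

    Hypothetical helper for illustration only; this is not how qsv is implemented.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    header = next(reader, None)
    if header is None:
        # Empty input: nothing to extend.
        return ""

    # Assumption: the new column is appended as the last column.
    writer.writerow(header + [new_column])
    for row in reader:
        writer.writerow(row + [constant])
    return out.getvalue()


if __name__ == "__main__":
    sample = "a,b\n1,2\n3,4\n"
    print(add_constant_column(sample, "newcol", "newconstant"), end="")
```

Because the helper takes and returns plain text, it can sit in the same kind of stdin/stdout pipeline the comment describes.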
nodejs-polars is Node-specific and uses native FFI. Polars can be compiled to Wasm but doesn't yet have a JS API out of the box.
As for the fastest way to serialize Pandas data to the browser, you should use Parquet; it's the fastest to write on the Python side and read on the JS side, while also being compressed. See https://github.com/kylebarron/parquet-wasm (full disclosure, I wrote this)
I have added documentation for all supported functions here.
Rust Parquet related posts
- cryo: NEW Data - star count:778.0
- Summing columns in remote Parquet files using DuckDB
Index
What are some of the best open-source Parquet projects in Rust? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | roapi | 3,069 |
2 | qsv | 2,203 |
3 | cryo | 967 |
4 | parquet2 | 347 |
5 | pqrs | 245 |
6 | parquet-wasm | 223 |
7 | odbc2parquet | 204 |
8 | warc-parquet | 99 |
9 | dply-rs | 37 |
10 | bdt | 6 |