Top 11 Rust data-engineering Projects

risingwave

27 6,283 10.0 Rust

Cloud-native SQL stream processing, analytics, and management. KsqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.

Project mention: Proton, a fast and lightweight alternative to Apache Flink | news.ycombinator.com | 2024-01-30

How does this compare to RisingWave and Materialize?
https://github.com/risingwavelabs/risingwave

paradedb

16 3,803 9.8 Rust

Postgres for Search and Analytics

Project mention: Using ClickHouse to scale an events engine | news.ycombinator.com | 2024-04-11

qsv

13 2,214 9.9 Rust

CSVs sliced, diced & analyzed.

Project mention: Qsv: Efficient CSV CLI Toolkit | news.ycombinator.com | 2023-12-22

Thanks for the detailed feedback @snidane!
As maintainer of qsv, here's my reply:
- Given qsv's rapid release cycle (173 releases over three years), the auto-update check is essential at the moment. Once we reach 1.0, I'll turn it off. For now, given your feedback, I've only made it check 10% of the time.
- Pivot is in the backlog and I'll be sure to add unpivot when I implement it. (https://github.com/jqnatividad/qsv/issues/799)
- I'll add a dedicated summing command with the group by (-by) and window by (-over) capability (https://github.com/jqnatividad/qsv/issues/1514). Do note that `stats` has basic sum as @ezequiel-garzon pointed out.
- With the `enum` command, qsv can achieve what you proposed with `laminate`. E.g. `qsv enum --new-column newcol --constant newconstant mydata.csv --output laminated-data.csv`
- With the `cat rowskey` command, qsv can already concatenate files with mismatched headers.
- other file formats. qsv supports parquet, csv, tsv, excel, ods, datapackage, sqlite and more (see https://github.com/jqnatividad/qsv/tree/master#file-formats). Fixed-format though is not supported yet and quite interesting, and have added it to the backlog (https://github.com/jqnatividad/qsv/issues/1515)
- as to "enable embedding outputs of commands", qsv is composable by design, so you can use standard stdin/stdout redirection/piping techniques to have it work with other CLI tools like jq, awk, etc.
Finally, just released v0.120.0 that already incorporates the less aggressive self-update check. https://github.com/jqnatividad/qsv/releases/tag/0.120.0

Daft

7 1,666 9.8 Rust

Distributed DataFrame for Python designed for the cloud, powered by Rust

Project mention: Daft: Distributed DataFrame for Python | news.ycombinator.com | 2024-02-29

There are benchmarks here - https://github.com/Eventual-Inc/Daft?tab=readme-ov-file#benc.... Seems to outperform Dask by a fair bit.

blaze

8 883 9.3 Rust

Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core. (by kwai)

Project mention: Blaze: Fast query execution engine for Apache Spark | news.ycombinator.com | 2023-10-19

delta-sharing-rs

2 70 7.1 Rust

A Minimalistic Rust Implementation of Delta Sharing Server.
grant-rs

1 24 1.7 Rust

Manage Redshift/Postgres privileges in GitOps style written in Rust
xvc

3 22 7.7 Rust

A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)
pipebase

6 9 0.0 Rust

data integration framework
ansilo

1 4 10.0 Rust

Unlocking the power of SQL/MED to create data ecosystems from disparate data sources
pipebuilder

2 1 0.0 Rust

pipebase app CI

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).