Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality. Learn more →
Top 11 Rust data-engineering Projects
-
risingwave
Cloud-native SQL stream processing, analytics, and management. KsqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
blaze
Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core. (by kwai)
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
Project mention: Proton, a fast and lightweight alternative to Apache Flink | news.ycombinator.com | 2024-01-30How does this compare to RisingWave and Materialize?
https://github.com/risingwavelabs/risingwave
Thanks for the detailed feedback @snidane!
As maintainer of qsv, here's my reply:
- Given qsv's rapid release cycle (173 releases over three years), the auto-update check is essential at the moment. Once we reach 1.0, I'll turn it off. For now, given your feedback, I've only made it check 10% of the time.
- Pivot is in the backlog and I'll be sure to add unpivot when I implement it. (https://github.com/jqnatividad/qsv/issues/799)
- I'll add a dedicated summing command with the group by (-by) and window by (-over) capability (https://github.com/jqnatividad/qsv/issues/1514). Do note that `stats` has basic sum as @ezequiel-garzon pointed out.
- With the `enum` command, qsv can achieve what you proposed with `laminate`. E.g. qsv enum --new-column newcol --constant newconstant mydata.csv --output laminated-data.csv
- With the cat rowskey command, qsv can already concatenate files with mismatched headers.
- other file formats. qsv supports parquet, csv, tsv, excel, ods, datapackage, sqlite and more (see https://github.com/jqnatividad/qsv/tree/master#file-formats). Fixed-format though is not supported yet and quite interesting, and have added it to the backlog (https://github.com/jqnatividad/qsv/issues/1515)
- as to "enable embedding outputs of commands", qsv is composable by design, so you can use standard stdin/stdout redirection/piping techniques to have it work with other CLI tools like jq, awk, etc.
Finally, just released v0.120.0 that already incorporates the less aggressive self-update check. https://github.com/jqnatividad/qsv/releases/tag/0.120.0
There are benchmarks here - https://github.com/Eventual-Inc/Daft?tab=readme-ov-file#benc.... Seems to outperform Dask by a fair bit.
Project mention: Blaze: Fast query execution engine for Apache Spark | news.ycombinator.com | 2023-10-19
Rust data-engineering related posts
- Ask HN: Best way to mirror a Postgres database to parquet?
- Daft: Distributed DataFrame for Python
- Transforming Postgres into a Fast OLAP Database
- Show HN: Pg_analytics – Speed Up Postgres Analytical Queries by 94x
- ParadeDB – PostgreSQL for Search
- Postgresql index
- Building an open source vector database. Looking for advice.
-
A note from our sponsor - InfluxDB
www.influxdata.com | 26 Apr 2024
Index
What are some of the best open-source data-engineering projects in Rust? This list will help you:
Project | Stars | |
---|---|---|
1 | risingwave | 6,283 |
2 | paradedb | 3,803 |
3 | qsv | 2,214 |
4 | Daft | 1,666 |
5 | blaze | 883 |
6 | delta-sharing-rs | 70 |
7 | grant-rs | 24 |
8 | xvc | 22 |
9 | pipebase | 9 |
10 | ansilo | 4 |
11 | pipebuilder | 1 |
Sponsored