Rust data-engineering

Open-source Rust projects categorized as data-engineering

Top 11 Rust data-engineering Projects

  • risingwave

    Cloud-native SQL stream processing, analytics, and management. ksqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.

  • Project mention: Proton, a fast and lightweight alternative to Apache Flink | news.ycombinator.com | 2024-01-30

    How does this compare to RisingWave and Materialize?

    https://github.com/risingwavelabs/risingwave

  • paradedb

    Postgres for Search and Analytics

  • Project mention: Using ClickHouse to scale an events engine | news.ycombinator.com | 2024-04-11
  • qsv

    CSVs sliced, diced & analyzed.

  • Project mention: Qsv: Efficient CSV CLI Toolkit | news.ycombinator.com | 2023-12-22

    Thanks for the detailed feedback @snidane!

    As maintainer of qsv, here's my reply:

    - Given qsv's rapid release cycle (173 releases over three years), the auto-update check is essential at the moment. Once we reach 1.0, I'll turn it off. For now, given your feedback, I've made it check only 10% of the time.

    - Pivot is in the backlog and I'll be sure to add unpivot when I implement it. (https://github.com/jqnatividad/qsv/issues/799)

    - I'll add a dedicated summing command with the group by (-by) and window by (-over) capability (https://github.com/jqnatividad/qsv/issues/1514). Do note that `stats` has basic sum as @ezequiel-garzon pointed out.

    - With the `enum` command, qsv can achieve what you proposed with `laminate`, e.g. `qsv enum --new-column newcol --constant newconstant mydata.csv --output laminated-data.csv`

    - With the `cat rowskey` command, qsv can already concatenate files with mismatched headers.

    - Other file formats: qsv supports parquet, csv, tsv, excel, ods, datapackage, sqlite and more (see https://github.com/jqnatividad/qsv/tree/master#file-formats). Fixed-format files aren't supported yet, but they're quite interesting, so I've added them to the backlog (https://github.com/jqnatividad/qsv/issues/1515).

    - As to "enable embedding outputs of commands", qsv is composable by design, so you can use standard stdin/stdout redirection and piping to have it work with other CLI tools like jq, awk, etc.

    Finally, just released v0.120.0 that already incorporates the less aggressive self-update check. https://github.com/jqnatividad/qsv/releases/tag/0.120.0

  • Daft

    Distributed DataFrame for Python designed for the cloud, powered by Rust

  • Project mention: Daft: Distributed DataFrame for Python | news.ycombinator.com | 2024-02-29

    There are benchmarks here - https://github.com/Eventual-Inc/Daft?tab=readme-ov-file#benc.... Seems to outperform Dask by a fair bit.

  • blaze

    Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core. (by kwai)

  • Project mention: Blaze: Fast query execution engine for Apache Spark | news.ycombinator.com | 2023-10-19
  • delta-sharing-rs

    A minimalistic Rust implementation of a Delta Sharing server.

  • grant-rs

    Manage Redshift/Postgres privileges GitOps-style; written in Rust.

  • xvc

    A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

  • pipebase

    data integration framework

  • ansilo

    Unlocking the power of SQL/MED to create data ecosystems from disparate data sources

  • pipebuilder

    pipebase app CI

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Index

What are some of the best open-source data-engineering projects in Rust? This index ranks them by GitHub stars:

Rank  Project            Stars
1     risingwave         6,283
2     paradedb           3,803
3     qsv                2,214
4     Daft               1,666
5     blaze                883
6     delta-sharing-rs      70
7     grant-rs              24
8     xvc                   22
9     pipebase               9
10    ansilo                 4
11    pipebuilder            1
