Top 8 Rust Big Data Projects
- risingwave: Cloud-native SQL stream processing, analytics, and management. KsqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.
- quickwit: Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
- matano: Open-source security data lake for threat hunting, detection and response, and cybersecurity analytics at petabyte scale on AWS.
- blaze: Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core. (by kwai)
Project mention: Proton, a fast and lightweight alternative to Apache Flink | news.ycombinator.com | 2024-01-30
How does this compare to RisingWave and Materialize?
https://github.com/risingwavelabs/risingwave
Python's Substrait seems like the biggest/most-used competitor-ish out there. I'd love some compare & contrast; my sense is that Substrait has a smaller ambition, and more wants to be a language for talking about execution rather than a full on execution engine. https://github.com/substrait-io/substrait
We can also see from the DataFusion discussion that they too see themselves as a bit of a Velox competitor. https://github.com/apache/arrow-datafusion/discussions/6441
There are benchmarks here - https://github.com/Eventual-Inc/Daft?tab=readme-ov-file#benc.... Seems to outperform Dask by a fair bit.
Sorry, that's https://matano.dev
Not super on topic, because this is all immature and not yet integrated with one another, but there is a scaled-out Rust data-frames-on-Arrow implementation called Ballista that could maybe form the backend of a Polars scale-out approach: https://github.com/apache/arrow-ballista
Project mention: Blaze: Fast query execution engine for Apache Spark | news.ycombinator.com | 2023-10-19
In edge computing, managing time series blob data efficiently is critical for performance-sensitive applications. This blog post will compare ReductStore, a specialized time series database for unstructured data, and MongoDB, a widely-used NoSQL database.
Rust Big Data related posts
- Velox: Meta's Unified Execution Engine [pdf]
- Apache Arrow DataFusion
- Ballista (Rust) vs Apache Spark. A Tale of Woe.
- GlareDB: An open source SQL database to query and analyze distributed data
- Evolution and Trends of Data Engineering 2022/23
- Polars: Computing a new column from multiple columns - there must be a better way
- biobear -- python package with minimal dependencies for bioinformatic file parsing and querying using rust and polars as the backend
Index
What are some of the best open-source Big Data projects in Rust? This list will help you:
# | Project | Stars |
---|---|---|
1 | risingwave | 6,283 |
2 | quickwit | 6,052 |
3 | datafusion | 5,020 |
4 | Daft | 1,666 |
5 | matano | 1,354 |
6 | datafusion-ballista | 1,275 |
7 | blaze | 883 |
8 | ReductStore | 137 |