|8 months ago||about 18 hours ago|
|MIT License||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Replacements for existing software written in Rust
1 project | reddit.com/r/patient_hackernews | 28 May 2021
1 project | reddit.com/r/hackernews | 28 May 2021
8 projects | news.ycombinator.com | 28 May 2021
Awesome Rewrite It In Rust - A curated list of replacements for existing software written in Rust
2 projects | reddit.com/r/commandline | 27 May 2021
62 projects | reddit.com/r/rust | 27 May 2021
[For all contributors] Do you think I should change the repository name?
https://github.com/TaKO8Ki/awesome-rewrite-it-in-rust/issues/29
62 projects | reddit.com/r/rust | 27 May 2021
How to use multiple Parquet files with Datafusion dataframe?
1 project | reddit.com/r/rust | 9 Jan 2022
Distributed systems you'd like to see in Rust?
8 projects | reddit.com/r/rust | 28 Dec 2021
This project looks cool: https://github.com/apache/arrow-datafusion
Any role that Rust could have in the Data world (Big Data, Data Science, Machine learning, etc.)?
8 projects | reddit.com/r/rust | 4 Dec 2021
Show HN: Box – Data Transformation Pipelines in Rust DataFusion
4 projects | news.ycombinator.com | 30 Nov 2021
A while ago I posted a link to [Arc](https://news.ycombinator.com/item?id=26573930) a declarative method for defining repeatable data pipelines which execute against [Apache Spark](https://spark.apache.org/).
Today I would like to present a proof-of-concept implementation of the [Arc declarative ETL framework](https://arc.tripl.ai) against [Apache Datafusion](https://arrow.apache.org/datafusion/), which is an ANSI SQL (Postgres) execution engine based upon Apache Arrow and built with Rust.
The idea of providing a declarative 'configuration' language for defining data pipelines was planned from the beginning of the Arc project to allow changing execution engines without having to rewrite the base business logic (the part that is valuable to your business). Instead, by defining an abstraction layer, we can change the execution engine and run the same logic with different execution characteristics.
The benefit of DataFusion over Apache Spark is a significant increase in speed and a reduction in execution resource requirements. Even through a Docker-for-Mac inefficiency layer, the same job completes in ~4 seconds with DataFusion vs ~24 seconds with Apache Spark (including JVM startup time). Without the Docker-for-Mac layer, end-to-end execution times of 0.5 seconds for the same example job (TPC-H) are possible. * the aim is not to start a benchmarking flamewar but to provide some indicative data *.
The purpose of this post is to gather feedback from the community on whether you would use a tool like this, what features would be required for you to use it (MVP), and whether you would be interested in contributing to the project. I would also like to highlight the excellent work being done by the DataFusion/Arrow (and Apache) community in providing such amazing tools to us all as open source projects.
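The abstraction-layer idea in the post above (one declarative pipeline definition, interchangeable execution engines) can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the stage format, `PIPELINE`, `RowEngine`, and `BatchEngine` are invented names and do not reflect the actual Arc configuration format or the DataFusion API.

```python
from typing import Dict, List

Row = Dict[str, int]

# Hypothetical declarative pipeline: plain data describing WHAT to do,
# with no reference to the engine that will execute it.
PIPELINE = [
    {"type": "filter", "column": "qty", "min": 10},
    {"type": "project", "columns": ["id", "qty"]},
]


class RowEngine:
    """Toy engine: pushes one row at a time through every stage."""

    def run(self, stages: List[dict], rows: List[Row]) -> List[Row]:
        out = []
        for row in rows:
            current = dict(row)
            keep = True
            for stage in stages:
                if stage["type"] == "filter":
                    if current[stage["column"]] < stage["min"]:
                        keep = False
                        break
                elif stage["type"] == "project":
                    current = {c: current[c] for c in stage["columns"]}
                else:
                    raise ValueError(f"unknown stage type: {stage['type']}")
            if keep:
                out.append(current)
        return out


class BatchEngine:
    """Toy engine: applies each stage to the whole batch before the next."""

    def run(self, stages: List[dict], rows: List[Row]) -> List[Row]:
        batch = [dict(r) for r in rows]
        for stage in stages:
            if stage["type"] == "filter":
                batch = [r for r in batch if r[stage["column"]] >= stage["min"]]
            elif stage["type"] == "project":
                batch = [{c: r[c] for c in stage["columns"]} for r in batch]
            else:
                raise ValueError(f"unknown stage type: {stage['type']}")
        return batch
```

Both toy engines accept the same `PIPELINE` and produce the same rows, so the pipeline definition does not change when the engine is swapped; only the execution strategy differs.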
Rust and what it needs to gain space in computation-oriented applications
7 projects | reddit.com/r/rust | 24 Nov 2021
You should check out polars, datafusion, influxdb iox and databend, all written in native Rust and powered by the Apache Arrow format. Polars in particular is pretty damn fast and has bindings for Python.
How to pass dataframes between Rust and Python?
4 projects | reddit.com/r/rust | 20 Nov 2021
A solution for either Polars or Datafusion (or something else?) would be fine. For both libraries, Python packages exist that contain the Python bindings: https://github.com/pola-rs/polars/tree/master/py-polars https://github.com/apache/arrow-datafusion/tree/master/python
Using an ECS as a general-purpose storage container?
1 project | reddit.com/r/rust_gamedev | 2 Nov 2021
Datafusion runs SQL queries against an in-memory column store. It aims for a subset of Postgres SQL. It specifically targets big data use cases, and can integrate with other big-data tools via a 'parquet' file format.
Arrow Datafusion includes Ballista, which does SIMD and GPU vectorized ops
1 project | news.ycombinator.com | 24 Oct 2021
Apache Arrow DataFusion (Rust query engine) now has an online user guide
1 project | reddit.com/r/rust | 22 Sep 2021
Show HN: Columnq brings OLAP to Unix pipes
2 projects | news.ycombinator.com | 13 Sep 2021
Thanks! It's using Datafusion as the query engine: https://github.com/apache/arrow-datafusion
What are some alternatives?
ClickHouse - ClickHouse® is a free analytics DBMS for big data
polars - Fast multi-threaded DataFrame library in Rust | Python | Node.js
db-benchmark - reproducible benchmark of database-like ops
tikv - Distributed transactional key-value database, originally created to complement TiDB
datafuse - An elastic and reliable Cloud Warehouse, offers Blazing Fast Query and combines Elasticity, Simplicity, Low cost of the Cloud, built to make the Data Cloud easy [Moved to: https://github.com/datafuselabs/databend]
ripgrep - ripgrep recursively searches directories for a regex pattern while respecting your gitignore
nushell - A new type of shell
volta - Volta: JS Toolchains as Code. ⚡
sea-query - 🌊 A dynamic SQL query builder for MySQL, Postgres and SQLite
aws-sdk-rust - AWS SDK for the Rust Programming Language
amp - A complete text editor for your terminal.
arrow-rs - Official Rust implementation of Apache Arrow