bacon vs datafusion

Metric | bacon | datafusion
---|---|---
Mentions | 2 | 59
Stars | 183 | 5,862
Growth | - | 3.8%
Activity | 5.8 | 10.0
Latest commit | 6 months ago | 6 days ago
Language | Rust | Rust
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
bacon
- Any role that Rust could have in the Data world (Big Data, Data Science, Machine learning, etc.)?
-
Scientific Computing in Rust
See the GitHub repo here: https://github.com/aftix/bacon
datafusion
-
How to build a new Harlequin adapter with Poetry
Harlequin is a lightweight TUI client for SQL databases, known for its extensive database support. It is a versatile tool for data exploration and analysis workflows. Harlequin provides an interactive SQL editor with features like autocomplete, syntax highlighting, and query history, plus a results viewer that can display large result sets. However, Harlequin did not previously have a DataFusion adapter. Thankfully, it was easy to add one.
-
Why you should keep an eye on Apache DataFusion and its community.
In case you don't know what Apache DataFusion is, here's the high-level blurb.
-
Make Rust Object Oriented with the dual-trait pattern
I've invented this dual-trait pattern for the purposes of the logical planner, as seen in this merged PR. The problem was that the nodes in the plan (filter, select, etc.) had to support at the same time:
- Pg_lakehouse: A DuckDB Alternative in Postgres
-
Velox: Meta's Unified Execution Engine [pdf]
Substrait seems like the biggest, most-used quasi-competitor out there. I'd love a compare-and-contrast; my sense is that Substrait has a smaller ambition and wants to be more of a language for talking about execution than a full-on execution engine. https://github.com/substrait-io/substrait
We can also see from the DataFusion discussion that they too see themselves as a bit of a Velox competitor. https://github.com/apache/arrow-datafusion/discussions/6441
-
What I Talk About When I Talk About Query Optimizer (Part 1): IR Design
Agreed, Substrait is a really cool project! Related: if you like Substrait, you might want to check out DataFusion too. It is a query execution engine built on top of Apache Arrow (with a SQL parser, query planner and optimizer, execution engine, and extensible user-defined functions, among other components), and it implements a Substrait producer and consumer: https://github.com/apache/arrow-datafusion/tree/main/datafus...
-
DuckDB performance improvements with the latest release
The draft contains some preliminary benchmark results, comparing it to DuckDB.
https://github.com/apache/arrow-datafusion/issues/6782
- Apache Arrow DataFusion
-
GlareDB: An open source SQL database to query and analyze distributed data
Apache Arrow is a pretty common in-memory columnar format these days. DataFusion is an open-source query engine written in Rust, started by Andy Grove.
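Apache Arrow's columnar format stores a primitive column as a contiguous values buffer plus an optional validity bitmap. In that bitmap, bit *i* (least-significant bit first within each byte) is 1 when slot *i* is non-null. The hypothetical `Int64Column` below is a simplified sketch of that layout. It ignores Arrow's buffer alignment and padding rules and is not the arrow-rs API.

```rust
/// Hypothetical, simplified Arrow-style nullable Int64 column (not the arrow-rs API).
struct Int64Column {
    /// Contiguous values buffer; null slots hold a placeholder (0 here).
    values: Vec<i64>,
    /// Validity bitmap: bit i (LSB-first within each byte) is 1 if slot i is non-null.
    validity: Vec<u8>,
}

impl Int64Column {
    fn from_options(items: &[Option<i64>]) -> Self {
        let mut values = Vec::with_capacity(items.len());
        // One bit per slot, rounded up to whole bytes.
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity[i / 8] |= 1u8 << (i % 8);
                }
                None => values.push(0),
            }
        }
        Int64Column { values, validity }
    }

    fn len(&self) -> usize {
        self.values.len()
    }

    fn is_valid(&self, i: usize) -> bool {
        (self.validity[i / 8] & (1u8 << (i % 8))) != 0
    }

    fn get(&self, i: usize) -> Option<i64> {
        if self.is_valid(i) { Some(self.values[i]) } else { None }
    }

    fn null_count(&self) -> usize {
        (0..self.len()).filter(|&i| !self.is_valid(i)).count()
    }

    /// Sum of non-null values: scans the dense values buffer, consulting the bitmap.
    fn sum(&self) -> i64 {
        (0..self.len()).filter(|&i| self.is_valid(i)).map(|i| self.values[i]).sum()
    }
}

fn main() {
    let col = Int64Column::from_options(&[Some(1), None, Some(3)]);
    let decoded: Vec<Option<i64>> = (0..col.len()).map(|i| col.get(i)).collect();
    println!("{:?} nulls={} sum={}", decoded, col.null_count(), col.sum());
}
```

Because values sit in one dense buffer regardless of nulls, scans like `sum` touch memory sequentially. That is one reason the layout suits vectorized query engines.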
-
DuckDB 0.8.0
DuckDB is a great piece of software if you are
If you are looking for a query engine implemented in a safe language (Rust), I definitely suggest checking out DataFusion. It is comparable to DuckDB in performance, has all the standard built-in SQL functionality, and is extensible in pretty much all areas (query language, data formats, catalogs, user-defined functions, etc.).
https://arrow.apache.org/datafusion/
Disclaimer: I am a maintainer of DataFusion.
What are some alternatives?
rink-rs - Unit conversion tool and library written in rust
polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust
linfa - A Rust machine learning framework.
ClickHouse - ClickHouse® is a real-time analytics DBMS
statrs - Statistical computation library for Rust
db-benchmark - reproducible benchmark of database-like ops
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
databend - Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
neuronika - Tensors and dynamic neural networks in pure Rust.
DuckDB - DuckDB is an analytical in-process SQL database management system
tangram - Tangram makes it easy for programmers to train, deploy, and monitor machine learning models.
nushell - A new type of shell