Arrow-datafusion Alternatives
Similar projects and alternatives to arrow-datafusion
-
polars
Fast multi-threaded, hybrid-out-of-core DataFrame library in Rust | Python | Node.js
-
datafuse
An elastic and reliable Cloud Warehouse, offers Blazing Fast Query and combines Elasticity, Simplicity, Low cost of the Cloud, built to make the Data Cloud easy [Moved to: https://github.com/datafuselabs/databend]
-
Apache Arrow
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
-
databend
A modern cloud data warehouse focusing on reducing cost and complexity for your massive-scale analytics needs. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com
-
tikv
Distributed transactional key-value database, originally created to complement TiDB
-
roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
-
sea-query
🔱 A dynamic SQL query builder for MySQL, Postgres and SQLite
-
steampipe
Use SQL to instantly query your cloud services (AWS, Azure, GCP and more). Open source CLI. No DB required.
-
influxdb_iox
Pronounced (influxdb eye-ox), short for iron oxide. This is the new core of InfluxDB written in Rust on top of Apache Arrow.
-
awesome-rewrite-it-in-rust
A curated list of replacements for existing software written in Rust [Moved to: https://github.com/TaKO8Ki/awesome-alternatives-in-rust]
-
not-yet-awesome-rust
A curated list of Rust code and resources that do NOT exist yet, but would be beneficial to the Rust community.
arrow-datafusion reviews and mentions
-
Bridging Async and Sync Rust Code - A lesson learned while working with Tokio
The problem comes when you want to do this inside an async context, since we can't block an async task. https://users.rust-lang.org/t/sync-function-invoking-async/43364/6 You might need to do it in another runtime/thread. This is not recommended, but it is sometimes unavoidable when implementing a third-party trait. https://github.com/apache/arrow-datafusion/issues/3777 However, I believe this isn't a problem particular to Tokio or any specific runtime.
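The same shape of problem shows up outside Rust. As a rough analogue only (this is Python's asyncio, not Tokio, and `fetch_value` and `sync_api` are hypothetical names invented for illustration), a synchronous function called from inside a running event loop can obtain an async result by driving it on a separate thread that owns its own loop:

```python
import asyncio
import threading


async def fetch_value() -> int:
    """Hypothetical async operation standing in for real I/O."""
    await asyncio.sleep(0.01)
    return 42


def sync_api() -> int:
    """Synchronous function (for example, one required by a third-party
    interface) that needs the result of an async operation."""
    result = {}

    def worker() -> None:
        # asyncio.run creates a fresh event loop owned by this thread only.
        result["value"] = asyncio.run(fetch_value())

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()  # blocks the caller until the worker's loop has finished
    return result["value"]


async def main() -> int:
    # Calling asyncio.run() directly here would raise RuntimeError,
    # because this thread already has a running event loop.
    return sync_api()


answer = asyncio.run(main())
print(answer)
```

Note that `sync_api()` still blocks the outer event loop while it waits on the worker thread, which is the reason the comment above calls this approach not recommended.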
-
Using Rust to write a Data Pipeline. Thoughts. Musings.
-
Demystifying Apache Arrow
You can see some of the benchmarks in DataFusion (part of the Arrow project and built with Arrow as the underlying in-memory format) https://github.com/apache/arrow-datafusion/blob/master/bench...
Disclaimer: I'm a committer on the Arrow project and contributor to DataFusion.
-
Scala or Rust? which one will rule in future?
polars and datafusion seem very promising
-
Welcome to Comprehensive Rust
Rust has amazing integration with Python through PyO3 [1], so see it as a safe alternative for high-performance calculations. The ecosystem itself is starting to come together, with exciting projects like Polars [2] (Pandas alternative), nalgebra [3], Datafusion [4] and Ballista [5]
[1] https://github.com/PyO3/pyo3
[2] https://github.com/pola-rs/polars/
[3] https://docs.rs/nalgebra/latest/nalgebra/
-
Command-line data analytics made easy
It could be the NDJSON parser (DF source: [0]) or could be a variety of other factors. Looking at the ROAPI release archive [1], it doesn't ship with the definitive `columnq` binary from your comment, so it could also have something to do with compilation-time flags.
FWIW, we use the Parquet format with DataFusion and get very good speeds similar to DuckDB [2], e.g. 1.5s to run a more complex aggregation query `SELECT date_trunc('month', tpep_pickup_datetime) AS month, COUNT(*) AS total_trips, SUM(total_amount) FROM tripdata GROUP BY 1 ORDER BY 1 ASC` on a 55M row subset of NY Taxi trip data.
[0]: https://github.com/apache/arrow-datafusion/blob/master/dataf...
[1]: https://github.com/roapi/roapi/releases/tag/roapi-v0.8.0
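To make the shape of that monthly aggregation concrete, here is a minimal sketch that runs an equivalent query with Python's built-in `sqlite3` module rather than DataFusion. The three rows are made-up sample data, and since SQLite has no `date_trunc`, the sketch substitutes `strftime('%Y-%m-01', ...)` and groups by the column alias; it says nothing about DataFusion's performance.

```python
import sqlite3

# In-memory table with the two columns the query touches (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tripdata (tpep_pickup_datetime TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO tripdata VALUES (?, ?)",
    [
        ("2022-01-03 08:15:00", 12.5),
        ("2022-01-20 17:40:00", 7.5),
        ("2022-02-11 09:05:00", 20.0),
    ],
)

# strftime truncates each timestamp to the first day of its month,
# standing in for date_trunc('month', ...).
rows = conn.execute(
    """
    SELECT strftime('%Y-%m-01', tpep_pickup_datetime) AS month,
           COUNT(*) AS total_trips,
           SUM(total_amount)
    FROM tripdata
    GROUP BY month
    ORDER BY month ASC
    """
).fetchall()

for month, total_trips, total in rows:
    print(month, total_trips, total)
```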
SPyQL is really cool and its design is very smart, with it being able to leverage normal Python functions!
As far as similar tools go, I recommend taking a look at DataFusion[0], dsq[1], and OctoSQL[2].
DataFusion is a very (very very) fast command-line SQL engine but with limited support for data formats.
dsq is based on SQLite, which means it has to load data into SQLite first, but then gives you the whole breadth of SQLite. It also supports many data formats, but is slower at the same time.
OctoSQL is faster, extensible through plugins, and supports incremental query execution, so you can e.g. calculate a running group by + count while tailing a log file. It also supports normal databases, not just file formats, so you can e.g. join with a Postgres table.
[0]: https://github.com/apache/arrow-datafusion
[1]: https://github.com/multiprocessio/dsq
[2]: https://github.com/cube2222/octosql
Disclaimer: Author of OctoSQL
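The "running group by + count" idea mentioned above can be sketched in a few lines of plain Python. This is only an illustration of incremental aggregation, not how OctoSQL implements it; `running_counts` and the sample log lines are hypothetical, and the grouping key is assumed to be the first whitespace-separated token of each line.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def running_counts(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Yield an updated group-by count after every incoming line."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        counts[parts[0]] += 1  # group by the first token
        yield dict(counts)     # snapshot of the counts so far


# Any iterable of lines works, including a file object that is being tailed.
log_lines = ["GET /a", "POST /b", "GET /c"]
snapshots = list(running_counts(log_lines))
for snapshot in snapshots:
    print(snapshot)
```

Because the function is a generator, each snapshot is available as soon as its line arrives, without waiting for the input to end.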
-
Welcome to InfluxDB IOx: InfluxData’s New Storage Engine
Just wanted to give a shout out to Apache DataFusion[0] that IOx relies on a lot (and contributes to as well!).
It's a framework for writing query engines in Rust that takes care of a lot of heavy lifting around parsing SQL, type casting, constructing and transforming query plans and optimizing them. It's pluggable, making it easy to write custom data sources, optimizer rules, query nodes etc.
It has very good single-node performance (there's even a way to compile it with SIMD support), and Ballista [1] extends it into a distributed query engine.
Plenty of other projects use it besides IOx, including VegaFusion, ROAPI, Cube.js's preaggregation store. We're heavily using it to build Seafowl [2], an analytical database that's optimized for running SQL queries directly from the user's browser (caching, CDNs, low latency, some WASM support, all that fun stuff).
[0] https://github.com/apache/arrow-datafusion
-
GlueSQL: A SQL database engine written as a library in Rust
Another "database toolkit" project that I've recently learned about is Apache DataFusion, also written in Rust and using the Arrow memory format:
https://github.com/apache/arrow-datafusion/blob/master/READM...
-
Rust is showing a lot of promise in the DataFrame / tabular data space
[arrow-datafusion](https://github.com/apache/arrow-datafusion) is another great DataFrame library, especially if you like running SQL queries. It's so easy to query a Parquet / CSV dataset with SQL using DataFusion. I've run local benchmarks and it's super fast. The DataFusion docs are a bit lacking, which is a shame for such a developed and amazing library. I hope to make these better and help spread the word about how truly amazing this lib is.
-
Stats
apache/arrow-datafusion is an open source project licensed under Apache License 2.0 which is an OSI approved license.
Popular Comparisons
- arrow-datafusion VS polars
- arrow-datafusion VS ClickHouse
- arrow-datafusion VS db-benchmark
- arrow-datafusion VS databend
- arrow-datafusion VS tikv
- arrow-datafusion VS nushell
- arrow-datafusion VS sea-query
- arrow-datafusion VS datafuse
- arrow-datafusion VS Apache Arrow
- arrow-datafusion VS awesome-rewrite-it-in-rust