| Metric | incubator-gluten | blaze |
|---|---|---|
| | 3 | 8 |
| Stars | 988 | 898 |
| Growth | 3.0% | 5.0% |
| Activity | 9.9 | 9.3 |
| | 7 days ago | 4 days ago |
| Language | Scala | Rust |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
incubator-gluten
- A glimpse into the future of data processing infrastructure.
  When I first learned about the Gluten project from Intel, I thought Databricks was going to be in trouble.
- FLaNK Stack for 04 December 2023
- Blaze: Fast query execution engine for Apache Spark
  Interesting, looks like it is just DataFusion engine for Spark. There is a similar project: https://github.com/oap-project/gluten - it brings ClickHouse as an engine to Spark.
blaze
- Blaze: Fast query execution engine for Apache Spark
- 🐼 Pandas 2.0 Up To 32x Faster
  There is a project called blaze that aims to convert Spark plans into DataFusion plans to run them more efficiently.
- Run SQL on CSV, Parquet, JSON, Arrow, Unix Pipes and Google Sheet
  DataFusion outperforms Spark by a large margin. It is on par with Photon; see the benchmark at https://github.com/blaze-init/blaze.
- Announcing Blaze: A Rustified OpenCL Experience
- Blaze: A Rust-based vectorized accelerator to speed up your Spark jobs with less resources
- Blaze: A Rust-based vectorized accelerator to speed up your Spark jobs
- Blaze: a Rust-based vectorized accelerator to speed up your Spark jobs with fewer resources.
What are some alternatives?
LearningSparkV2 - This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
dasel - Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
opaque-sql - An encrypted data analytics platform
roapi - Create full-fledged APIs for slowly moving datasets without writing a single line of code.
blaze - NumPy and Pandas interface to Big Data
zsv - zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser
Jupyter Scala - A Scala kernel for Jupyter
octosql - OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
kyuubi - Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
lnav - Log file navigator
narrator - David Attenborough narrates your life
xsv - A fast CSV command line toolkit written in Rust.