Metric | incubator-gluten | qbeast-spark
---|---|---
Mentions | 3 | 12
Stars | 988 | 192
Growth | 3.0% | 4.7%
Activity | 9.9 | 8.6
Last commit | 7 days ago | 9 days ago
Language | Scala | Scala
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
incubator-gluten
- A glimpse into the future of data processing infrastructure.
  When I first learned about the Gluten project from Intel, I thought Databricks was going to be in trouble.
- FLaNK Stack for 04 December 2023
- Blaze: Fast query execution engine for Apache Spark
  Interesting, looks like it is just DataFusion engine for Spark. There is a similar project: https://github.com/oap-project/gluten - it brings ClickHouse as an engine to Spark.
qbeast-spark
- Release 0.3.2 of qbeast-spark!
- Qbeast-Spark Visualizer!
- Release 0.3.1 of Qbeast Spark
- Collaborative roadmap for qbeast-spark: Open Source Table Format
  We want to develop qbeast-spark in an open way, so we publish a tentative Roadmap for this summer https://github.com/Qbeast-io/qbeast-spark/discussions/108
- qbeast-spark v0.2.0 available on Maven Central Repository
- Datasource enabling multidimensional indexing and sampling pushdown
  If you want to play with it, check out the Qbeast-Spark github
- Apache Spark Datasource enabling multidimensional indexing and sampling pushdown
- New DataSource enabling multi-columnar indexing and efficient data sampling
What are some alternatives?
LearningSparkV2 - This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Apache Spark - A unified analytics engine for large-scale data processing
opaque-sql - An encrypted data analytics platform
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
blaze - Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core.
Spark Utils - Basic framework utilities to quickly start writing production ready Apache Spark applications
blaze - NumPy and Pandas interface to Big Data
mmlspark - Simple and Distributed Machine Learning [Moved to: https://github.com/microsoft/SynapseML]
Jupyter Scala - A Scala kernel for Jupyter
Clustering4Ever - C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
kyuubi - Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Sparkplug - Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌