qbeast-spark
Local-Data-LakeHouse
| | qbeast-spark | Local-Data-LakeHouse |
|---|---|---|
| Mentions | 12 | 1 |
| Stars | 192 | 44 |
| Growth | 4.7% | - |
| Activity | 8.6 | 4.4 |
| Last commit | 4 days ago | 8 months ago |
| Language | Scala | Dockerfile |
| License | Apache License 2.0 | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
qbeast-spark
- Release 0.3.2 of qbeast-spark!
- Qbeast-Spark Visualizer!
- Release 0.3.1 of Qbeast Spark
- Collaborative roadmap for qbeast-spark: Open Source Table Format
  We want to develop qbeast-spark in an open way, so we have published a tentative roadmap for this summer: https://github.com/Qbeast-io/qbeast-spark/discussions/108
- qbeast-spark v0.2.0 available on Maven Central Repository
- Datasource enabling multidimensional indexing and sampling pushdown
  If you want to play with it, check out the Qbeast-Spark GitHub
- Apache Spark Datasource enabling multidimensional indexing and sampling pushdown
- New DataSource enabling multi-columnar indexing and efficient data sampling
Local-Data-LakeHouse
- Project showcase: sample Data Lakehouse
  Here is the GitHub repo: https://github.com/dominikhei/Local-Data-LakeHouse
What are some alternatives?
Apache Spark - A unified analytics engine for large-scale data processing
matano - Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
incubator-xtable - Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Spark Utils - Basic framework utilities to quickly start writing production ready Apache Spark applications
minio-dokku - Dockerfile to run Minio (S3 compatible storage) on Dokku (mini-Heroku)
mmlspark - Simple and Distributed Machine Learning [Moved to: https://github.com/microsoft/SynapseML]
cuelake - Use SQL to build ELT pipelines on a data lakehouse.
Clustering4Ever - C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
hive-metastore - Apache Hive Metastore as a Standalone server in Docker
Sparkplug - Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
Rudderstack - Privacy and Security focused Segment-alternative, in Golang and React