qbeast-spark vs Jupyter Scala

| | qbeast-spark | Jupyter Scala |
|---|---|---|
| Mentions | 12 | 6 |
| Stars | 192 | 1,564 |
| Stars growth (monthly) | 4.7% | 0.1% |
| Activity | 8.6 | 9.0 |
| Latest commit | 4 days ago | 19 days ago |
| Language | Scala | Scala |
| License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
qbeast-spark
- Release 0.3.2 of qbeast-spark!
- Qbeast-Spark Visualizer!
- Release 0.3.1 of Qbeast Spark
- Collaborative roadmap for qbeast-spark: Open Source Table Format
  We want to develop qbeast-spark in an open way, so we are publishing a tentative roadmap for this summer: https://github.com/Qbeast-io/qbeast-spark/discussions/108
- qbeast-spark v0.2.0 available on Maven Central Repository
- Datasource enabling multidimensional indexing and sampling pushdown
  If you want to play with it, check out the Qbeast-Spark GitHub
- Apache Spark Datasource enabling multidimensional indexing and sampling pushdown
- New DataSource enabling multi-columnar indexing and efficient data sampling
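As a rough illustration of the "multidimensional indexing and sampling pushdown" described above, here is a minimal sketch assuming the qbeast-spark datasource and its `columnsToIndex` write option as shown in the project's README; option names and behavior may differ across releases, and the snippet requires a running Spark session with the qbeast-spark package on the classpath.

```scala
// Sketch only: assumes a SparkSession `spark` and an input DataFrame `df`,
// with qbeast-spark available on the classpath.
// Write the data indexed on two columns; qbeast organizes records so that
// later reads can prune files by those dimensions.
df.write
  .mode("overwrite")
  .format("qbeast")
  .option("columnsToIndex", "user_id,price") // hypothetical column names
  .save("/tmp/qbeast-table")

// Read it back; sample() can be pushed down to the index, so only a
// fraction of the underlying files is actually scanned.
val qbeastDf = spark.read.format("qbeast").load("/tmp/qbeast-table")
qbeastDf.sample(0.1).count()
```

The point of the pushdown is that `sample(0.1)` is answered from the index layout rather than by scanning the full table and discarding 90% of the rows.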
Jupyter Scala
- 💐 Making VSCode itself a Java REPL 🔁
  Check out almond
- A Python-compatible statically typed language erg-lang/erg
- EDA libraries for Scala and Spark?
  What about https://github.com/alexarchambault/plotly-scala and https://almond.sh/
- Is there any editor or IDE that supports Ammonite with inline dependencies?
  I use Almond in JupyterLab, which has pretty solid code completion. In IntelliJ, you can create a scratch .sc file and run lines of it in the Scala REPL. That's really convenient for code completion, and I'll normally use that when I'm testing something from a specific project.
- Recommended option for "Java with different syntax"?
  The UI part. There's only the Scala REPL. I think the closest is a Scala kernel for Jupyter notebooks; check this out: https://almond.sh/
- An SQL Solution for Jupyter
  We have used https://almond.sh/ to create a Spark SQL interpreter using Jupyter Notebooks - plus a whole lot more, which you can see here: https://arc.tripl.ai/tutorial
  After seeing many companies writing ETL using code, we decided it was too hard to manage at scale, so we provided this abstraction layer - which is heavily centered around expressing business logic in SQL - to standardise development (JupyterLab) and allow rapid deployments.
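For readers wondering what running Spark SQL from an almond notebook looks like, here is a minimal sketch following the almond documentation's Spark integration (`NotebookSparkSession` from almond-spark); this is an assumption-laden example, not the Arc setup described above, and API names may vary by almond and Spark version.

```scala
// Sketch of a Jupyter cell using the almond Scala kernel with its
// almond-spark integration; run inside an almond notebook.
import org.apache.spark.sql._

// NotebookSparkSession is almond's notebook-aware SparkSession builder.
val spark = NotebookSparkSession.builder()
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Register a small DataFrame as a temp view and query it with Spark SQL.
Seq(("a", 1), ("b", 2), ("a", 3))
  .toDF("key", "value")
  .createOrReplaceTempView("t")

spark.sql("SELECT key, SUM(value) AS total FROM t GROUP BY key").show()
```

From here, an "SQL interpreter" is essentially a loop that feeds notebook cell contents to `spark.sql(...)` and renders the result.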
What are some alternatives?
- Apache Spark - A unified analytics engine for large-scale data processing
- sparkmagic - Jupyter magics and kernels for working with remote Spark clusters
- delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and APIs
- Metals - Scala language server with rich IDE features 🚀
- Spark Utils - Basic framework utilities to quickly start writing production-ready Apache Spark applications
- Vegas - The missing MatPlotLib for Scala + Spark
- mmlspark - Simple and Distributed Machine Learning [Moved to: https://github.com/microsoft/SynapseML]
- Apache Flink - Apache Flink
- Clustering4Ever - C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering
- Deeplearning4j - Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff, a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
- Sparkplug - Spark package to "plug" holes in data using SQL-based rules ⚡️ 🔌
- Scio - A Scala API for Apache Beam and Google Cloud Dataflow