incubator-gluten
Jupyter Scala
Metric | incubator-gluten | Jupyter Scala
---|---|---
Mentions | 3 | 6
Stars | 988 | 1,565
Growth | 3.0% | 0.2%
Activity | 9.9 | 9.0
Last commit | 7 days ago | 2 days ago
Language | Scala | Scala
License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
incubator-gluten
- A glimpse into the future of data processing infrastructure.
  When I first learned about the Gluten project from Intel, I thought Databricks was going to be in trouble.
- FLaNK Stack for 04 December 2023
- Blaze: Fast query execution engine for Apache Spark
  Interesting, looks like it is just a DataFusion engine for Spark. There is a similar project, https://github.com/oap-project/gluten, which brings ClickHouse as an engine to Spark.
Jupyter Scala
- 💐 Making VSCode itself a Java REPL 🔁
  Check out almond
- A Python-compatible statically typed language erg-lang/erg
- EDA libraries for Scala and Spark?
  What about https://github.com/alexarchambault/plotly-scala and https://almond.sh/
- Is there any editor or IDE that supports Ammonite with inline dependencies?
  I use Almond in JupyterLab, which has pretty solid code completion. In IntelliJ, you can create a scratch .sc file and run lines of it in the Scala REPL. That's really convenient for code completion, and I normally use it when I'm testing something from a specific project.
- Recommended option for "Java with different syntax"?
  The UI part. There's only the Scala REPL. I think the closest is a Scala kernel for Jupyter notebooks, check this out: https://almond.sh/
- An SQL Solution for Jupyter
  We have used https://almond.sh/ to create a Spark SQL interpreter using Jupyter Notebooks - plus a whole lot more which you can see here: https://arc.tripl.ai/tutorial
  After seeing many companies writing ETL in code, we decided it was too hard to manage at scale, so we provided this abstraction layer, which is heavily centered around expressing business logic in SQL, to standardise development (JupyterLab) and allow rapid deployments.
What are some alternatives?
LearningSparkV2 - This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
sparkmagic - Jupyter magics and kernels for working with remote Spark clusters
opaque-sql - An encrypted data analytics platform
Metals - Scala language server with rich IDE features 🚀
blaze - Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core.
Vegas - The missing MatPlotLib for Scala + Spark
blaze - NumPy and Pandas interface to Big Data
Apache Flink - Apache Flink
kyuubi - Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Deeplearning4j - Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.
narrator - David Attenborough narrates your life
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.