delta-sharing vs Jupyter Scala
| | delta-sharing | Jupyter Scala |
|---|---|---|
| Mentions | 4 | 6 |
| Stars | 683 | 1,568 |
| Growth | 1.6% | 0.4% |
| Activity | 7.8 | 9.0 |
| Last commit | 16 days ago | 2 days ago |
| Language | Scala | Scala |
| License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
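The footnote defines Growth only as month-over-month growth in stars. A minimal sketch, assuming it is the usual percentage change from one month to the next (the page does not state the exact formula), plus a helper that inverts it; both function names are hypothetical.

```python
def mom_growth_pct(stars_last_month: float, stars_now: float) -> float:
    """Percentage change in stars over one month (assumed definition of Growth)."""
    if stars_last_month <= 0:
        raise ValueError("last month's star count must be positive")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def implied_last_month(stars_now: float, growth_pct: float) -> float:
    """Invert the formula: the star count one month earlier implied by a growth figure."""
    return stars_now / (1.0 + growth_pct / 100.0)


if __name__ == "__main__":
    # Back out approximate previous counts from the table's rounded figures.
    for name, stars, growth in [("delta-sharing", 683, 1.6), ("Jupyter Scala", 1568, 0.4)]:
        print(name, round(implied_last_month(stars, growth)))
```

Because the table shows rounded percentages, any inverted count is only approximate: under this assumption, 683 stars at 1.6% growth would imply roughly 672 stars a month earlier, and 1,568 at 0.4% roughly 1,562.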
delta-sharing
- Azure data lake - Data Share
- Why are array databases not extremely popular and mature?
  If you buy into the Spark point of view that Parquet is a good way of storing images, https://delta.io/sharing/ is an F/OSS (Linux Foundation) solution.
- Delta Sharing: An Open Protocol for Secure Data Sharing
- Delta Sharing on premise
  AFAIK there isn't another implementation besides the reference implementation and the Databricks Cloud implementation. We've found Databricks to be fairly responsive to issues on GitHub, so you might want to [submit one and see what they say](https://github.com/delta-io/delta-sharing/issues).
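Delta Sharing is a REST protocol: a recipient receives a profile file (JSON holding the server `endpoint` and a `bearerToken`) and addresses a shared table as `<profile-file-path>#<share>.<schema>.<table>`. The sketch below only parses such a table URL and describes the query request a client would send, without any network I/O. The endpoint, token, and share/schema/table names are hypothetical, and in practice the official connectors (for example `delta_sharing.load_as_pandas(table_url)` in Python, or `spark.read.format("deltaSharing").load(table_url)` in Spark) handle this for you.

```python
import json
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class TableRef:
    """A table address in the form used by Delta Sharing connectors."""
    profile_path: str
    share: str
    schema: str
    table: str


def parse_table_url(url: str) -> TableRef:
    """Split '<profile-file-path>#<share>.<schema>.<table>' into its parts."""
    profile_path, sep, fragment = url.rpartition("#")
    if not sep or not profile_path:
        raise ValueError(f"expected '<profile>#<share>.<schema>.<table>', got {url!r}")
    parts = fragment.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected '<share>.<schema>.<table>', got {fragment!r}")
    share, schema, table = parts
    return TableRef(profile_path, share, schema, table)


def build_query_request(profile_json: str, ref: TableRef) -> dict:
    """Describe (without sending) the POST that asks a server for a table's data files."""
    profile = json.loads(profile_json)
    endpoint = profile["endpoint"].rstrip("/")
    # Percent-encode each name so it is safe to embed in the URL path.
    share, schema, table = (quote(name, safe="") for name in (ref.share, ref.schema, ref.table))
    return {
        "method": "POST",
        "url": f"{endpoint}/shares/{share}/schemas/{schema}/tables/{table}/query",
        "headers": {
            # The profile's bearer token authenticates every request.
            "Authorization": f"Bearer {profile['bearerToken']}",
            "Content-Type": "application/json; charset=utf-8",
        },
        # Optional hints (e.g. limitHint) may go in this body; this sketch sends none.
        "body": json.dumps({}),
    }


if __name__ == "__main__":
    # Hypothetical profile and table names, for illustration only.
    profile = json.dumps({
        "shareCredentialsVersion": 1,
        "endpoint": "https://sharing.example.com/delta-sharing/",
        "bearerToken": "<token>",
    })
    ref = parse_table_url("config.share#my_share.default.sales")
    print(build_query_request(profile, ref)["url"])
```

Splitting on the last `#` keeps the profile path intact, and the server replies to the query with newline-delimited JSON describing the table and its data files, which is why recipients never need direct storage credentials.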
Jupyter Scala
- 💐 Making VSCode itself a Java REPL 🔁
  Check out almond.
- A Python-compatible statically typed language erg-lang/erg
- EDA libraries for Scala and Spark?
  What about https://github.com/alexarchambault/plotly-scala and https://almond.sh/
- Is there any editor or IDE that supports Ammonite with inline dependencies?
  I use Almond in JupyterLab, which has pretty solid code completion. In IntelliJ, you can create a scratch .sc file and run lines of it in the Scala REPL. That's really convenient for code completion, and I normally use it when I'm testing something from a specific project.
- Recommended option for "Java with different syntax"?
  The UI part: there's only the Scala REPL. I think the closest thing is a Scala kernel for Jupyter notebooks; check this out: https://almond.sh/
- An SQL Solution for Jupyter
  We have used https://almond.sh/ to create a Spark SQL interpreter using Jupyter notebooks, plus a whole lot more, which you can see here: https://arc.tripl.ai/tutorial
  After seeing many companies write ETL in code, we decided it was too hard to manage at scale, so we provided this abstraction layer, which is heavily centered around expressing business logic in SQL, to standardise development (JupyterLab) and allow rapid deployments.
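The Arc reply describes ETL whose business logic is written as SQL rather than general-purpose code. A minimal sketch of that idea, assuming a pipeline is simply an ordered list of named SQL stages where each stage is registered as a view that later stages can query by name. Python's built-in `sqlite3` stands in for Spark SQL here; the table, column, and stage names are hypothetical, and this is not Arc's actual job format.

```python
import sqlite3

# Hypothetical stages: (output view name, SQL that may read earlier views).
STAGES = [
    ("clean_orders",
     "SELECT id, customer, amount FROM raw_orders WHERE amount IS NOT NULL"),
    ("customer_totals",
     "SELECT customer, SUM(amount) AS total FROM clean_orders GROUP BY customer"),
]


def run_sql_pipeline(conn: sqlite3.Connection, stages) -> None:
    """Register each SQL stage as a view so later stages can query it by name."""
    for name, sql in stages:
        # Stage names come from trusted pipeline config, not user input.
        conn.execute(f'CREATE VIEW "{name}" AS {sql}')


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, "acme", 10.0), (2, "acme", 5.5), (3, "globex", None), (4, "globex", 7.0)],
    )
    run_sql_pipeline(conn, STAGES)
    rows = conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Keeping each step as plain SQL means the logic can be reviewed and reused without reading host-language code. In Spark, the same pattern would register each stage with `spark.sql(sql).createOrReplaceTempView(name)`.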
What are some alternatives?
LakeSoul - LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
sparkmagic - Jupyter magics and kernels for working with remote Spark clusters
LearningSparkV2 - This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Metals - Scala language server with rich IDE features 🚀
mmlspark - Simple and Distributed Machine Learning [Moved to: https://github.com/microsoft/SynapseML]
Vegas - The missing MatPlotLib for Scala + Spark
Apache Spark - A unified analytics engine for large-scale data processing
Apache Flink - Apache Flink
SynapseML - Simple and Distributed Machine Learning
Deeplearning4j - Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.
Hail - Cloud-native genomic dataframes and batch computing