Jupyter Scala
Apache Flink
Metric | Jupyter Scala | Apache Flink
---|---|---
Mentions | 6 | 9
Stars | 1,562 | 23,039
Growth | 0.4% | 1.1%
Activity | 9.2 | 9.9
Latest commit | 2 days ago | about 2 hours ago
Language | Scala | Java
License | BSD 3-clause "New" or "Revised" License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Jupyter Scala
- 💐 Making VSCode itself a Java REPL 🔁
Check out almond
- A Python-compatible statically typed language erg-lang/erg
- EDA libraries for Scala and Spark?
What about https://github.com/alexarchambault/plotly-scala and https://almond.sh/
- Is there any editor or IDE that supports Ammonite with inline dependencies?
I use Almond in JupyterLab, which has pretty solid code completion. In IntelliJ, you can create a scratch .sc file and run lines of it in the Scala REPL. That's really convenient for code completion, and I normally use it when I'm testing something from a specific project.
- Recommended option for "Java with different syntax"?
The UI part. There's only the Scala REPL. I think the closest is a Scala kernel for Jupyter notebooks, check this out: https://almond.sh/
- An SQL Solution for Jupyter
We have used https://almond.sh/ to create a Spark SQL interpreter using Jupyter Notebooks - plus a whole lot more which you can see here: https://arc.tripl.ai/tutorial
After seeing many companies writing ETL in code, we decided it was too hard to manage at scale, so we provided this abstraction layer, which is heavily centered around expressing business logic in SQL, to standardise development (JupyterLab) and allow rapid deployments.
Apache Flink
- First 15 Open Source Advent projects
7. Apache Flink | GitHub | tutorial
- I keep getting build failure when I try to run mvn clean compile package
I'm trying to use https://github.com/mauricioaniche/ck to analyze the CK metrics of https://github.com/apache/flink. I have the latest version of Java and the latest version of Apache Maven downloaded. My environment variables are set correctly, and I'm in the correct directory as well. However, when I run mvn clean compile package in PowerShell, it always says build error. I've tried looking up the errors, but there are so many. https://imgur.com/a/Zk8Snsa I'm very new to programming in general, so any suggestions would be appreciated.
- We Are Changing the License for Akka
- DeWitt Clause, or Can You Benchmark %DATABASE% and Get Away With It
Apache Drill, Druid, Flink, Hive, Kafka, Spark
- Computation reuse via fusion in Amazon Athena
It took me some time to get a good grasp of the power of SQL; and it really kicked in when I learned about optimization rules. It's a program that you rewrite, just like an optimizing compiler would.
You state what you want; you have different ways to fetch and match and massage data; and you can search through this space to produce a physical plan. Hopefully you used knowledge to weight parts to be optimized (table statistics, like Java's JIT would detect hot spots).
I find it fascinating to peer through database code to see what is going on. Lately, there have been new advances towards streaming databases, which bring a whole new design space. For example, now you have the latency of individual new rows to optimize for, as opposed to batching everything to optimize the latency of a whole dataset. Batch scanning will benefit from better use of your CPU caches.
And maybe you could have a hybrid system which reads history from a log and aggregates in a batched manner, and then switches to another execution plan when it reaches the end of the log.
If you want to have a peek at that, here are Flink's sets of rules [1], generic and stream-specific ones. The names can be cryptic, but usually give a good sense of what is going on. For example: PushFilterIntoTableSourceScanRule makes the WHERE clause apply as early as possible, to save some CPU/network bandwidth further down. PushPartitionIntoTableSourceScanRule tries to make a fan-out/shuffle happen as early as possible, so that parallelism can be put to use.
[1] https://github.com/apache/flink/blob/5f8fb304fb5d68cdb0b3e3c...
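To make the filter-pushdown idea above concrete, here is a toy sketch of what a rule of that kind does to a plan. It is not Flink's implementation: `Scan`, `Filter`, `push_filter_into_scan`, and `execute` are hypothetical names invented for this illustration, and the "table" is just an in-memory tuple of dicts. The rule rewrites a filter sitting on top of a scan into a scan that applies the predicate while reading, so fewer rows travel further down the plan.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Union

Row = dict
Predicate = Callable[[Row], bool]


@dataclass(frozen=True)
class Scan:
    """Hypothetical plan node: reads rows, optionally filtering while reading."""
    rows: tuple
    pushed_filter: Optional[Predicate] = None


@dataclass(frozen=True)
class Filter:
    """Hypothetical plan node: keeps the rows of `child` that satisfy `predicate`."""
    child: Union["Scan", "Filter"]
    predicate: Predicate


def push_filter_into_scan(plan):
    """Toy rewrite rule: fold every Filter above a Scan into the Scan itself."""
    if isinstance(plan, Filter):
        child = push_filter_into_scan(plan.child)
        if isinstance(child, Scan):
            old, pred = child.pushed_filter, plan.predicate
            combined = pred if old is None else (lambda r: old(r) and pred(r))
            return replace(child, pushed_filter=combined)
        return Filter(child, plan.predicate)
    return plan


def execute(plan):
    """Return (result rows, number of rows the scan handed downstream)."""
    if isinstance(plan, Scan):
        out = [r for r in plan.rows
               if plan.pushed_filter is None or plan.pushed_filter(r)]
        return out, len(out)
    rows, emitted = execute(plan.child)
    return [r for r in rows if plan.predicate(r)], emitted


# Eight made-up rows; only ids 0 and 4 have country "DE".
table = tuple({"id": i, "country": "DE" if i % 4 == 0 else "US"} for i in range(8))
plan = Filter(Scan(table), lambda r: r["country"] == "DE")
optimized = push_filter_into_scan(plan)

rows_before, emitted_before = execute(plan)
rows_after, emitted_after = execute(optimized)
```

Both plans return the same rows, but the unoptimized scan hands all eight rows to the filter, while the rewritten scan hands over only the two matching ones. A real optimizer would additionally check whether the source supports the predicate at all, which this sketch skips.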
- Avro SpecificRecord File Sink using apache flink is not compiling due to error incompatible types: FileSink<?> cannot be converted to SinkFunction<?>
[1]: https://mvnrepository.com/artifact/org.apache.avro/avro-maven-plugin/1.8.2
[2]: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java
[3]: https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/datastream/file_sink/
[4]: https://github.com/apache/flink/blob/c81b831d5fe08d328251d91f4f255b1508a9feb4/flink-end-to-end-tests/flink-file-sink-test/src/main/java/FileSinkProgram.java
[5]: https://github.com/rajcspsg/streaming-file-sink-demo
What are some alternatives?
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Deeplearning4j - Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
Apache Spark - A unified analytics engine for large-scale data processing
H2O - Sparkling Water provides H2O functionality inside a Spark cluster
sparkmagic - Jupyter magics and kernels for working with remote Spark clusters
Apache Kafka - Mirror of Apache Kafka
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.
Gearpump - Lightweight real-time big data streaming engine over Akka
Deep Java Library (DJL) - An Engine-Agnostic Deep Learning Framework in Java
Smile - Statistical Machine Intelligence & Learning Engine
Metals - Scala language server with rich IDE features 🚀
Weka