| | Deep Java Library (DJL) | Apache Flink |
|---|---|---|
| Latest commit | 3 days ago | 3 days ago |
| License | Apache License 2.0 | Apache License 2.0 |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits are weighted more heavily than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Deep Java Library (DJL)
2021-09 - Plans & Hopes for Clojure Data Science
3 projects | reddit.com/r/Clojure | 3 Sep 2021
Regarding TensorFlow: as far as I understand, it is accessible through DJL, which has a Clojure wrapper (work in progress): clj-djl. (But I haven't tried it.)
[D] Java vs Python for Machine learning
4 projects | reddit.com/r/MachineLearning | 25 Jul 2021
To give a contrasting perspective, I think the Java ecosystem is much better suited for many data science tasks, and has a growing and well-maintained set of libraries for general purpose machine learning. I won't list them all, but TF-Java, DJL et al. have implementations of many modern architectures and there are a number of excellent libraries (CoreNLP, Lucene et al.) for working with text.
Does Java have a similar project like this one in C#? (ML, data)
6 projects | reddit.com/r/java | 23 May 2021
If it gets better with age, will Java become suitable for machine learning and data science?
7 projects | reddit.com/r/java | 20 May 2021
I think DJL also uses it for their tutorials: https://docs.djl.ai/jupyter/tutorial/01_create_your_first_network.html.
Machine learning on JVM
6 projects | reddit.com/r/scala | 5 Apr 2021
AWS Deep Java Library (DJL) for more deep learning.
Weekly Developer Roundup #21 - Sun Nov 08 2020
28 projects | dev.to | 7 Nov 2020
awslabs/djl (Java): An Engine-Agnostic Deep Learning Framework in Java
Apache Flink
DeWitt Clause, or Can You Benchmark %DATABASE% and Get Away With It
21 projects | dev.to | 2 Jun 2022
Apache Drill, Druid, Flink, Hive, Kafka, Spark
Computation reuse via fusion in Amazon Athena
2 projects | news.ycombinator.com | 20 May 2022
It took me some time to grasp the power of SQL, and it really clicked when I learned about optimization rules. A query is a program that gets rewritten, just as an optimizing compiler would rewrite code.
You state what you want; there are different ways to fetch, match, and massage the data; and the optimizer searches through this space to produce a physical plan. Ideally it uses knowledge such as table statistics to weigh which parts to optimize, much like Java's JIT detects hot spots.
I find it fascinating to peer through database code to see what is going on. Lately, there have been new advances in streaming databases, which open up a whole new design space. For example, you now optimize for the latency of each individual new row, as opposed to batching the whole dataset to optimize its overall latency. Batch scanning, on the other hand, benefits from better use of your CPU caches.
And maybe you could have a hybrid system that reads history from a log, aggregates it in batches, and then switches to another execution plan when it reaches the end of the log.
If you want to peek at that, here are Flink's sets of rules, both generic and stream-specific ones. The names can be cryptic, but they usually give a good sense of what is going on. For example, PushFilterIntoTableSourceScanRule makes the WHERE clause apply as early as possible, to save CPU and network bandwidth further down. PushPartitionIntoTableSourceScanRule tries to make a fan-out/shuffle happen as early as possible, so that parallelism can be put to use.
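To make the filter-pushdown idea from the comment above concrete, here is a toy sketch in Java (16 or later, for records and pattern matching with `instanceof`). The plan node types `TableScan`, `Filter`, and `Project` and the rule class `PushFilterIntoScan` are hypothetical stand-ins, not Flink's or Apache Calcite's actual classes. A real rule also has to check whether the source is able to evaluate the predicate at all; this sketch assumes every source can.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy logical plan; these are NOT Flink or Calcite classes.
interface PlanNode {}

// A scan that can carry predicates evaluated by the source itself.
record TableScan(String table, List<String> pushedFilters) implements PlanNode {}

// A filter applied on top of its input after rows have already been read.
record Filter(String predicate, PlanNode input) implements PlanNode {}

// A projection that keeps only the listed columns.
record Project(List<String> columns, PlanNode input) implements PlanNode {}

final class PushFilterIntoScan {
    // Rewrites the plan bottom-up: a Filter sitting directly on a TableScan
    // is absorbed into the scan, so rows can be dropped as early as possible.
    static PlanNode apply(PlanNode node) {
        if (node instanceof Project p) {
            return new Project(p.columns(), apply(p.input()));
        }
        if (node instanceof Filter f) {
            PlanNode input = apply(f.input());
            if (input instanceof TableScan scan) {
                // Assumption: the source accepts every predicate it is given.
                List<String> filters = new ArrayList<>(scan.pushedFilters());
                filters.add(f.predicate());
                return new TableScan(scan.table(), List.copyOf(filters));
            }
            return new Filter(f.predicate(), input);
        }
        return node; // TableScan (and anything unknown) is left unchanged
    }

    public static void main(String[] args) {
        // Roughly: SELECT name FROM users WHERE age > 30
        PlanNode plan = new Project(List.of("name"),
                new Filter("age > 30", new TableScan("users", List.of())));
        System.out.println(apply(plan));
    }
}
```

Running `main` prints `Project[columns=[name], input=TableScan[table=users, pushedFilters=[age > 30]]]`: the predicate now lives inside the scan instead of in a separate `Filter` node, which is the effect the rule name describes. Because the rewrite recurses first, stacked filters collapse into the scan one after another.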
Avro SpecificRecord File Sink using Apache Flink is not compiling due to error: incompatible types: FileSink<?> cannot be converted to SinkFunction<?>
3 projects | reddit.com/r/apacheflink | 14 Sep 2021
https://mvnrepository.com/artifact/org.apache.avro/avro-maven-plugin/1.8.2
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java
https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/datastream/file_sink/
https://github.com/apache/flink/blob/c81b831d5fe08d328251d91f4f255b1508a9feb4/flink-end-to-end-tests/flink-file-sink-test/src/main/java/FileSinkProgram.java
https://github.com/rajcspsg/streaming-file-sink-demo
What are some alternatives?
Deeplearning4j - Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
mediapipe - Cross-platform, customizable ML solutions for live and streaming media.
PyTorch - Tensors and dynamic neural networks in Python with strong GPU acceleration
H2O - Sparkling Water provides H2O functionality inside Spark cluster
Apache Kafka - Mirror of Apache Kafka
Apache Spark - A unified analytics engine for large-scale data processing
Gearpump - Lightweight real-time big data streaming engine over Akka
Tribuo - A Java machine learning library
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.
Jupyter Scala - A Scala kernel for Jupyter