Deep Java Library (DJL)
Apache Flink
 | Deep Java Library (DJL) | Apache Flink
---|---|---
Mentions | 7 | 3
Stars | 2,617 | 19,221
Growth | 3.1% | 1.3%
Activity | 9.4 | 10.0
Latest commit | 3 days ago | 3 days ago
Language | Java | Java
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Deep Java Library (DJL)
-
2021-09 - Plans & Hopes for Clojure Data Science
Regarding Tensorflow: As far as I understand, it is accessible through DJL, which has a Clojure wrapper (work in progress): clj-djl. (But I haven't tried it.)
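DJL's engine-agnostic API is what makes TensorFlow reachable from the JVM in the first place. Below is a minimal plain-Java sketch (not clj-djl) of selecting the TensorFlow backend, assuming the ai.djl.tensorflow:tensorflow-engine artifact is on the classpath; the NDArray computation is illustrative only.

```java
import ai.djl.engine.Engine;
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;

public class TensorFlowViaDjl {
    public static void main(String[] args) {
        // With the DJL TensorFlow engine artifact on the classpath, the
        // TensorFlow runtime is selectable like any other DJL engine.
        Engine tf = Engine.getEngine("TensorFlow");
        System.out.println("Loaded engine: " + tf.getEngineName() + " " + tf.getVersion());

        // NDArray code written against DJL's API runs unchanged on that engine.
        try (NDManager manager = tf.newBaseManager()) {
            NDArray a = manager.create(new float[] {1f, 2f, 3f});
            System.out.println(a.mul(2)); // [2, 4, 6], computed by the TF backend
        }
    }
}
```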
-
[D] Java vs Python for Machine learning
To give a contrasting perspective, I think the Java ecosystem is much better suited for many data science tasks, and has a growing and well-maintained set of libraries for general purpose machine learning. I won't list them all, but TF-Java, DJL et al. have implementations of many modern architectures and there are a number of excellent libraries (CoreNLP, Lucene et al.) for working with text.
-
Does Java have a similar project like this one in C#? (ml, data)
-
If it gets better with age, will Java become suitable for machine learning and data science?
I think DJL also uses it for its tutorials - https://docs.djl.ai/jupyter/tutorial/01_create_your_first_network.html.
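For reference, the linked tutorial builds a small multilayer perceptron. The following is a hedged Java sketch of that idea using DJL's SequentialBlock API; the layer sizes are illustrative rather than copied from the tutorial.

```java
import ai.djl.Model;
import ai.djl.nn.Activation;
import ai.djl.nn.Blocks;
import ai.djl.nn.SequentialBlock;
import ai.djl.nn.core.Linear;

public class FirstNetwork {
    public static void main(String[] args) {
        // A small MLP: flatten a 28x28 input, two ReLU hidden layers,
        // and a 10-unit output layer.
        SequentialBlock block = new SequentialBlock()
                .add(Blocks.batchFlattenBlock(28 * 28))
                .add(Linear.builder().setUnits(128).build())
                .add(Activation::relu)
                .add(Linear.builder().setUnits(64).build())
                .add(Activation::relu)
                .add(Linear.builder().setUnits(10).build());

        try (Model model = Model.newInstance("mlp")) {
            model.setBlock(block);
            System.out.println(model);
        }
    }
}
```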
-
Machine learning on JVM
AWS Deep Java Library (DJL) - more deep learning.
-
Weekly Developer Roundup #21 - Sun Nov 08 2020
awslabs/djl (Java): An Engine-Agnostic Deep Learning Framework in Java
Apache Flink
-
DeWitt Clause, or Can You Benchmark %DATABASE% and Get Away With It
Apache Drill, Druid, Flink, Hive, Kafka, Spark
-
Computation reuse via fusion in Amazon Athena
It took me some time to get a good grasp of the power of SQL, and it really kicked in when I learned about optimization rules. A query is a program that gets rewritten, just as an optimizing compiler would rewrite yours.
You state what you want; there are many different ways to fetch, match, and massage the data; and the optimizer searches through that space to produce a physical plan, ideally weighting the parts worth optimizing with whatever knowledge it has (table statistics, the way Java's JIT detects hot spots).
I find it fascinating to peer through database code to see what is going on. Lately there have been new advances toward streaming databases, which bring a whole new design space. For example, now you can optimize for the latency of individual new rows, as opposed to processing a dataset as one batch and optimizing its end-to-end latency. Batch scanning, in turn, benefits from better use of your CPU caches.
And maybe you could have a hybrid system which reads history from a log and aggregates in a batched manner, and then switches to another execution plan when it reaches the end of the log.
If you want to have a peek at that, here is Flink's set of rules [1], both generic and stream-specific. The names can be cryptic, but they usually give a good sense of what is going on. For example, PushFilterIntoTableSourceScanRule makes the WHERE clause apply as early as possible, to save CPU and network bandwidth further down the plan, and PushPartitionIntoTableSourceScanRule tries to make a fan-out/shuffle happen as early as possible, so that parallelism can be exploited. A small sketch of observing this with EXPLAIN follows below the link.
[1] https://github.com/apache/flink/blob/5f8fb304fb5d68cdb0b3e3c...
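As a rough illustration of the rule named above, here is a minimal Table API sketch that prints the optimized plan with EXPLAIN. The table definition and path are placeholders, and whether the filter actually ends up inside the scan depends on the connector's pushdown support.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FilterPushdownExplain {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // A filesystem/CSV source; path and schema are placeholders.
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id BIGINT," +
                "  amount DOUBLE," +
                "  region STRING" +
                ") WITH (" +
                "  'connector' = 'filesystem'," +
                "  'path' = '/tmp/orders'," +
                "  'format' = 'csv'" +
                ")");

        // EXPLAIN prints the logical and optimized physical plans; if the
        // source supports filter pushdown, the WHERE predicate appears
        // inside the table scan (the effect of
        // PushFilterIntoTableSourceScanRule) rather than as a separate
        // Calc/Filter node above it.
        tEnv.executeSql(
                "EXPLAIN PLAN FOR SELECT order_id, amount FROM orders WHERE region = 'EU'")
            .print();
    }
}
```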
-
Avro SpecificRecord FileSink using Apache Flink is not compiling due to error: incompatible types: FileSink<?> cannot be converted to SinkFunction<?>
[1]: https://mvnrepository.com/artifact/org.apache.avro/avro-maven-plugin/1.8.2
[2]: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java
[3]: https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/datastream/file_sink/
[4]: https://github.com/apache/flink/blob/c81b831d5fe08d328251d91f4f255b1508a9feb4/flink-end-to-end-tests/flink-file-sink-test/src/main/java/FileSinkProgram.java
[5]: https://github.com/rajcspsg/streaming-file-sink-demo
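The usual cause of that compile error is attaching the FileSink with addSink(), which expects the legacy SinkFunction interface; FileSink implements the newer unified Sink interface and is attached with sinkTo(). A minimal sketch of the fix is below, where Address is a stand-in for whatever SpecificRecord class the avro-maven-plugin generates from the schema.

```java
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.avro.AvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;

public class AvroFileSinkFix {

    // "Address" stands in for the Avro-generated SpecificRecord class.
    static void attachSink(DataStream<Address> records) {
        FileSink<Address> sink = FileSink
                .forBulkFormat(new Path("/tmp/avro-out"),
                               AvroWriters.forSpecificRecord(Address.class))
                .build();

        // FileSink implements the unified Sink interface, so it must be
        // attached with sinkTo(); addSink() only accepts the legacy
        // SinkFunction, which is what triggers the
        // "FileSink<?> cannot be converted to SinkFunction<?>" error.
        records.sinkTo(sink);
    }
}
```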
What are some alternatives?
Deeplearning4j - Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
mediapipe - Cross-platform, customizable ML solutions for live and streaming media.
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
H2O - Sparkling Water provides H2O functionality inside a Spark cluster
Apache Kafka - Mirror of Apache Kafka
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
Gearpump - Lightweight real-time big data streaming engine over Akka
Tribuo - A Java machine learning library
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.
Jupyter Scala - A Scala kernel for Jupyter
Weka