Deep Java Library (DJL) vs Apache Spark

Project	Deep Java Library (DJL)	Apache Spark
Mentions	13	101
Stars	3,841	38,320
Growth	1.9%	1.1%
Activity	9.5	10.0
Latest Commit	about 16 hours ago	6 days ago
Language	Java	Scala
License	Apache License 2.0	Apache License 2.0

Mentions - the total number of mentions we have tracked plus the number of user-suggested alternatives.
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity - a relative measure of how actively a project is being developed; recent commits are weighted more heavily than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.

Deep Java Library (DJL)

Posts with mentions or reviews of Deep Java Library (DJL). We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-03-11.

Is deeplearning4j a good choice?
2 projects | /r/java | 11 Mar 2023

It seems to have been picked up by Eclipse and there is also Oracle Labs' Tribuo and Deep Java Library. All seem active, but I don't know much about any of them. I agree it's probably best to follow the community and use a more popular tool like PyTorch.
Just want to vent a bit
3 projects | /r/ProgrammingLanguages | 3 Dec 2022

Although it may be a bit more work, you can do both machine learning and AI in Java. If you are doing deep learning, you can use DeepJavaLibrary (I do work on this one at Amazon). If you are looking for other ML algorithms, I have seen Smile, Tribuo, or some around Spark.
Best way to combine Python and Java?
10 projects | /r/java | 29 Oct 2022

Image preprocessing I know less about, but tokenization is something I've dealt with a bunch. There are a few options, either push the tokenizer into the ONNX model and use MS's ONNX Runtime extensions (we've used this when working with sentencepiece tokenizers), port the tokenizer entirely to Java (we did this for BERT), or use a sentencepiece or HF tokenizers wrapper directly (e.g. Amazon's DJL did this - HF, sentencepiece).
Anybody here using Java for machine learning?
11 projects | /r/java | 13 Sep 2022

https://djl.ai/ seems very promising. I've played around with it quite a bit, not in real production though. It's a very well documented (https://d2l.djl.ai/) and active project, with Amazon working on it.
Good document classification library in Java
2 projects | /r/java | 12 Sep 2022
2021-09 - Plans & Hopes for Clojure Data Science
3 projects | /r/Clojure | 3 Sep 2021
[D] Java vs Python for Machine learning
4 projects | /r/MachineLearning | 25 Jul 2021

To give a contrasting perspective, I think the Java ecosystem is much better suited for many data science tasks, and has a growing and well-maintained set of libraries for general purpose machine learning. I won't list them all, but TF-Java, DJL et al. have implementations of many modern architectures and there are a number of excellent libraries (CoreNLP, Lucene et al.) for working with text.
Does Java has similar project like this one in C#? (ml, data)
6 projects | /r/java | 23 May 2021
If it gets better w age, will java become compatible for machine learning and data science?
7 projects | /r/java | 20 May 2021

I think DJL also uses it for their tutorials - https://docs.djl.ai/jupyter/tutorial/01_create_your_first_network.html.
Machine learning on JVM
6 projects | /r/scala | 5 Apr 2021

AWS Deep Learning more deep learning.

Apache Spark

Posts with mentions or reviews of Apache Spark. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-11.

"xAI will open source Grok"
3 projects | news.ycombinator.com | 11 Mar 2024
Groovy 🎷 Cheat Sheet - 01 Say "Hello" from Groovy
7 projects | dev.to | 7 Mar 2024

Recently I had to revisit the "JVM languages universe" again. Yes, language(s), plural! Java isn't the only language that uses the JVM. I previously used Scala, which is a JVM language, to use Apache Spark for Data Engineering workloads, but this is for another post 😉.
🦿🛴Smarcity garbage reporting automation w/ ollama
6 projects | dev.to | 31 Jan 2024

Consume data into third-party software (then let OpenSearch, Apache Spark, or Apache Pinot handle it) for analysis/data science, GIS systems (so you can put reports on a map), or any ticket management system
Go concurrency simplified. Part 4: Post office as a data pipeline
5 projects | dev.to | 21 Dec 2023

also, this knowledge applies to learning more about data engineering, as this field of software engineering relies heavily on the event-driven approach via tools like Spark, Flink, Kafka, etc.
Five Apache projects you probably didn't know about
8 projects | dev.to | 21 Dec 2023

Apache SeaTunnel is a data integration platform that offers the three pillars of data pipelines: sources, transforms, and sinks. It offers an abstract API over three possible engines: the Zeta engine from SeaTunnel or a wrapper around Apache Spark or Apache Flink. Be careful, as each engine comes with its own set of features.
Apache Spark VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
Integrate Pyspark Structured Streaming with confluent-kafka
2 projects | dev.to | 12 Aug 2023

Apache Spark - https://spark.apache.org/
Spark – A micro framework for creating web applications in Kotlin and Java
1 project | news.ycombinator.com | 16 Jun 2023

A JVM based framework named "Spark", when https://spark.apache.org exists?
Rest in Peas: The Unrecognized Death of Speech Recognition (2010)
4 projects | news.ycombinator.com | 4 May 2023
PySpark SparkSession Builder with Kubernetes Master
1 project | /r/codehunter | 20 Apr 2023

I recently saw a pull request that was merged to the Apache/Spark repository that apparently adds initial Python bindings for PySpark on K8s. I posted a comment to the PR asking a question about how to use spark-on-k8s in a Python Jupyter notebook, and was told to ask my question here.

What are some alternatives?

When comparing Deep Java Library (DJL) and Apache Spark you can also consider the following projects:

Deeplearning4j - Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.

Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration

mediapipe - Cross-platform, customizable ML solutions for live and streaming media.

Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Tribuo - A Java machine learning library

Scalding - A Scala API for Cascading

CoreNLP - CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.

mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services

Apache Flink - Apache Flink

luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

