Top 23 Java Machine Learning Projects
-
It took me some time to get a good grasp of the power of SQL, and it really clicked when I learned about optimization rules. A query is a program that you rewrite, just like an optimizing compiler would.
You state what you want; you have different ways to fetch, match, and massage data; and you can search through this space to produce a physical plan. Ideally, you use knowledge such as table statistics to weigh which parts to optimize, much as Java's JIT detects hot spots.
I find it fascinating to peer through database code to see what is going on. Lately, there have been new advances towards streaming databases, which bring a whole new design space. For example, now you have the latency of individual new rows to optimize for, as opposed to batching the whole dataset to optimize its overall latency. Batch scanning will benefit from better use of your CPU caches.
And maybe you could have a hybrid system which reads history from a log and aggregates in a batched manner, and then switches to another execution plan when it reaches the end of the log.
If you want to have a peek at that, here are Flink's sets of rules [1], both generic and stream-specific ones. The names can be cryptic, but usually give a good sense of what is going on. For example: PushFilterIntoTableSourceScanRule makes the WHERE clause apply as early as possible, to save some CPU/network bandwidth further down. PushPartitionIntoTableSourceScanRule tries to make a fan-out/shuffle happen as early as possible, so that parallelism can be exploited.
[1] https://github.com/apache/flink/blob/5f8fb304fb5d68cdb0b3e3c...
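To make the idea of a rewrite rule concrete, here is a minimal sketch of filter pushdown on a toy plan tree. It is not Flink's implementation: the `Plan`, `Scan`, and `Filter` types and the `pushFilterIntoScan` method are hypothetical names invented for illustration, and the "table" is just an in-memory list of integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class FilterPushdownSketch {

    /** A toy plan node (hypothetical); every node can produce its rows. */
    interface Plan {
        List<Integer> execute();
    }

    /** Scans an in-memory "table", optionally applying a predicate while reading. */
    static final class Scan implements Plan {
        final List<Integer> table;
        final Predicate<Integer> pushedFilter; // null means no filter was pushed down

        Scan(List<Integer> table, Predicate<Integer> pushedFilter) {
            this.table = table;
            this.pushedFilter = pushedFilter;
        }

        @Override
        public List<Integer> execute() {
            List<Integer> out = new ArrayList<>();
            for (Integer row : table) {
                // Rows rejected here never reach the operators above the scan.
                if (pushedFilter == null || pushedFilter.test(row)) {
                    out.add(row);
                }
            }
            return out;
        }
    }

    /** Filters the rows produced by its input, after they have been materialized. */
    static final class Filter implements Plan {
        final Plan input;
        final Predicate<Integer> predicate;

        Filter(Plan input, Predicate<Integer> predicate) {
            this.input = input;
            this.predicate = predicate;
        }

        @Override
        public List<Integer> execute() {
            List<Integer> out = new ArrayList<>();
            for (Integer row : input.execute()) {
                if (predicate.test(row)) {
                    out.add(row);
                }
            }
            return out;
        }
    }

    /** Rewrite rule: Filter(Scan) becomes a Scan carrying the filter; other plans are returned unchanged. */
    static Plan pushFilterIntoScan(Plan plan) {
        if (plan instanceof Filter) {
            Filter filter = (Filter) plan;
            if (filter.input instanceof Scan) {
                Scan scan = (Scan) filter.input;
                if (scan.pushedFilter == null) {
                    return new Scan(scan.table, filter.predicate);
                }
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Integer> table = Arrays.asList(1, 5, 10, 15, 20);
        Plan original = new Filter(new Scan(table, null), x -> x > 9);
        Plan optimized = pushFilterIntoScan(original);
        System.out.println(original.execute());  // [10, 15, 20]
        System.out.println(optimized.execute()); // [10, 15, 20]
    }
}
```

Both plans return the same rows; the rewritten one simply discards non-matching rows at the source, which is the saving the comment describes. A real optimizer would apply many such rules and use statistics to choose among the resulting plans.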
-
Hadoop (a Big Data tool).
-
-
Deeplearning4j
Suite of tools for deploying and training deep learning models on the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a small, modular C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
DL4J
-
Project mention: What libraries do you use for machine learning and data visualizing in scala? | reddit.com/r/scala | 2021-11-27
I use Smile https://github.com/haifengl/smile with Ammonite and it feels pretty easy/good to work with. Of course, for purely looking at data and exploration, you're not going to beat Python.
-
Project mention: MeiliSearch: A Minimalist Full-Text Search Engine | news.ycombinator.com | 2021-08-15
After looking at various alternatives, I'm thinking of trying out https://vespa.ai/ [0]
-
-
Project mention: 2021-09 - Plans & Hopes for Clojure Data Science | reddit.com/r/Clojure | 2021-09-03
Here is link number 1 - Previous text "DJL"
-
-
-
Project mention: Project to rebuild papers with plaintext markup languages | reddit.com/r/Open_Science | 2021-09-25
- I ended up using Grobid, which converts the PDF to a very detailed XML format. It is not a word processing format, but a format specifically for representing scientific documents. I don't know whether it would, for example, contain tags for bold or italicized text. The tool works really well, but since you probably cannot use the output XML directly, it will need some postprocessing, which should be relatively simple with XML parsing libraries.
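As a rough idea of what such postprocessing could look like, here is a minimal sketch using the JDK's built-in DOM parser. The embedded XML is a hypothetical, heavily simplified stand-in for Grobid's TEI output (the real output is much richer and uses an XML namespace), and the class and method names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class TeiPostprocessSketch {

    // Hypothetical, simplified sample; not actual Grobid output.
    static final String SAMPLE =
        "<TEI><teiHeader><fileDesc><titleStmt>"
      + "<title>An Example Paper</title>"
      + "</titleStmt></fileDesc></teiHeader>"
      + "<text><body><div><head>Introduction</head>"
      + "<p>First paragraph.</p><p>Second paragraph.</p>"
      + "</div></body></text></TEI>";

    /** Parses an XML string into a DOM document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Collects the trimmed text content of every element with the given tag name. */
    static List<String> textOf(Document doc, String tagName) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // Emit a plaintext-markup rendering: title, section headings, then paragraphs.
        System.out.println("# " + textOf(doc, "title").get(0));
        for (String heading : textOf(doc, "head")) {
            System.out.println("## " + heading);
        }
        for (String paragraph : textOf(doc, "p")) {
            System.out.println(paragraph);
        }
    }
}
```

A real converter would walk the tree in document order so that headings and paragraphs stay interleaved, and would need to handle the namespace and the many other elements present in actual output.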
-
-
DatumBox
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
-
Project mention: txtai 3.4 released - Build AI-powered semantic search applications in Java | reddit.com/r/java | 2021-10-09
Tribuo (tribuo.org, github.com/oracle/tribuo). ONNX export support is there for 2 models at the moment in main, there's a PR for factorization machines which supports ONNX export, and we plan to add another couple of models and maybe ensembles before the upcoming release. Plus I need to write a tutorial on how it all works, but you can check the tests in the meantime.
-
lychee.js
🌱 Next-Gen AI-Assisted Isomorphic Application Engine for Embedded, Console, Mobile, Server and Desktop
Note that fair use as a concept (or prior art, for that matter) only exists inside the US, not globally.
For example, I'm a European citizen and therefore the EU copyright directive of 2003 applies to me. Inside the European trade union, no legal entity and only human entities can own copyright. Legal entities such as companies can only own perpetual licenses, and contracts that give them the sole copyright usage and distribution rights have been nullified many times in front of both state-level supreme courts and EU-level courts (Karlsruhe, Strasbourg, etc.).
This also means that technically, if there's no warranty disclosure issued for automated code generation, the authors of the automated program are still responsible for any copyright infringement, legal damages, etc. which is a nightmare if it turns out the code was A/GPL'ed.
I'm just saying this, because there's a world of intellectual property guidelines outside the US, too.
Source: was sued for my lychee.js [1] project a couple times in the past, which was successfully generating composite pattern based codes that were trained based on ES/HyperNEAT hypercubes - also in the robotics/SCADA level factory sector.
-
-
Project mention: Transcribe Speech to Text with Python for Free | reddit.com/r/programming | 2022-03-30
Cool! Leopard operates on files but Cheetah can do live (streaming)
-
Project mention: Project to rebuild papers with plaintext markup languages | reddit.com/r/Open_Science | 2021-09-25
- Another alternative that's on my list but that I didn't try is Cermine.
-
-
Check out https://github.com/Picovoice/picovoice. I saw it in an earlier article and it seemed easy to get started with.
-
hms-ml-demo
HMS ML Demo provides an example of integrating the Huawei ML Kit service into applications. This example demonstrates how to integrate services provided by ML Kit, such as face detection, text recognition, image segmentation, ASR, and TTS.
GitHub - HMS-Core/hms-ml-demo
-
ksql-udf-deep-learning-mqtt-iot
Deep Learning UDF for KSQL for Streaming Anomaly Detection of MQTT IoT Sensor Data
-
Project mention: Critical New 0-day Vulnerability in Popular Log4j Library - List of applications | dev.to | 2021-12-13
EVLLABS JGAAP : https://github.com/evllabs/JGAAP/releases/tag/v8.0.2
-
LookAtMe
VideoView that plays video only when 👀 are open and 👦 is detected with various other features (by Pradyuman7)
-
rumble
⛈️ RumbleDB 1.18.0 "Scarlet Ixora" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more (by RumbleDB)
Project mention: RumbleDB: Query with ease a lot of different nested, heterogeneous data formats | news.ycombinator.com | 2021-12-01
Java Machine Learning related posts
- Python vs. Java: Comparing the Pros, Cons, and Use Cases
- Computation reuse via fusion in Amazon Athena
- Pokemon vs Programming
- Text Classification with HMS ML Kit Custom Model
- Unknown Python.exe process taking 2% CPU
- How to show recent GitHub activities on your profile readme
- Wardley mapping the Modern Data Stack
Index
What are some of the best open-source Machine Learning projects in Java? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | Apache Flink | 18,976 |
2 | Apache Hadoop | 12,591 |
3 | Deeplearning4j | 12,470 |
4 | Smile | 5,508 |
5 | vespa | 3,937 |
6 | Tablesaw | 2,915 |
7 | Deep Java Library (DJL) | 2,523 |
8 | Apache Mahout | 1,992 |
9 | grobid | 1,705 |
10 | Siddhi | 1,335 |
11 | DatumBox | 1,077 |
12 | Tribuo | 1,048 |
13 | lychee.js | 771 |
14 | JSAT | 735 |
15 | cheetah | 443 |
16 | CERMINE | 402 |
17 | oj! Algorithms | 382 |
18 | picovoice | 261 |
19 | hms-ml-demo | 259 |
20 | ksql-udf-deep-learning-mqtt-iot | 255 |
21 | JGAAP | 230 |
22 | LookAtMe | 180 |
23 | rumble | 169 |