|  | library-of-alexandria | Deep Java Library (DJL) |
| --- | --- | --- |
| Mentions | 23 | 13 |
| Stars | 108 | 3,841 |
| Growth | 0.9% | 1.3% |
| Activity | 7.6 | 9.5 |
| Latest commit | 24 days ago | 7 days ago |
| Language | Java | Java |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
library-of-alexandria
- How I archived 100 million PDF documents... (Part 1)
After a quick Google search, I figured out that less than 1% of ancient texts survived to the modern day. This unfortunate fact inspired me to start working on an ambitious web crawling and archival project called the Library of Alexandria.
- A newspaper vanished from the internet. Did someone pay to kill it? | *digs into link rot and the loss of digital archives*
Here is a link to the latest releases: https://github.com/bottomless-archive-project/library-of-alexandria/releases
- What do you do when your PC runs out of internal HDD cables?
- Putting 5,998,794 books on IPFS
What do you mean by storage system? Just curious because I'm working on a similar project.
- r/DataHoarder community is mentioned in this: The Enduring Allure of the Library of Alexandria | On the Media | WNYC Studios
If anybody is interested in the project mentioned in the interview, it's available here: https://github.com/bottomless-archive-project/library-of-alexandria
- Anyone here with 50TB/100TB+ of personal storage that isn't mostly movies/TV/porn?
I'm collecting documents. Working on an app suite called Library of Alexandria. Got 91 million docs atm (mostly PDFs) and it's only going up. All of that fits on around 100 TB with gzip compression.
- Archive for software / comp sci books / ebooks?
- Bakancslista ("bucket list" in Hungarian)
- Good document classification library in Java
I'm working on an OSS called Library of Alexandria. It is an application that is built to collect, archive, and make searchable various (mostly PDF) documents. I have a little bit more than 90 million documents archived. My next step is to somehow label/classify them.
- I was wondering what y'all hoarded on your epic setups. I use only one NAS containing 2.8 TB of my personal data. Looking forward to seeing what you hoard.
90 TB of PDFs. I'm working on the Library of Alexandria project. Just a fun little library, nothing more. 😅😅😅
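The document-classification question in the mentions above ("my next step is to somehow label/classify them") can be illustrated without any ML library. The following is a minimal sketch of a multinomial naive Bayes classifier over word counts with add-one smoothing; the class name `TinyClassifier`, the labels, and the training snippets are invented for this sketch and are not part of Library of Alexandria or any library on this page.

```java
import java.util.*;

// Illustrative sketch: multinomial naive Bayes for document labeling.
public class TinyClassifier {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocab = new HashSet<>();
    private int totalDocs = 0;

    // Count words per label; lowercase and split on non-word characters.
    public void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : text.toLowerCase().split("\\W+")) {
            if (w.isEmpty()) continue;
            counts.merge(w, 1, Integer::sum);
            vocab.add(w);
        }
    }

    // Pick the label maximizing log prior + summed log likelihoods
    // (add-one smoothing so unseen words don't zero out a label).
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            Map<String, Integer> counts = wordCounts.get(label);
            int totalWords = counts.values().stream().mapToInt(Integer::intValue).sum();
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            for (String w : text.toLowerCase().split("\\W+")) {
                if (w.isEmpty()) continue;
                int c = counts.getOrDefault(w, 0);
                score += Math.log((c + 1.0) / (totalWords + vocab.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyClassifier clf = new TinyClassifier();
        clf.train("ml", "neural network training gradient model");
        clf.train("archive", "pdf crawl storage archive compression");
        System.out.println(clf.classify("gradient model training")); // prints "ml"
    }
}
```

At 90+ million documents this exact approach would need batching and a pruned vocabulary, but the scoring logic is the same idea a real classification library applies.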
Deep Java Library (DJL)
- Is deeplearning4j a good choice?
It seems to have been picked up by Eclipse and there is also Oracle Labs' Tribuo and Deep Java Library. All seem active, but I don't know much about any of them. I agree it's probably best to follow the community and use a more popular tool like PyTorch.
- Just want to vent a bit
Although it may be a bit more work, you can do both machine learning and AI in Java. If you are doing deep learning, you can use DeepJavaLibrary (I do work on this one at Amazon). If you are looking for other ML algorithms, I have seen Smile, Tribuo, or some around Spark.
- Best way to combine Python and Java?
Image preprocessing I know less about, but tokenization is something I've dealt with a bunch. There are a few options, either push the tokenizer into the ONNX model and use MS's ONNX Runtime extensions (we've used this when working with sentencepiece tokenizers), port the tokenizer entirely to Java (we did this for BERT), or use a sentencepiece or HF tokenizers wrapper directly (e.g. Amazon's DJL did this - HF, sentencepiece).
- Anybody here using Java for machine learning?
https://djl.ai/ seems very promising. I've played around with it quite a bit, not in real production though. It's a very well documented (https://d2l.djl.ai/) and active project, with Amazon working on it.
- Good document classification library in Java
- 2021-09 - Plans & Hopes for Clojure Data Science
- [D] Java vs Python for Machine learning
To give a contrasting perspective, I think the Java ecosystem is much better suited for many data science tasks, and has a growing and well-maintained set of libraries for general purpose machine learning. I won't list them all, but TF-Java, DJL et al. have implementations of many modern architectures and there are a number of excellent libraries (CoreNLP, Lucene et al.) for working with text.
- Does Java has similar project like this one in C#? (ml, data)
- If it gets better with age, will Java become suitable for machine learning and data science?
I think DJL also uses it for their tutorials - https://docs.djl.ai/jupyter/tutorial/01_create_your_first_network.html.
- Machine learning on JVM
AWS Deep Java Library for more deep learning.
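One option raised in the "Best way to combine Python and Java?" thread above is porting the tokenizer entirely to Java (as was done for BERT). As a hedged illustration of what that involves, here is a minimal WordPiece-style greedy longest-match tokenizer; the class name `GreedyTokenizer`, the tiny vocabulary, and the method names are invented for this sketch and are not DJL's or Hugging Face's API.

```java
import java.util.*;

// Illustrative sketch: WordPiece-style greedy longest-match subword
// tokenization, using BERT's "##" continuation-piece convention.
public class GreedyTokenizer {
    private final Set<String> vocab;

    public GreedyTokenizer(Set<String> vocab) { this.vocab = vocab; }

    // Split one word into the longest vocabulary pieces, left to right.
    public List<String> tokenize(String word) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String piece = null;
            while (start < end) {
                String candidate = word.substring(start, end);
                if (start > 0) candidate = "##" + candidate; // mid-word piece
                if (vocab.contains(candidate)) { piece = candidate; break; }
                end--; // shrink the candidate until it matches the vocab
            }
            if (piece == null) return Collections.singletonList("[UNK]");
            pieces.add(piece);
            start = end;
        }
        return pieces;
    }

    public static void main(String[] args) {
        Set<String> vocab = new HashSet<>(Arrays.asList("token", "##izer", "##ize"));
        GreedyTokenizer t = new GreedyTokenizer(vocab);
        System.out.println(t.tokenize("tokenizer")); // prints [token, ##izer]
    }
}
```

A production port also has to reproduce the reference tokenizer's normalization and special-token handling exactly, which is why the thread's other options (ONNX Runtime extensions, or wrapping sentencepiece/HF tokenizers as DJL did) are often less error-prone.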
What are some alternatives?
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
Deeplearning4j - Suite of tools for deploying and training deep learning models on the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch; a modular, tiny C++ library for running math code with a Java-based math library on top of it; and SameDiff, a PyTorch/TensorFlow-like library for running deep learning with automatic differentiation.
Archive.org-Downloader - Python3 script to download archive.org books in PDF format
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
mixnode-warcreader-java - Read Web ARChive (WARC) files in Java.
mediapipe - Cross-platform, customizable ML solutions for live and streaming media.
Paperless - Scan, index, and archive all of your paper documents
Tribuo - A Java machine learning library
precomp-cpp - Precomp, C++ version - further compress already compressed files
CoreNLP - CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
java-warc - Read Web ARChive (WARC) files in Java.
Apache Flink - Stateful computations over data streams