library-of-alexandria
Paperless
| | library-of-alexandria | Paperless |
|---|---|---|
| Mentions | 23 | 27 |
| Stars | 108 | 7,543 |
| Growth | 0.9% | - |
| Activity | 7.6 | 5.3 |
| Last commit | 25 days ago | about 3 years ago |
| Language | Java | Python |
| License | MIT License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
library-of-alexandria
- How I archived 100 million PDF documents... (Part 1)
After a quick Google search, I learned that less than 1% of ancient texts survived to the modern day. This unfortunate fact inspired me to start working on an ambitious web crawling and archival project called the Library of Alexandria.
- A newspaper vanished from the internet. Did someone pay to kill it? | *digs into link rot and the loss of digital archives*
Here is a link to the latest releases: https://github.com/bottomless-archive-project/library-of-alexandria/releases
- What do you do when your PC ran out internal HDD cables?
- Putting 5,998,794 books on IPFS
What do you mean by storage system? Just curious because I'm working on a similar project.
- r/DataHoarder community is mentioned in this: The Enduring Allure of the Library of Alexandria | On the Media | WNYC Studios
If anybody is interested in the project mentioned in the interview, it's available here: https://github.com/bottomless-archive-project/library-of-alexandria
- Anyone here with 50TB,100TB+ of personal storage that isn't mostly movies/TV/porn ??
I'm collecting documents. Working on an app suite called Library of Alexandria. Got 91 million docs atm (mostly PDFs) and it's only going up. All of that fits on around 100 TB with gzip compression.
- Archive for software / comp sci books / ebooks?
- Bakancslista (Bucket list)
- Good document classification library in Java
I'm working on an OSS project called Library of Alexandria. It is an application built to collect, archive, and make searchable various (mostly PDF) documents. I have a little more than 90 million documents archived. My next step is to somehow label/classify them.
- I was wondering what y'all hoarded on your epic setups. I use only one NAS containing 2.8 TB of my personal data. Looking forward to seeing what you hoard.
90 TB of PDFs. I'm working on the Library of Alexandria project. Just a fun little library, nothing more. 😅😅😅
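As a rough check on the storage figures quoted in the posts above (91 million documents in around 100 TB with gzip compression), here is a minimal sketch of the arithmetic, plus a way to measure a gzip ratio for one document. The helper names are hypothetical and do not come from the Library of Alexandria code base, and the use of decimal units (1 TB = 1,000,000 MB) is an assumption.

```python
import gzip


def average_size_mb(total_tb: float, doc_count: int) -> float:
    """Average per-document size in MB, assuming decimal units (1 TB = 1e6 MB)."""
    return total_tb * 1_000_000 / doc_count


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size for one in-memory document."""
    return len(gzip.compress(data)) / len(data)


# Figures taken from the post: ~100 TB for 91 million documents.
print(round(average_size_mb(100, 91_000_000), 2))  # prints 1.1
```

Under those assumptions the average stored document comes out at roughly 1.1 MB. Real ratios will vary, since PDFs often contain already-compressed streams that gzip cannot shrink much further.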
Paperless
- 🔍Underrated Open Source Projects You Should Know About 🧠
Paperless-ngx is the successor to the original Paperless & Paperless-ng projects, both of which are now in public archive. The original projects are not dead, but rather, continued through the open source community!
- Paperless-Ngx v2.0.0
There's this:
https://github.com/the-paperless-project/paperless/issues/20
I don't know if it made its way into this fork.
- Which app is recommended for scanning private documents?
Paperless: https://github.com/the-paperless-project/paperless
- Québec lifehack: the BanQ!
- My take on document archiving: Virtualpaper
Agreed. It's difficult to beat paperless* for single-user systems. It does have some rudimentary user management from Django, but it's an admin party and there's no way to give users separate repositories. They've been looking into it for years, going back as far as the original paperless project. I imagine such a feature is difficult to add on as an afterthought because it touches everything, so it should be built in from the very beginning. And it seems they're looking for a perfect implementation, which may or may not exist. Thus, years later, paperless* remains effectively a single-user app.
- Paperless-NGX
If I understand this correctly, the original Paperless was archived (archival notice: https://github.com/the-paperless-project/paperless/commit/9b...), so Paperless-NG was created.
Now that Paperless-NG seems to be going unmaintained (last commit on 15th Sep 2021), Paperless-NGX has been created with a focus on an org, so that the continuity of the project can be maintained with a simple path for the original creators to join back if they want to.
I don't think the community could have handled this better!
- Announcing first release of Paperless-ngx, the community-supported successor to Paperless-ng
As many of you know, "Paperless-ng" was a very popular fork of the document management system "Paperless". The initial author of -ng, Jonas Winkler, created an amazing project that was eventually designated as the 'official' successor. He maintained a furious development pace for some time but as of this post hasn't been heard from in months. A group of folks dedicated to the software (myself included) decided to try to revive the project and hopefully set it up for a long future. Yes, a similar thing happened with the original Paperless; we are hoping to avoid some of the same mistakes. See jonaswinkler/paperless-ng#1599, jonaswinkler/paperless-ng#1632 and historically the-paperless-project/paperless#711 if you are curious to learn more about all of this.
- Alternative paperless-ng
Yes, I know that project. Paperless-ng is actually a fork of the paperless project. I borrowed many great ideas from both projects. Unfortunately, both projects are now archived (paperless-ng is not officially archived, but there has been no development in the last 6 months, and it looks to me like the main developer lost interest in the project).
- Just want to share my homelab
- Can someone recommend me a decent cheap document scanner?
I have a Brother ADS-1700W, works fine, a little fiddly to set up the profiles for one touch scanning but once it's done it's fine. I set up a workflow with https://github.com/the-paperless-project/paperless that lets me scan straight into OCR. https://github.com/jonaswinkler/paperless-ng is the fork that I'm going to upgrade to in my CFT.
What are some alternatives?
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
mayan-edms
Archive.org-Downloader - Python3 script to download archive.org books in PDF format
Papermerge - Open Source Document Management System for Digital Archives (Scanned Documents)
mixnode-warcreader-java - Read Web ARChive (WARC) files in Java.
precomp-cpp - Precomp, C++ version - further compress already compressed files
Docspell - Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
java-warc - Read Web ARChive (WARC) files in Java.
Mayan EDMS - Free Open Source Document Management System (mirror, no pull request or issues)
url-collector - An application that crawls the Common Crawl corpus for URLs with the specified file extensions.
CUPS - Apple CUPS Sources
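
The url-collector entry above describes collecting URLs with specified file extensions. A minimal sketch of that kind of filter is shown below; it is not url-collector's actual implementation, the function names are hypothetical, and it assumes the extension can be read from the URL path alone (ignoring query strings and content types).

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set
from urllib.parse import urlparse


def has_extension(url: str, extensions: Set[str]) -> bool:
    """Return True if the URL path ends in one of the given extensions."""
    path = urlparse(url).path  # drops the query string and fragment
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix in extensions


def filter_urls(urls: Iterable[str], extensions: Iterable[str]) -> List[str]:
    """Keep only the URLs whose path has a wanted extension (case-insensitive)."""
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return [url for url in urls if has_extension(url, wanted)]


candidates = [
    "https://example.com/a.pdf?x=1",
    "https://example.com/b.html",
    "https://example.com/C.PDF",
]
print(filter_urls(candidates, ["pdf"]))
```

Matching on the path suffix is cheap but can miss documents served from extensionless URLs, so a real crawler would likely need to inspect the response as well.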