OCRmyPDF
pdfarranger
Metric | OCRmyPDF | pdfarranger
---|---|---
Mentions | 1 | 93
Stars | 18 | 3,037
Growth | - | 3.8%
Activity | 3.6 | 8.9
Last commit | almost 2 years ago | 21 days ago
Language | Python | Python
License | Mozilla Public License 2.0 | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
OCRmyPDF
- OCRmyPDF: Add an OCR text layer to scanned PDF file
As mentioned in the other replies, Google's OCR is limited. OCRmyPDF is designed for PDFs. So if you download a 1000+ page public-domain dictionary off of Archive.org (which is something I do regularly), and you want to re-run the OCR because Internet Archive doesn't tune its OCR very well for multilingual works (if at all), then OCRmyPDF is going to beat Google's automatic OCR every time.
However, I recently paid a programmer to fork OCRmyPDF to give it the option to use Google's OCR engine instead of Tesseract. That fork is here: https://github.com/ualiawan/OCRmyPDF. It's more fiddly than the regular OCRmyPDF, and it requires a Google Cloud Vision account (which charges some fraction of a cent for each page OCRed), but it works well, and in some cases may produce better results than OCRmyPDF, although you must be sure to specify the language of the document.
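As a rough illustration of re-running OCR with an explicit language, here is a minimal sketch that builds a command line for the regular OCRmyPDF (not the Google Cloud Vision fork). It assumes the `ocrmypdf` executable and the matching Tesseract language packs are installed; the file names and the `eng`/`deu` language pair are hypothetical placeholders, and `build_ocrmypdf_command` is a helper invented for this example.

```python
import os
import shlex
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, languages):
    """Build an argv list that redoes OCR on a PDF with the given languages."""
    if not languages:
        raise ValueError("specify at least one Tesseract language code")
    return [
        "ocrmypdf",
        "--redo-ocr",                # replace an existing OCR text layer
        "-l", "+".join(languages),   # multiple languages are joined with "+"
        input_pdf,
        output_pdf,
    ]


# Hypothetical file names and languages, chosen only for illustration.
command = build_ocrmypdf_command(
    "dictionary.pdf", "dictionary_ocr.pdf", ["eng", "deu"]
)
print(shlex.join(command))

# Run the tool only if it is installed and the input file actually exists.
if shutil.which("ocrmypdf") and os.path.exists("dictionary.pdf"):
    subprocess.run(command, check=True)
```

Listing every language that appears in the document is likely to matter most for multilingual works such as the dictionaries mentioned above.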
pdfarranger
- Pdftool.org: modify pdfs offline in the browser
On Linux I like to use:
https://github.com/pdfarranger/pdfarranger and https://gitlab.com/scarpetta/pdfmixtool for such tasks.
- Switching from Windows to Linux: too many Windows-only programs?
For PDFs I use pdfarranger.
- Transition from Windows to Linux: is there a way to do these things on Linux too?
- A note of appreciation for paperless ngx
I see some questions in the comments about document splitting. If you export your scans as PDF, this adds an extra step, but it may be valuable in the long run. For Linux users at least, there is "pdfarranger", which most distros package. You can install it, load the PDF, and rearrange pages, remove pages, etc.
- An open-source pdf editor?
PDFArranger
- What's a really niche tool you use that you can't live without?
So, I guess, pdfarranger might be faster than PDFSam. https://github.com/pdfarranger/pdfarranger
- Converting a PowerPoint that's been exported the wrong way?
- Software to convert many JPGs into a PDF or similar.
I like PDF Arranger for converting JPEGs to PDFs (open source on Windows) https://github.com/pdfarranger/pdfarranger.
- PDF Arranger KDE Alternative?
I have only ever used PDF Arranger for manipulating PDF files (merging different files; arranging, adding, and deleting pages), and I'm wondering whether there really is no KDE equivalent for that application.
- Can anyone recommend a free PDF splitter?
What are some alternatives?
naps2 - Scan documents to PDF and more, as simply as possible.
pdfsam - PDFsam, a desktop application to split, merge, mix, rotate PDF files and extract pages
doctr - docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
AppImageKit - Package desktop applications as AppImages that run on common Linux-based operating systems, such as RHEL, CentOS, openSUSE, SLED, Ubuntu, Fedora, debian and derivatives. Join #AppImage on irc.libera.chat
OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
pdfslicer - A simple application to extract, merge, rotate and reorder pages of PDF documents
scantailor-universal - ScanTailor Universal - a fork based on Enhanced+Featured+Master versions of ST
ExpansionCards - Reference designs and documentation to create Expansion Cards for the Framework Laptop
nautilus-pdf-tools - Tools to work with PDF files from Nautilus
web-pdf-toolbox - Simple web toolbox for PDF files
gotenberg - A developer-friendly API for converting numerous document formats into PDF files, and more!
release-review - Monthly Review-ISOs for Manjaro Linux