scanservjs
OCRmyPDF
| | scanservjs | OCRmyPDF |
|---|---|---|
| Mentions | 11 | 77 |
| Stars | 669 | 12,134 |
| Growth | - | 3.2% |
| Activity | 8.0 | 9.5 |
| Last commit | 14 days ago | 2 days ago |
| Language | JavaScript | Python |
| License | GNU General Public License v3.0 only | Mozilla Public License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
scanservjs
- Google Photos alternative with OCR
- Fujitsu iX1600 or Competitor for Paperless-ngx
  My solution for using my WorkForce with paperless is scanservjs and scantopl, and I'm pretty happy with it.
- paperless and brother AiO-printer: No duplex in the scan profile?
- What are your top self hosted services that you are very satisfied with?
  I use scanservjs and scantopl on Docker.
- ScanServerJs setup guide
  Recently I stumbled upon ScanServerJs, an awesome piece of software that allows you to set up your own scan server.
- Print/scan server replacing crappy HP integration
- Vuescan – Software support for 6500 abandoned scanners
  Nothing to do with VueScan, but I just went through a journey trying to find a good solution for an old HP LaserJet/scanner combo that I picked up used for $40. Although it's not Wi-Fi enabled and has no Mac support (which I need for a couple of laptops in my household), I got it running perfectly on a Raspberry Pi, despite some difficulties with the HP driver "ecosystem". (Why do they make it so complicated?)
  Once I got it working, I set up `saned` and was able to scan from my Ubuntu laptop over Wi-Fi just fine using `gscan2pdf`, which has a horrible user interface but at least gets the job done. Honestly, it was a bit surprising to discover how terrible the scanning landscape still is in open source software. I guess it's just not something that is needed that often, so some basic projects that were thrown together a decade ago still kind of work, and that's that. I can accept that.
  However, I absolutely could not get any kind of "bridge" working for the Mac laptop, which apparently uses a different scanner API called TWAIN, and no amount of messing around with TWAIN/SANE bridges worked out. Not to mention that one of the two Macs was a "work" laptop on which installing drivers or system software was not allowed.
  The whole time I was thinking: man, all I want is some web interface where I can log into the RPi, drive the scanner, and download the result. Why is that so difficult? Lo and behold, I came across this amazing project that solved the whole problem for me: https://github.com/sbs20/scanservjs. I'm not affiliated with it, but I was so impressed at how well it worked that I feel the need to mention it and give kudos to the author.
  It was even easy to get running: just a single docker command on the RPi and it was up. The laptops can connect easily, of course, because it's just a local web server, and you can do multipage scans to PDF, which is really all I wanted. Fantastic bit of OSS, highly recommended if anyone finds themselves in a similar situation. Forget remote protocols and installing drivers, etc.; just run this on the Pi and you're in business.
- Office Printer and Scanner needed.
  We currently have an HP M1136 MFP connected to CUPS on the local network. It always turns itself off and is never available. People have mostly resorted to walking over with their laptop and connecting via USB. Also there are no good network interfaces to the scanner. We use https://github.com/sbs20/scanservjs to control the scanner but it's too buggy and slow.
- Document scanner server for usb printer?
- Document management, OCR processes, and my love for ScanServer-js.
  I've had an old all-in-one HP USB printer/scanner hooked up to a Raspberry Pi for a few years running CUPS. Network printing has been great via this method. But the scanner portion has sat unused ever since. Until now... WHY DID NOBODY TELL ME ABOUT SCANSERVER-JS?! My word, this is incredible! It does for scanning what CUPS does for printing, and with a beautiful web UI.
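Several of the comments above mention starting scanservjs with a single Docker command. The sketch below shows roughly what that could look like when assembled from Python. Treat it as an illustration under assumptions, not the command any commenter actually ran: the image name `sbs20/scanservjs:latest` and the container port 8080 should be checked against the project's README, `--privileged` is just one blunt way to give the container access to a USB scanner, and `scanservjs_docker_command` is a hypothetical helper name.

```python
import shlex


def scanservjs_docker_command(host_port=8080, image="sbs20/scanservjs:latest"):
    """Build (but do not run) a `docker run` command for scanservjs.

    Assumptions: the image name and container port 8080 should be verified
    against the project's README; --privileged is a blunt way to expose a
    USB scanner to the container.
    """
    return [
        "docker", "run", "--detach",
        "--publish", f"{host_port}:8080",  # host port -> web UI port in the container
        "--privileged",
        "--name", "scanservjs",
        image,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it by hand.
    print(shlex.join(scanservjs_docker_command()))
```

Once the container is running, other machines on the network would reach the web UI in a browser at the Pi's address and the chosen host port, with no scanner drivers installed on the client.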
OCRmyPDF
- TextSnatcher: Copy text from images, for the Linux Desktop
  Try https://github.com/ocrmypdf/OCRmyPDF - it uses Tesseract behind the scenes and it is absolutely brilliant.
- FLaNK Stack Weekly 19 Feb 2024
- Calibre – New in Calibre 7.0
  I recommend running any such PDFs through OCRmyPDF.
  https://github.com/ocrmypdf/OCRmyPDF
- A better document viewer
  If by "like a photocopy" you mean the file contains images of text rather than text, the macOS viewer presumably does OCR on the images. I don't know if there's a Linux document viewer with that capability built in, but a quick search turned up the standalone tool OCRmyPDF.
- Is there a (CLI) tool that dynamically adjusts the contrast and brightness of scanned text documents?
- OCR for a full pdf on Neoreader
  For anyone interested, I solved the problem by first running the files through the free and open source software ocrmypdf, available here
- ELI5: why is PDF such a widespread text format, instead of a format that's actually easier to edit?
  ocrmypdf is nice for stuff like that.
- Donut: OCR-Free Document Understanding Transformer
- massive crop and OCR newspaper
  Use ImageMagick to convert them to PDF and ocrmypdf to straighten and OCR. See this explanation.
- OCR pdf and just keep the OCR text
  Fair enough, maybe this might work for you. It should separate the text from the image anyway, and if you have Adobe Acrobat it should be able to delete the background too with the edit function. It may already be able to do that if you haven't tried it.
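The comments above describe a common workflow: merge scanned page images into a PDF with ImageMagick, then run OCRmyPDF to straighten and OCR the result, optionally keeping the recognized text on its own. Here is a minimal sketch of those steps as command lines built in Python. `--deskew`, `--sidecar`, and `-l` are OCRmyPDF options; the file names and the two helper function names are hypothetical. ImageMagick 7 installs a `magick` binary, while older installs use `convert`, so the `binary` argument may need adjusting.

```python
import shutil
import subprocess


def images_to_pdf_command(images, pdf_path, binary="magick"):
    """ImageMagick call that merges page images into a single PDF."""
    return [binary, *images, pdf_path]


def ocr_command(input_pdf, output_pdf, deskew=True, sidecar=None, language="eng"):
    """OCRmyPDF call that adds a searchable text layer to a scanned PDF."""
    cmd = ["ocrmypdf", "-l", language]
    if deskew:
        cmd.append("--deskew")         # straighten crooked pages before OCR
    if sidecar:
        cmd += ["--sidecar", sidecar]  # also write the recognized text to a file
    cmd += [input_pdf, output_pdf]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; replace with real scans.
    steps = [
        images_to_pdf_command(["page1.png", "page2.png"], "scan.pdf"),
        ocr_command("scan.pdf", "scan_ocr.pdf", sidecar="scan.txt"),
    ]
    for cmd in steps:
        if shutil.which(cmd[0]) is None:
            raise SystemExit(f"{cmd[0]} is not installed")
        subprocess.run(cmd, check=True)
```

With `sidecar` set, the plain text ends up in a separate file alongside the searchable PDF, which may be enough for the "just keep the OCR text" case.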
What are some alternatives?
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Docspell - Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
darktable - darktable is an open source photography workflow application and raw developer
tesserocr - A Python wrapper for the tesseract-ocr API
scantopl - Automatically upload files to paperless when the filename matches a prefix
Portainer - Making Docker and Kubernetes management easy.
invoice2data - Extract structured data from PDF invoices
audiobookshelf - Self-hosted audiobook and podcast server
pdfminer.six - Community maintained fork of pdfminer - we fathom PDF