pubs vs OCRmyPDF

pubs	Project	OCRmyPDF
2	Mentions	77
257	Stars	11,866
0.8%	Growth	3.8%
3.0	Activity	9.6
8 months ago	Latest Commit	10 days ago
Python	Language	Python
GNU Lesser General Public License v3.0 only	License	Mozilla Public License 2.0

Mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars is the number of stars that a project has on GitHub. Growth is the month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.
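
The metric definitions above can be illustrated with a small sketch. The page does not publish its actual formulas, so everything here is an assumption: growth is taken as the plain percentage change in stars over one month, recency weighting is modelled as exponential decay with a hypothetical 30-day half-life, and the activity score is taken as a percentile rank scaled to 0-10 (which agrees with the "9.0 means top 10%" example). All function names are made up for illustration.

```python
from typing import Iterable


def monthly_growth_percent(stars_last_month: int, stars_now: int) -> float:
    """Month-over-month growth in stars, as a percentage (assumed formula)."""
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def recency_weighted_commits(ages_in_days: Iterable[float],
                             half_life_days: float = 30.0) -> float:
    """Sum of commit weights where newer commits count more.

    Assumption: a commit's weight halves every `half_life_days`.
    The real weighting used by the site is not documented here.
    """
    return sum(0.5 ** (age / half_life_days) for age in ages_in_days)


def activity_from_percentile(fraction_less_active: float) -> float:
    """Map the share of tracked projects that are less active to a 0-10 score.

    Assumption: the score is a scaled percentile rank, so a project that is
    more active than 90% of tracked projects (top 10%) scores 9.0.
    """
    if not 0.0 <= fraction_less_active <= 1.0:
        raise ValueError("fraction_less_active must be between 0 and 1")
    return round(10.0 * fraction_less_active, 1)


if __name__ == "__main__":
    # Hypothetical star counts, not taken from either project.
    print(monthly_growth_percent(1000, 1038))
    # One commit today and one commit a month ago.
    print(recency_weighted_commits([0, 30]))
    # A project that is more active than 90% of tracked projects.
    print(activity_from_percentile(0.9))
```

Under these assumptions, an old repository with many historical commits can still score lower than a small project with a burst of recent work, which matches the description that recent commits carry more weight.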

pubs

Posts with mentions or reviews of pubs. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-10-05.

Minimalist way of managing academic papers?
3 projects | /r/commandline | 5 Oct 2022
Terminal bibliography manager based on BibTeX
2 projects | /r/LaTeX | 13 Apr 2021

May I suggest adding an "alternatives" section to the README? You should mention at least pubs and papis.

OCRmyPDF

Posts with mentions or reviews of OCRmyPDF. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-14.

TextSnatcher: Copy text from images, for the Linux Desktop
7 projects | news.ycombinator.com | 14 Mar 2024

Try https://github.com/ocrmypdf/OCRmyPDF - it uses Tesseract behind the scenes and it is absolutely brilliant.
FLaNK Stack Weekly 19 Feb 2024
50 projects | dev.to | 19 Feb 2024
Calibre – New in Calibre 7.0
11 projects | news.ycombinator.com | 18 Nov 2023

I recommend running any such PDFs through OCRmyPDF.
https://github.com/ocrmypdf/OCRmyPDF
Is there a (CLI) tool that dynamically adjusts the contrast and brightness of scanned text documents?
3 projects | /r/de_EDV | 27 Jun 2023
Donut: OCR-Free Document Understanding Transformer
4 projects | news.ycombinator.com | 29 May 2023
massive crop and OCR newspaper
3 projects | /r/macapps | 16 May 2023

Use imagemagick to convert them to PDF and ocrmypdf to straighten and OCR. See this explanation.
OCRmyPDF VS PDF-Reader-PRO - a user suggested alternative
2 projects | 26 Apr 2023
Looking for OCR program that can recognise old docs
2 projects | /r/software | 6 Mar 2023
Recommendations on OCR software?
4 projects | /r/archlinux | 30 Nov 2022

I recently tried out a bunch of software and had the best success with ocrmypdf.
Perfect note taking and information organizing solution - does it exist?
2 projects | /r/apple | 22 Nov 2022

I haven’t had that experience using OneDrive on my Mac. Genuinely, it would slightly concern me if it modified files I put into it to make them searchable without telling me; alternatively, it’s gotta be maintaining a separate index, which Spotlight would have no way of accessing. This tool may be helpful. It’s not something I’ve had a need for, so I haven’t tried it. Should work with Spotlight just fine.
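
One of the posts above suggests converting scanned images to PDF with ImageMagick and then running ocrmypdf to straighten and OCR them. A minimal sketch of that two-step workflow follows. It assumes both tools are installed and on the PATH, that the ImageMagick binary is called `magick` (older installs expose `convert` instead), and that `--deskew` is the OCRmyPDF option used for straightening. The helper names and file names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_commands(images: Sequence[str],
                   combined_pdf: str,
                   searchable_pdf: str,
                   imagemagick: str = "magick") -> List[List[str]]:
    """Build the two command lines without running them.

    Step 1: ImageMagick merges the page images into one PDF.
    Step 2: OCRmyPDF deskews the pages and adds a searchable text layer.
    """
    convert_cmd = [imagemagick, *images, combined_pdf]
    ocr_cmd = ["ocrmypdf", "--deskew", combined_pdf, searchable_pdf]
    return [convert_cmd, ocr_cmd]


def run_pipeline(images: Sequence[str],
                 combined_pdf: str,
                 searchable_pdf: str,
                 imagemagick: str = "magick") -> None:
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(images, combined_pdf, searchable_pdf, imagemagick):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the commands.
    for command in build_commands(["page1.png", "page2.png"],
                                  "scan.pdf", "scan_ocr.pdf"):
        print(" ".join(command))
```

Keeping command construction separate from execution makes it possible to inspect the exact invocations before touching any files.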

What are some alternatives?

When comparing pubs and OCRmyPDF you can also consider the following projects:

PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

tesserocr - A Python wrapper for the tesseract-ocr API

Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents

invoice2data - Extract structured data from PDF invoices

pdfminer.six - Community maintained fork of pdfminer - we fathom PDF

EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

flameshot - Powerful yet simple to use screenshot software

papis - Powerful and highly extensible command-line based document and bibliography manager.

macOCR - Get any text on your screen into your clipboard.

Mayan EDMS - Free Open Source Document Management System (mirror, no pull request or issues)

pyHanko - Sign and stamp PDF files
