pubs vs OCRmyPDF
| | pubs | OCRmyPDF |
|---|---|---|
| Mentions | 2 | 77 |
| Stars | 257 | 11,866 |
| Growth | 0.8% | 3.8% |
| Activity | 3.0 | 9.6 |
| Latest commit | 8 months ago | 10 days ago |
| Language | Python | Python |
| License | GNU Lesser General Public License v3.0 only | Mozilla Public License 2.0 |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pubs
- Minimalist way of managing academic papers?
- Terminal bibliography manager based on BibTeX
May I suggest adding an "alternatives" section to the README? You should mention at least pubs and papis.
OCRmyPDF
- TextSnatcher: Copy text from images, for the Linux Desktop
Try https://github.com/ocrmypdf/OCRmyPDF. It uses Tesseract behind the scenes and is absolutely brilliant.
- FLaNK Stack Weekly 19 Feb 2024
- Calibre – New in Calibre 7.0
I recommend running any such PDFs through OCRmyPDF.
- Is there a (CLI) tool that dynamically adjusts the contrast and brightness of scanned text documents?
- Donut: OCR-Free Document Understanding Transformer
- massive crop and OCR newspaper
Use ImageMagick to convert them to PDF and ocrmypdf to straighten and OCR them. See this explanation.
- OCRmyPDF vs PDF-Reader-PRO: a user-suggested alternative (2 projects, 26 Apr 2023)
- Looking for OCR program that can recognise old docs
- Recommendations on OCR software?
I recently tried out a bunch of software and had the best success with ocrmypdf.
- Perfect note-taking and information organizing solution: does it exist?
I haven’t had that experience using OneDrive on my Mac. Honestly, it would slightly concern me if it modified files I put into it to make them searchable without telling me; otherwise, it would have to maintain a separate index that Spotlight has no way of accessing. This tool may be helpful. It’s not something I’ve needed, so I haven’t tried it, but it should work with Spotlight just fine.
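The newspaper-scanning advice above (ImageMagick to combine the scans into a PDF, then ocrmypdf to straighten and OCR it) can be sketched in Python as below. This is a minimal sketch, not code from either project: it assumes ImageMagick 7's `magick` command (ImageMagick 6 ships the same tool as `convert`) and the `ocrmypdf` CLI are both on PATH, and `build_commands` and `run_pipeline` are hypothetical helper names. The `--deskew` flag asks ocrmypdf to straighten each page before Tesseract adds the text layer.

```python
import subprocess
from typing import List, Sequence


def build_commands(images: Sequence[str], combined_pdf: str, ocr_pdf: str) -> List[List[str]]:
    """Return the two command lines for the scan-to-searchable-PDF workflow."""
    # Step 1 (assumes ImageMagick 7's `magick` entry point):
    # combine the page images, in the given order, into one PDF.
    combine = ["magick", *images, combined_pdf]
    # Step 2: ocrmypdf deskews each page (--deskew), then runs Tesseract
    # to add a searchable text layer, writing the result to ocr_pdf.
    ocr = ["ocrmypdf", "--deskew", combined_pdf, ocr_pdf]
    return [combine, ocr]


def run_pipeline(images: Sequence[str], combined_pdf: str, ocr_pdf: str) -> None:
    """Run both steps in order; raises CalledProcessError if either tool fails."""
    for cmd in build_commands(images, combined_pdf, ocr_pdf):
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline(["page-1.jpg", "page-2.jpg"], "newspaper.pdf", "newspaper-ocr.pdf")` (hypothetical file names) would leave the searchable result in `newspaper-ocr.pdf`. Building the argument lists separately from running them keeps the commands easy to inspect before anything touches the scans.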
What are some alternatives?
PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (practical, ultra-lightweight OCR system; supports recognition of 80+ languages; provides data annotation and synthesis tools; supports training and deployment on server, mobile, embedded, and IoT devices)
pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
tesserocr - A Python wrapper for the tesseract-ocr API
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
invoice2data - Extract structured data from PDF invoices
pdfminer.six - Community maintained fork of pdfminer - we fathom PDF
EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
flameshot - Powerful yet simple-to-use screenshot software
papis - Powerful and highly extensible command-line based document and bibliography manager.
macOCR - Get any text on your screen into your clipboard.
Mayan EDMS - Free Open Source Document Management System (mirror; no pull requests or issues)
pyHanko - Sign and stamp PDF files