Metric | OCRmyPDF | tesstrain
---|---|---
Mentions | 2 | 3
Stars | 6,031 | 569
Growth | - | 1.6%
Activity | 9.3 | 7.6
Last commit | about 2 years ago | 11 days ago
Language | Python | Python
License | Mozilla Public License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
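As a rough illustration of the Growth figure, month-over-month growth can be read as the percentage change in stars between two monthly snapshots. The sketch below assumes that reading; the function name and the earlier star count of 560 are hypothetical and chosen only to show the arithmetic, not taken from the tracked data.

```python
def month_over_month_growth(stars_now: int, stars_month_ago: int) -> float:
    """Return the percentage change in stars over one month.

    Assumption: growth is (current - previous) / previous, as a percentage.
    """
    if stars_month_ago <= 0:
        raise ValueError("stars_month_ago must be positive")
    return (stars_now - stars_month_ago) / stars_month_ago * 100.0


# Hypothetical snapshot: 560 stars a month ago, 569 stars today.
growth = month_over_month_growth(569, 560)
print(f"{growth:.1f}%")  # prints "1.6%"
```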
tesstrain
- OCR in-game text using Tesseract
  You can train Tesseract models as described here, but you would need to create a dataset first: https://github.com/tesseract-ocr/tesstrain
- Help to improve pytesseract accuracy
  You could retrain Tesseract: https://github.com/tesseract-ocr/tesstrain
- Requesting help training Tesseract
  Here is the official repository for training and fine-tuning Tesseract OCR: https://github.com/tesseract-ocr/tesstrain The README is relatively simple to understand.
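The "create a dataset first" step mentioned above can be sketched as follows. The tesstrain README describes ground truth as single-line images (`.png` or `.tif`) paired with same-named `.gt.txt` transcriptions in a `data/MODEL_NAME-ground-truth` folder. The helper below is a minimal sketch under that layout; the function name `build_ground_truth`, the model name `gamefont`, and the file names are hypothetical and not part of tesstrain itself.

```python
import shutil
from pathlib import Path
from typing import Mapping


def build_ground_truth(
    line_images: Mapping[str, str],
    model_name: str,
    data_dir: str = "data",
) -> Path:
    """Copy line images and write matching .gt.txt transcriptions.

    line_images maps an image path to the single line of text it shows.
    Hypothetical helper: tesstrain only expects the resulting folder layout.
    """
    gt_dir = Path(data_dir) / f"{model_name}-ground-truth"
    gt_dir.mkdir(parents=True, exist_ok=True)

    for image_path, text in line_images.items():
        src = Path(image_path)
        if src.suffix.lower() not in {".png", ".tif"}:
            raise ValueError(f"unsupported image type: {src.name}")
        line = text.strip()
        if "\n" in line:
            raise ValueError(f"transcription must be a single line: {src.name}")

        # Image and transcription share a base name, e.g.
        # line_0001.png and line_0001.gt.txt.
        shutil.copy(src, gt_dir / src.name)
        (gt_dir / f"{src.stem}.gt.txt").write_text(line + "\n", encoding="utf-8")

    return gt_dir
```

With the folder in place, training would then be started from a tesstrain checkout with something like `make training MODEL_NAME=gamefont`; check the README for the variables needed when fine-tuning from an existing model.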
What are some alternatives?
PyPDF2 - A utility to read and write PDFs with Python [Moved to: https://github.com/py-pdf/PyPDF2]
chord_trainer - Using DSP guitar chord recognition to gamify chord practice
OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
tesserocr - A Python wrapper for the tesseract-ocr API
Nkocr - 🔎📝 This is a module to make specifics OCRs at food products and nutritional tables.
Automatic-License-Plate-Recognition - Automatic License Plate Recognition is implemented using Python, OpenCV and Tesseract to recognize Indian license plates and store the data in a CSV file.
google_drive_ocr - Perform OCR using Google's Drive API v3
Mayan EDMS - Free Open Source Document Management System (mirror, no pull request or issues)
OCR-PDF-Action - A GitHub action for turning scanned PDF's into searchable documents
EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.
pdftabextract - A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.