tesstrain vs OCRmyPDF
Metric | tesstrain | OCRmyPDF |
---|---|---|
Mentions | 3 | 2 |
Stars | 575 | 6,031 |
Growth | 2.6% | - |
Activity | 7.6 | 9.3 |
Last updated | 28 days ago | about 2 years ago |
Language | Python | Python |
License | Apache License 2.0 | Mozilla Public License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tesstrain
- OCR in-game text using Tesseract
  You can train Tesseract models as described here, but you would need to create a dataset first: https://github.com/tesseract-ocr/tesstrain
- Help to improve pytesseract accuracy
  You could retrain Tesseract: https://github.com/tesseract-ocr/tesstrain
- Requesting help training Tesseract
  Here is the official repository for training and fine-tuning Tesseract OCR. The README is relatively simple to understand. https://github.com/tesseract-ocr/tesstrain
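Since the first mention points out that a dataset has to exist before training, here is a minimal sketch of a sanity check for one. It assumes the layout described in the tesstrain README: line images (`.tif`, `.png`, `.bin.png` or `.nrm.png`) sitting next to single-line transcriptions with the same base name and a `.gt.txt` extension. The function name `check_ground_truth` is hypothetical and is not part of tesstrain.

```python
from pathlib import Path

# Suffixes assumed from the tesstrain README; longest first so that
# "page.bin.png" is reduced to "page" rather than "page.bin".
IMAGE_SUFFIXES = (".bin.png", ".nrm.png", ".png", ".tif")
TEXT_SUFFIX = ".gt.txt"


def base_name(filename, suffixes):
    """Return filename without the first matching suffix, or None if none match."""
    for suffix in suffixes:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return None


def check_ground_truth(directory):
    """Hypothetical helper: report unpaired files in a ground-truth folder.

    Returns two sorted lists of base names:
    (images with no transcription, transcriptions with no image).
    """
    image_bases = set()
    text_bases = set()
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        text_base = base_name(path.name, (TEXT_SUFFIX,))
        if text_base is not None:
            text_bases.add(text_base)
            continue
        image_base = base_name(path.name, IMAGE_SUFFIXES)
        if image_base is not None:
            image_bases.add(image_base)
    return sorted(image_bases - text_bases), sorted(text_bases - image_bases)
```

One way to use it would be to call `check_ground_truth("data/my_model-ground-truth")` (where `my_model` is a placeholder model name) and fix any reported files before running `make training MODEL_NAME=my_model` from the tesstrain checkout.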
OCRmyPDF
What are some alternatives?
chord_trainer - Using DSP guitar chord recognition to gamify chord practice
PyPDF2 - A utility to read and write PDFs with Python [Moved to: https://github.com/py-pdf/PyPDF2]
tesserocr - A Python wrapper for the tesseract-ocr API
OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Automatic-License-Plate-Recognition - Automatic License Plate Recognition is implemented using Python, OpenCV and Tesseract to recognize Indian license plates and store the data in a CSV file.
Nkocr - 🔎📝 A module for running specialized OCR on food products and nutritional tables.
google_drive_ocr - Perform OCR using Google's Drive API v3
Mayan EDMS - Free Open Source Document Management System (mirror, no pull requests or issues)
OCR-PDF-Action - A GitHub action for turning scanned PDFs into searchable documents
EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
pdftabextract - A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.