In Python, the PyTesseract library builds a command line and runs Tesseract as a subprocess. That is inefficient when you have more than one image to process, because the OCR engine has to be reinitialised for every image. tesserocr is a different library that came along a bit later and binds directly to the Tesseract library, so you can initialise the engine once and process several images with it. Images that are already in memory (e.g. OpenCV arrays that you've done some processing on) can be passed to it directly instead of each one being written to a file first, as PyTesseract's command-line approach requires.
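As a rough illustration of the "initialise once, process many" pattern, here is a minimal sketch using tesserocr's `PyTessBaseAPI` with its `SetImageFile`, `SetImage` and `GetUTF8Text` methods. The function names `ocr_files` and `ocr_images` and the `api_factory` parameter are hypothetical, added here only so the engine can be swapped out; the sketch assumes tesserocr and the English language data are installed.

```python
def ocr_files(paths, api_factory=None):
    """OCR several image files with a single engine instance."""
    if api_factory is None:
        # Imported lazily so this module loads even without tesserocr.
        from tesserocr import PyTessBaseAPI
        api_factory = PyTessBaseAPI

    texts = []
    # The engine is created once here, not once per image.
    with api_factory() as api:
        for path in paths:
            api.SetImageFile(path)
            texts.append(api.GetUTF8Text())
    return texts


def ocr_images(images, api_factory=None):
    """OCR in-memory PIL images without writing them to disk."""
    if api_factory is None:
        from tesserocr import PyTessBaseAPI
        api_factory = PyTessBaseAPI

    texts = []
    with api_factory() as api:
        for image in images:
            # SetImage takes a PIL image; an OpenCV array would need
            # converting first (BGR to RGB, then PIL's Image.fromarray).
            api.SetImage(image)
            texts.append(api.GetUTF8Text())
    return texts
```

Calling `ocr_files(["page1.png", "page2.png"])` (file names made up) would return one string per image while paying the engine start-up cost only once.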
Related posts
- Tesserocr
- [Question] I am trying to segment the image using python.
- Python app that will take a picture, scan it and upload that information into a excel file.
- [Question] Working on a simple OCR program but the text from the image is returned in a backward order and it has trouble reading multiple words on a line
- Pytesseract/OCR: RuntimeError: can't start new thread when no multi-threading