tesserocr VS pytesseract

Compare tesserocr vs pytesseract and see what their differences are.

                 tesserocr      pytesseract
Mentions         17             11
Stars            1,927          5,495
Growth           -              -
Activity         5.9            7.7
Latest commit    22 days ago    3 days ago
Language         Python         Python
License          MIT License    Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

tesserocr

Posts with mentions or reviews of tesserocr. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-05-08.
  • Tesserocr
    1 project | /r/pycharm | 25 Jan 2023
    Did you read the instructions for Windows? https://github.com/sirfz/tesserocr
  • [Question] I am trying to segment the image using python.
    1 project | /r/opencv | 9 Aug 2021
    If you’re using tesserocr then you can use OpenCV images directly, so you can just extract the relevant image rows (e.g. query_image = main_image[prev_line:this_line]) and process them without needing to save each image.
  • Python app that will take a picture, scan it and upload that information into a excel file.
    1 project | /r/learnpython | 29 Jul 2021
    This tutorial is a good start towards getting the data from an image of a form with a known structure. I’d personally recommend using tesserocr (actual library binding, more efficient, more functionality) instead of pytesseract (requires images to be saved before processing, uses command-line options in a subprocess instead of binding to the library), but both should work (that tutorial uses pytesseract, which is also what u/Iceberg_Bart_Simpson linked to).
  • [Question] Working on a simple OCR program but the text from the image is returned in a backward order and it has trouble reading multiple words on a line
    1 project | /r/opencv | 11 May 2021
    Side note, but I’d suggest using tesserocr over pytesseract. It’s an actual binding to the tesseract library, so it comes with numerous efficiency and interface benefits, and it can operate on OpenCV images directly (whereas pytesseract saves them to disk first).
  • Optimizing ImageGrab and pytesseract
    3 projects | /r/learnpython | 8 May 2021
    If you’re after speed I’d recommend mss for screenshots/recording, and tesserocr instead of pytesseract (note in particular the OpenCV option).
  • Is pytesseract the only option for OCR in python?
    2 projects | /r/Python | 2 May 2021
    tesserocr is an actual binding to the tesseract library, and is better in practically every way than pytesseract (more efficient, more options for usage, doesn’t require saving images to disk before they can be processed, and more).
  • OCR with Python
    2 projects | /r/learnpython | 15 Apr 2021
    If you have an electronically created pdf (not scanned) and you’re just wanting to run OCR on embedded images then you’ll want a pdf library that can extract the figure images for you, and then you can use tesserocr to run OCR on those images.
  • Pytesseract/OCR: RuntimeError: can't start new thread when no multi-threading
    1 project | /r/learnpython | 24 Mar 2021
    If you want a suggestion, use tesserocr instead of Pytesseract. It’s an actual binding to the tesseract library (Python talks to it directly, instead of calling a program as a subprocess), which means it runs more efficiently, you can process multiple images sequentially with the same OCR engine (pytesseract has to start a process and a new engine for every image that gets processed), you get access to more functionality options, and a bunch of other beneficial stuff. If you’re doing preprocessing with OpenCV it’s even possible to pass those arrays directly to tesseract in memory, whereas Pytesseract requires that you save each image to a file before it can process it.
  • Can´t get part of this REGEX-pattern to work?
    1 project | /r/learnpython | 17 Mar 2021
    As a somewhat unrelated side note, I’d strongly suggest using tesserocr instead of pytesseract, and even more so if you’re working with opencv as well. It’s a true library binding which means it’s more efficient, you have more functionality available to you, you can process multiple images with the same Tesseract engine, and you can process opencv images directly (compared to pytesseract which saves them as a file first and then calls the tesseract CLI as a subprocess).
  • OCR Video Game Text
    1 project | /r/learnpython | 6 Mar 2021
    In Python the library PyTesseract constructs a command to run and calls Tesseract via the command-line as a subprocess, which is inefficient if you have more than one image to process, because it has to reinitialize the OCR engine for every image. tesserocr is a different library which came around a bit later, which is a direct binding to the Tesseract library, so you can initialise the engine once and process several images with it, and for images that are stored in memory (e.g. OpenCV arrays that you’ve done some processing on) you can process them directly instead of having to save them as individual files (which PyTesseract requires).
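The r/opencv comment above suggests slicing an OpenCV image into line bands (e.g. `main_image[prev_line:this_line]`) and passing each band to tesserocr in memory instead of saving it first. A minimal sketch, assuming an 8-bit grayscale NumPy array, row boundaries you have already computed, and a tesserocr release that provides `PyTessBaseAPI.SetImageBytes`; the helper names `split_bands` and `ocr_bands` are hypothetical.

```python
def split_bands(image, boundaries):
    """Slice an image (a NumPy array or any sequence of rows) into horizontal bands.

    `boundaries` is a sorted list of row indices: [0, 40, 85] yields
    image[0:40] and image[40:85], mirroring main_image[prev_line:this_line].
    """
    return [image[prev:this] for prev, this in zip(boundaries, boundaries[1:])]


def ocr_bands(gray, boundaries):
    """OCR each band of an 8-bit grayscale image without writing files to disk.

    Assumes numpy and tesserocr are installed; both are imported lazily.
    """
    import numpy as np
    from tesserocr import PyTessBaseAPI

    texts = []
    with PyTessBaseAPI() as api:  # one engine for every band
        for band in split_bands(gray, boundaries):
            band = np.ascontiguousarray(band, dtype=np.uint8)
            height, width = band.shape
            # 1 byte per pixel for grayscale; bytes_per_line is the row stride.
            api.SetImageBytes(band.tobytes(), width, height, 1, band.strides[0])
            texts.append(api.GetUTF8Text())
    return texts
```

Because slicing a NumPy array returns a view, `split_bands` itself copies no pixel data; `np.ascontiguousarray` only copies when a band is not already contiguous.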
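For the "Optimizing ImageGrab and pytesseract" thread, the suggestion of mss for screenshots plus tesserocr for OCR could look like the sketch below. It assumes the mss, Pillow and tesserocr packages are installed; the BGRA-to-Pillow conversion follows the pattern shown in the mss documentation, and `make_region` / `ocr_screen_region` are hypothetical names. Both the screen grabber and the OCR engine are created once, outside the capture loop.

```python
def make_region(left, top, width, height):
    """Build the region dict accepted by mss's grab() method."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return {"left": left, "top": top, "width": width, "height": height}


def ocr_screen_region(region, frames=10):
    """Capture `frames` screenshots of `region` and OCR each one.

    Requires mss, Pillow and tesserocr (imported lazily).
    """
    import mss
    from PIL import Image
    from tesserocr import PyTessBaseAPI

    texts = []
    # One grabber and one OCR engine, reused for every frame.
    with mss.mss() as sct, PyTessBaseAPI() as api:
        for _ in range(frames):
            shot = sct.grab(region)
            # mss returns raw BGRA pixels; convert them to an RGB Pillow image.
            img = Image.frombytes("RGB", shot.size, shot.bgra, "raw", "BGRX")
            api.SetImage(img)
            texts.append(api.GetUTF8Text())
    return texts
```

Usage might be `ocr_screen_region(make_region(100, 200, 640, 80), frames=5)` to read a fixed text box on screen.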
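The "OCR with Python" comment recommends a PDF library that can extract embedded figure images, followed by tesserocr, but does not name the library. As an assumption, the sketch below uses PyMuPDF (imported as `fitz`) to pull the images out of a digitally created PDF and runs them through a single tesserocr engine; `embedded_image_bytes` and `ocr_pdf_figures` are hypothetical helpers.

```python
import io


def embedded_image_bytes(doc):
    """Yield the raw bytes of every image embedded in an open PyMuPDF document."""
    for page in doc:
        # get_images() returns one tuple per image; element 0 is its xref number.
        for info in page.get_images(full=True):
            yield doc.extract_image(info[0])["image"]


def ocr_pdf_figures(pdf_path):
    """Run OCR on each embedded image of a (non-scanned) PDF.

    Requires PyMuPDF, Pillow and tesserocr (imported lazily).
    """
    import fitz  # PyMuPDF
    from PIL import Image
    from tesserocr import PyTessBaseAPI

    texts = []
    with fitz.open(pdf_path) as doc, PyTessBaseAPI() as api:
        for data in embedded_image_bytes(doc):
            with Image.open(io.BytesIO(data)) as img:
                api.SetImage(img)
                texts.append(api.GetUTF8Text())
    return texts
```

An image reused on several pages is yielded once per page here; deduplicate by xref if that matters for your PDF.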
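Several of the comments above stress that tesserocr lets you initialise the Tesseract engine once and reuse it for many images, whereas pytesseract starts a new process (and engine) per image. A minimal sketch of the reuse pattern with `tesserocr.PyTessBaseAPI`; the `api_factory` parameter is a hypothetical hook added only so the function can be exercised with a stand-in object when Tesseract is not installed.

```python
def ocr_files(paths, api_factory=None):
    """OCR several image files with one Tesseract engine instance.

    api_factory defaults to tesserocr.PyTessBaseAPI; it is a parameter only
    so the function can be tried with a stand-in object.
    """
    if api_factory is None:
        from tesserocr import PyTessBaseAPI  # real binding, imported lazily
        api_factory = PyTessBaseAPI
    results = {}
    with api_factory() as api:  # engine initialised once here...
        for path in paths:
            api.SetImageFile(path)  # ...and reused for every image
            results[path] = api.GetUTF8Text()
    return results
```

With pytesseract, the equivalent loop would call `pytesseract.image_to_string` once per file, and each call launches its own tesseract process.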
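For contrast, the subprocess approach that the "OCR Video Game Text" comment attributes to PyTesseract can be approximated with the standard library alone. This is a simplified illustration, not pytesseract's source: it assumes the `tesseract` executable is on `PATH` and relies on the CLI's `stdout` output target and `-l` language option. Each call starts a fresh process, so the engine and its language data are loaded again for every image.

```python
import shutil
import subprocess


def tesseract_command(image_path, lang="eng", binary="tesseract"):
    """Build one Tesseract CLI invocation that prints recognised text to stdout."""
    return [binary, str(image_path), "stdout", "-l", lang]


def ocr_via_cli(paths, lang="eng"):
    """OCR each file by launching a separate tesseract process per image."""
    binary = shutil.which("tesseract")
    if binary is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    texts = []
    for path in paths:
        # New process per image: the engine is initialised every time.
        result = subprocess.run(
            tesseract_command(path, lang, binary),
            capture_output=True, text=True, check=True,
        )
        texts.append(result.stdout)
    return texts
```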

pytesseract

Posts with mentions or reviews of pytesseract. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-03-05.
  • What's the BEST way to detect these letters on an image?
    2 projects | /r/learnpython | 5 Mar 2023
    If you don't have it already: https://github.com/madmaze/pytesseract
  • Python API for retrieving your daily Credit Mutuel account data?
    1 project | /r/vosfinances | 12 Oct 2022
  • pytesseract.pytesseract.TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')
    1 project | /r/learnpython | 11 Sep 2022
    Yes, pytesseract is a wrapper script and all the heavy lifting is done by Tesseract. See the README.
  • ....
    2 projects | /r/terriblefacebookmemes | 29 Jun 2022
    As far as reading text from an image goes, there are lots of different libraries for doing this sort of thing, but one of the biggest is probably pytesseract. It is extremely powerful for image to text, and reliably beats alphabet soup captchas.
  • Extract Highlighted Text from a Book using Python
    4 projects | dev.to | 22 Feb 2022
    I'm going to use the Tesseract OCR engine and library, and its Python wrapper PyTesseract for text extraction. But there are numerous libraries out there to extract text from an image. In a real world application I would probably use cloud services from AWS, Google or Microsoft to handle this task.
  • A bot that copies a 15 digit number from a picture and renames the picture by that number
    1 project | /r/learnpython | 19 Jan 2022
    There's Python Tesseract to do the OCR from Python. I think this is not really a beginner's project. There isn't too much programming, but you need to be able to install the required libraries and glue everything together. If you don't know how to do that, maybe start with something simpler.
  • text recognition code
    1 project | /r/learnpython | 27 Dec 2021
    From what I have heard, tesseract is the best Python module for OCR
  • exporting handwritten dataset as text, export it and use it as a csv
    3 projects | /r/RemarkableTablet | 16 Sep 2021
    Yeah, I’m pretty sure the Remarkable OCR is not up to these kinds of tasks unfortunately. If you know some coding you could write something that’d likely work well in Python using for ex. this for receiving the mail attachment and this for converting the PDF to CSV. This is in case you’d write your data as a table on the Remarkable, which I guess is preferable to writing something like (0.5, 8.4, -0.3). If you’d rather do it that way, there are other more suitable OCR tools like this one. The checkbox use-case in the comment above would also be possible by modifying this approach. DM if you’d like to discuss further work.
  • Top 5 Python libraries for Computer vision
    8 projects | dev.to | 6 May 2021
    pytesseract - Python-tesseract is an optical character recognition (OCR) tool for python. That is, it will recognize and "read" the text embedded in images. Python-tesseract is a wrapper for Google's Tesseract-OCR Engine. It is also useful as a stand-alone invocation script to tesseract, as it can read all image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others. Additionally, if used as a script, Python-tesseract will print the recognized text instead of writing it to a file.
  • Using Google's OCR API with Puppeteer for Visual Testing
    1 project | dev.to | 8 Feb 2021
    There are multiple open-source OCR tools like pytesseract or EasyOCR, which can be used to integrate OCR functionality into a program. However, these tools require significant configurations to get up and running to provide results with an acceptable accuracy level.
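The `Usage: pytesseract [-l lang] input_file` message in the TesseractError thread above appears to be the usage text of the pytesseract wrapper script itself. One plausible cause (an assumption, not confirmed in that thread) is pointing pytesseract at its own script instead of the Tesseract binary. A sketch that resolves the real executable and assigns it to `pytesseract.pytesseract.tesseract_cmd`; `locate_tesseract` and `configure_pytesseract` are hypothetical helpers.

```python
import os
import shutil


def locate_tesseract(explicit=None):
    """Return a path to the Tesseract binary, refusing the pytesseract wrapper."""
    candidate = explicit or shutil.which("tesseract")
    if candidate is None:
        raise FileNotFoundError("Tesseract not found; install it or pass its path")
    name = os.path.basename(candidate).lower()
    if name.startswith("pytesseract"):
        # This is the Python wrapper script, not the OCR engine.
        raise ValueError(f"{candidate} is the pytesseract wrapper, not Tesseract")
    return candidate


def configure_pytesseract(explicit=None):
    """Point pytesseract at the real Tesseract executable (requires pytesseract)."""
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = locate_tesseract(explicit)
    return pytesseract.pytesseract.tesseract_cmd
```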
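The "bot that copies a 15 digit number" question breaks down into OCR, pattern matching and renaming. A minimal sketch, assuming pytesseract and Pillow are installed and that the number comes out of OCR as 15 consecutive digits (real OCR output may need cleanup, such as removing spaces, before it matches); all function names are hypothetical.

```python
import re
from pathlib import Path

# Exactly 15 digits that are not part of a longer run of digits.
FIFTEEN_DIGITS = re.compile(r"(?<!\d)\d{15}(?!\d)")


def find_number(text):
    """Return the first standalone 15-digit number in OCR output, or None."""
    match = FIFTEEN_DIGITS.search(text)
    return match.group(0) if match else None


def target_path(image_path, number):
    """Same folder and extension, with the number as the new file name."""
    path = Path(image_path)
    return path.with_name(number + path.suffix)


def rename_by_number(image_path):
    """OCR an image with pytesseract and rename it after the number found."""
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img)
    number = find_number(text)
    if number is None:
        return None  # leave the file alone if no number was recognised
    return Path(image_path).rename(target_path(image_path, number))
```

The lookarounds keep a 16-digit string from being truncated into a false 15-digit match.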

What are some alternatives?

When comparing tesserocr and pytesseract you can also consider the following projects:

doctr - docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

pyocr

EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.

tesseract-ocr - Tesseract Open Source OCR Engine (main repository)

OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Signalum - To explore creating an application that detects available connections at once from wifi and bluetooth

OpenCV - Open Source Computer Vision Library

normcap - OCR powered screen-capture tool to capture information instead of images

Face Recognition - The world's simplest facial recognition api for Python and the command line

Camelot - A Python library to extract tabular data from PDFs

Kornia - Geometric Computer Vision Library for Spatial AI

Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration