Python Tesseract

Open-source Python projects categorized as Tesseract

Top 17 Python Tesseract Projects

  1. OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

    Project mention: 13 GitHub Projects that Supercharge Your AI and Development Journey 🚀 | dev.to | 2025-03-03

    Stars: 19899 Author: ocrmypdf

  3. PyMuPDF

    PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

    Project mention: Ask HN: What are you using to parse PDFs for RAG? | news.ycombinator.com | 2024-07-30
  4. RPA-Python

    Python package for doing RPA

  5. llm_aided_ocr

    Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.

    Project mention: OCR4all | news.ycombinator.com | 2025-02-13

    I think the current sweet-spot for speed/efficiency/accuracy is to use Tesseract in combination with an LLM to fix any errors and to improve formatting, as in my open source project which has been shared before as a Show HN:

    https://github.com/Dicklesworthstone/llm_aided_ocr

    This process also makes it extremely easy to tweak/customize simply by editing the English language prompt texts to prioritize aspects specific to your set of input documents.

  6. tesserocr

    A Python wrapper for the tesseract-ocr API

  7. textshot

    Python tool for grabbing text via screenshot

  8. lambda-packs

    Precompiled packages for AWS Lambda

  10. J.A.R.V.I.S

    Personal assistant built using Python libraries. It handles tasks including sending emails, optical text recognition, dynamic news reporting at any time via API integration, to-do list generation, opening any website by voice command, playing music, Wikipedia searching, a dictionary with auto spell checking, weather reporting (temperature, wind speed, humidity), YouTube searching, Google Maps searching, and YouTube downloading.

  11. tesstrain

    Train Tesseract LSTM with make

  12. BetterOCR

    🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM.

  13. fastmrz

    ⚡Extracting the Machine Readable Zone (MRZ) from passport or any document images

    Project mention: Introducing FastMRZ – Effortless MRZ Extraction Made Simple | dev.to | 2024-12-31

    GitHub Repository: FastMRZ Repo

  14. Nkocr

    🔎📝 A module for running specialized OCR on food products and nutritional tables.

  15. Automatic-License-Plate-Recognition

    Automatic License Plate Recognition is implemented using Python, OpenCV and Tesseract to recognize Indian license plates and store the data in a CSV file.

  16. hypercube-viewer

    Hypercube Viewer is a program that draws a hypercube of 3 to 10 dimensions.

  17. pytesseract-ocr-plugin

    Run optical character recognition with PyTesseract from the FiftyOne App!

    Project mention: Aug 7, 2024 - Developing Data-Centric Visual AI Apps Workshop | dev.to | 2024-08-07

    From concept interpolation to image deduplication, optical character recognition, and even curating your own AI art gallery by adding generated images directly into a dataset, your imagination is the only limit. Join us to discover how you can unleash your creativity and interact with data like never before.

  18. schlaumeier

    Automatically solve Android quiz games using OpenCV & ChatGPT🧙‍♂️

  19. koann

    OCR algorithm implementation - storing data locally or in MongoDB!

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
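
The llm_aided_ocr entry above describes running Tesseract first and then asking an LLM to fix errors and formatting, steered by plain English prompt text. A minimal sketch of that two-stage flow follows. It is not the project's actual code: `split_into_chunks`, `correct_ocr_text`, `PROMPT_TEMPLATE`, and the `llm` callable are hypothetical names chosen for illustration, and the raw text is assumed to come from an OCR call such as `pytesseract.image_to_string(image)`.

```python
from typing import Callable, List

# Hypothetical prompt; editing this English text is how the behavior is tuned.
PROMPT_TEMPLATE = (
    "Fix OCR errors in the text below. Keep the meaning and wording; "
    "only correct misrecognized characters and broken formatting.\n\n{chunk}"
)


def split_into_chunks(text: str, max_chars: int = 2000) -> List[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def correct_ocr_text(
    raw_text: str,
    llm: Callable[[str], str],
    max_chars: int = 2000,
) -> str:
    """Send each chunk of raw OCR text through the supplied LLM callable.

    `llm` is an assumption: any function that takes a prompt string and
    returns the model's reply, so no particular LLM client is implied.
    """
    corrected = [
        llm(PROMPT_TEMPLATE.format(chunk=chunk))
        for chunk in split_into_chunks(raw_text, max_chars)
    ]
    return "\n\n".join(corrected)
```

Passing the model as a plain callable keeps the sketch independent of any specific LLM client, and prioritizing aspects specific to a document set would amount to editing `PROMPT_TEMPLATE`.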

Index

What are some of the best open-source Tesseract projects in Python? This list will help you:

#   Project                               Stars
1   OCRmyPDF                              20,520
2   PyMuPDF                               6,659
3   RPA-Python                            5,105
4   llm_aided_ocr                         2,551
5   tesserocr                             2,068
6   textshot                              1,751
7   lambda-packs                          1,117
8   J.A.R.V.I.S                           884
9   tesstrain                             659
10  BetterOCR                             529
11  fastmrz                               54
12  Nkocr                                 36
13  Automatic-License-Plate-Recognition   15
14  hypercube-viewer                      13
15  pytesseract-ocr-plugin                10
16  schlaumeier                           8
17  koann                                 2

