Top 17 Python Tesseract Projects
-
OCRmyPDF
Project mention: 13 GitHub Projects that Supercharge Your AI and Development Journey 🚀 | dev.to | 2025-03-03
Stars: 19899 | Author: ocrmypdf
-
PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Project mention: Ask HN: What are you using to parse PDFs for RAG? | news.ycombinator.com | 2024-07-30
-
llm_aided_ocr
Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.
I think the current sweet-spot for speed/efficiency/accuracy is to use Tesseract in combination with an LLM to fix any errors and to improve formatting, as in my open source project which has been shared before as a Show HN:
https://github.com/Dicklesworthstone/llm_aided_ocr
This process also makes it extremely easy to tweak/customize simply by editing the English language prompt texts to prioritize aspects specific to your set of input documents.
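The Tesseract-plus-LLM approach described in the comment above can be sketched roughly as follows. This is not the code from llm_aided_ocr; it is a minimal illustration under stated assumptions. The names `ocr_image`, `build_prompt`, `correct_text`, and `DEFAULT_INSTRUCTIONS` are hypothetical, and `llm` stands for any callable that takes a prompt string and returns the model's reply. The OCR step assumes `pytesseract`, Pillow, and the Tesseract binary are installed.

```python
from typing import Callable

# Hypothetical default prompt; the point of the approach is that this
# plain-English text can be edited to suit a particular set of documents.
DEFAULT_INSTRUCTIONS = (
    "Fix OCR errors in the text below. Preserve the original wording and "
    "meaning; do not add new content. Return only the corrected text."
)


def ocr_image(path: str) -> str:
    """Run Tesseract on a single image file via pytesseract."""
    # Imported lazily so the prompt helpers work without OCR dependencies.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path))


def build_prompt(ocr_text: str, instructions: str = DEFAULT_INSTRUCTIONS) -> str:
    """Wrap raw OCR output in the correction instructions."""
    return f"{instructions}\n\n---\n{ocr_text.strip()}\n---"


def correct_text(
    ocr_text: str,
    llm: Callable[[str], str],
    instructions: str = DEFAULT_INSTRUCTIONS,
) -> str:
    """Send the prompt to the supplied LLM callable and return its reply."""
    return llm(build_prompt(ocr_text, instructions)).strip()
```

Customizing the behavior then amounts to passing different `instructions`, for example `correct_text(ocr_image("page.png"), llm, "Keep tables as Markdown.")`, where `page.png` and `llm` are placeholders for your own input and model client.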
-
J.A.R.V.I.S
Personal assistant built with Python libraries. It handles a wide range of tasks, including sending emails, optical text recognition, dynamic news reporting via API integration, to-do list generation, opening any website by voice command, playing music, Wikipedia search, a dictionary with automatic spell checking, weather reporting (temperature, wind speed, humidity), YouTube search and downloading, and Google Maps search.
-
BetterOCR
🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM.
-
GitHub Repository: FastMRZ Repo
-
Automatic-License-Plate-Recognition
Automatic License Plate Recognition is implemented using Python, OpenCV and Tesseract to recognize Indian license plates and store the data in a CSV file.
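A pipeline of the kind described above (OpenCV preprocessing, Tesseract recognition, CSV storage) might look roughly like the sketch below. It is not taken from the project. The function names and the plate regex are assumptions made for illustration; the regex only approximates the common Indian layout of state code, district number, optional series letters, and a four-digit number. The OCR step assumes `opencv-python`, `pytesseract`, and the Tesseract binary are installed.

```python
import csv
import re
from pathlib import Path
from typing import Optional

# Assumed plate layout, e.g. "MH12AB1234": 2 letters, 1-2 digits,
# up to 3 series letters, then 4 digits. Real plates have more variants.
PLATE_RE = re.compile(r"^[A-Z]{2}\d{1,2}[A-Z]{0,3}\d{4}$")


def read_plate_text(image_path: str) -> str:
    """OCR a cropped plate image after grayscale conversion and Otsu thresholding."""
    # Imported lazily so the CSV helpers work without OCR dependencies.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # --psm 7 asks Tesseract to treat the image as a single line of text.
    return pytesseract.image_to_string(binary, config="--psm 7")


def normalize_plate(raw: str) -> Optional[str]:
    """Uppercase, drop non-alphanumerics, and keep only plausible plates."""
    candidate = re.sub(r"[^A-Z0-9]", "", raw.upper())
    return candidate if PLATE_RE.match(candidate) else None


def append_plate(csv_path: str, plate: str) -> None:
    """Append one plate to the CSV file, writing a header if the file is new."""
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["plate"])
        writer.writerow([plate])
```

Filtering through `normalize_plate` before writing keeps obvious OCR noise out of the CSV file, at the cost of rejecting plates that do not fit the assumed pattern.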
-
Project mention: Aug 7, 2024 - Developing Data-Centric Visual AI Apps Workshop | dev.to | 2024-08-07
From concept interpolation to image deduplication, optical character recognition, and even curating your own AI art gallery by adding generated images directly into a dataset, your imagination is the only limit. Join us to discover how you can unleash your creativity and interact with data like never before.
Python Tesseract related posts
-
Mistral OCR
-
OCR4all
-
Llama-OCR: An Open-Source Llama 3.2 Based OCR Tool
-
A return to hand-written notes by learning to read and write
-
Marker: Convert PDF to Markdown quickly with high accuracy
-
A better document viewer
-
OCR in-game text using Tesseract
-
Index
What are some of the best open-source Tesseract projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | OCRmyPDF | 20,520 |
2 | PyMuPDF | 6,659 |
3 | RPA-Python | 5,105 |
4 | llm_aided_ocr | 2,551 |
5 | tesserocr | 2,068 |
6 | textshot | 1,751 |
7 | lambda-packs | 1,117 |
8 | J.A.R.V.I.S | 884 |
9 | tesstrain | 659 |
10 | BetterOCR | 529 |
11 | fastmrz | 54 |
12 | Nkocr | 36 |
13 | Automatic-License-Plate-Recognition | 15 |
14 | hypercube-viewer | 13 |
15 | pytesseract-ocr-plugin | 10 |
16 | schlaumeier | 8 |
17 | koann | 2 |