Made a script that transcribes and translates any PDF into a text file using tesseract

tesseract-ocr

Mentions: 120 | Stars: 58,022 | Activity: 8.9 | Language: C++

Tesseract Open Source OCR Engine (main repository)

This script uses the tesseract-ocr engine and a few pip libraries. I've made it as user-friendly as I could, and it can (theoretically) translate from and to any language. It works with any PDF file, whether it was generated by word-processing software (MS Word, LibreOffice Writer...) or comes from a scanned document.
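For readers wondering how a pipeline like this fits together, here is a minimal sketch. It is not the code in PDFtoTXT.py. It assumes the `pdf2image` and `pytesseract` pip packages (which in turn need the Poppler and Tesseract binaries installed), and the function names `ocr_pdf_pages` and `pdf_to_txt` are made up for illustration. The post does not say which translation library is used, so translation is left as a caller-supplied function.

```python
from pathlib import Path
from typing import Callable, List, Optional


def ocr_pdf_pages(pdf_path: str, lang: str = "eng") -> List[str]:
    """OCR every page of a PDF with Tesseract; returns one string per page."""
    # Assumed third-party packages, imported lazily so the rest of the
    # module loads without them.
    from pdf2image import convert_from_path
    import pytesseract

    # Render each page to an image, so scanned and generated PDFs are
    # handled the same way.
    pages = convert_from_path(pdf_path, dpi=300)
    return [pytesseract.image_to_string(page, lang=lang) for page in pages]


def pdf_to_txt(
    pdf_path: str,
    out_path: str,
    ocr: Callable[[str, str], List[str]] = ocr_pdf_pages,
    translate: Optional[Callable[[str], str]] = None,
    lang: str = "eng",
) -> str:
    """Hypothetical helper: OCR a PDF, optionally translate, write a .txt file."""
    page_texts = ocr(pdf_path, lang)
    if translate is not None:
        # `translate` is any str -> str function supplied by the caller.
        page_texts = [translate(text) for text in page_texts]
    # Separate pages with a blank line in the output file.
    text = "\n\n".join(t.strip() for t in page_texts)
    Path(out_path).write_text(text, encoding="utf-8")
    return text
```

A call might look like `pdf_to_txt("scan.pdf", "scan.txt", lang="spa")`, where `lang` is a Tesseract language code whose trained data is installed. Passing `ocr` and `translate` as parameters keeps the file-writing logic independent of any particular OCR or translation backend.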

PDFtoTXT

Mentions: 1 | Stars: 6 | Activity: 0.0 | Language: Python

Converts any PDF file from one language into your language

PDFtoTXT.py

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

one of the Codia AI Design technologies: OCR Technology
1 project | dev.to | 14 Feb 2024
OCR text to speech for disability
1 project | /r/AskProgramming | 10 Dec 2023
How to Read Text From an Image with Python
1 project | dev.to | 23 Oct 2023
I used Node.js to OCR "Meme Monday" threads
1 project | dev.to | 5 Aug 2023
Is there any package for OCR automatic handwritten notes to text conversion available or on going?
1 project | /r/emacs | 29 May 2023

This page summarizes the projects mentioned and recommended in the original post on /r/Python
Tags: Image processing, Tesseract, tesseract-ocr, OCR, LSTM
Post date: 8 Feb 2022
