How easy would it be/how would I go about implementing automated OCR + word count estimate after a file upload for a translation website?

PaddleOCR

Mentions: 60 | Stars: 38,373 | Activity: 8.6 | Language: Python

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

Then you need some file handling to deal with the different file types. Text documents and spreadsheets don't need OCR: you can use any Excel or Word reader library to parse the data and count the words. For PDFs and images, I would use PaddleOCR. It's free and works reasonably well. If you are only interested in words, do some post-processing on the OCR output. The easy but inaccurate option is to check that a string is not just punctuation; you could also match tokens against a dictionary or use NLP.
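The routing and the simple punctuation check described above could be sketched roughly like this. It is a minimal sketch, not a full implementation: the names `needs_ocr`, `is_word`, `count_words` and `estimate_word_count`, and the extension list, are all made up for illustration. The two extractor callables are placeholders you would supply yourself: `run_ocr` would wrap PaddleOCR, and `read_text` would wrap whatever Word/Excel reader library you pick.

```python
import string
from pathlib import Path
from typing import Callable, Iterable

# Assumption: these extensions are the ones routed to OCR; adjust to your uploads.
OCR_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def needs_ocr(filename: str) -> bool:
    """Route PDFs and images to OCR; everything else goes to a document reader."""
    return Path(filename).suffix.lower() in OCR_EXTENSIONS


def is_word(token: str) -> bool:
    """Easy but not accurate: a token counts unless it is only ASCII punctuation."""
    return token.strip(string.punctuation) != ""


def count_words(lines: Iterable[str]) -> int:
    """Split each line on whitespace and count the tokens that pass is_word."""
    return sum(1 for line in lines for token in line.split() if is_word(token))


def estimate_word_count(
    filename: str,
    read_text: Callable[[str], Iterable[str]],  # hypothetical: wraps a Word/Excel reader
    run_ocr: Callable[[str], Iterable[str]],    # hypothetical: wraps PaddleOCR
) -> int:
    """Pick the extractor based on file type, then count words in its output."""
    extract = run_ocr if needs_ocr(filename) else read_text
    return count_words(extract(filename))
```

Passing the extractors in as arguments keeps the counting logic testable without installing any OCR library, for example `estimate_word_count("scan.pdf", read_text=my_reader, run_ocr=my_ocr)` where `my_reader` and `my_ocr` are your own functions that return lines of text. A dictionary lookup or an NLP tokenizer could later replace `is_word` without touching the rest.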

What is the best repo for hand written text recognition?
1 project | /r/computervision | 11 Dec 2023
Ask HN: Best way to perform complex OCR task in 2023?
1 project | news.ycombinator.com | 5 Dec 2023
How would you go about driving contextual data from images?
3 projects | /r/LangChain | 4 Jul 2023
Seeking Advice for Improving OCR Accuracy in a Code Snippet Reader Project
1 project | /r/computervision | 27 Jun 2023
unable to install paddleocr on m1 mac
1 project | /r/learnpython | 4 Jun 2023

This page summarizes the projects mentioned and recommended in the original post on /r/learnpython
OCR crnn ocrlite Db chineseocr
Post date: 23 Feb 2022