Tesseract OCR

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • scantailor-advanced

    ScanTailor Advanced is the version that merges the features of the ScanTailor Featured and ScanTailor Enhanced versions, brings new ones and fixes.

    I mount my phone on a £15 arm with a vice grip from Amazon, copy the photos to my laptop, and then run the tesseract CLI over the resulting files in a bash for-loop.

    I use https://github.com/4lex4/scantailor-advanced to deskew the images and generate the PDF.

    It isn't perfect but my purposes are more around research than publication, so, YMMV!
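
    The per-file loop described above might look roughly like the following. This is a minimal sketch, not the commenter's actual script: the folder names, the `*.jpg` pattern, and the helper names `tesseract_command` and `ocr_folder` are assumptions made for illustration. It relies only on the basic `tesseract IMAGE OUTPUTBASE -l LANG` invocation, which writes `OUTPUTBASE.txt`.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image: Path, out_dir: Path, lang: str = "eng") -> list:
    """Build the CLI call `tesseract IMAGE OUTPUTBASE -l LANG` for one image."""
    # tesseract appends ".txt" to the output base by itself.
    return ["tesseract", str(image), str(out_dir / image.stem), "-l", lang]


def ocr_folder(src: Path, out_dir: Path, pattern: str = "*.jpg") -> list:
    """Run tesseract once per matching image, like a shell for-loop would."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(src.glob(pattern)):  # hypothetical file pattern
        subprocess.run(tesseract_command(image, out_dir), check=True)
        written.append(out_dir / f"{image.stem}.txt")
    return written
```

    Calling `ocr_folder(Path("scans"), Path("text"))` would then OCR every JPEG in a hypothetical `scans` folder. Deskewing with ScanTailor Advanced would happen before this step.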

  • OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

    I've used tesseract directly, and there definitely are some footguns when it comes to PDFs: you have to be sure not to re-compress them and lose quality.

    If you're looking to add a text layer to a PDF (for search purposes for instance) I can highly recommend https://github.com/jbarlow83/OCRmyPDF/

    It uses Tesseract and works quite well for most PDFs. I made a semi-functional script before I discovered it, and it would have saved me a lot of hassle.
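
    Adding a text layer with OCRmyPDF could be scripted along these lines. This is a hedged sketch: the helper names `ocrmypdf_command` and `add_text_layer` and the file names are hypothetical. The `-l` and `--skip-text` options are used here as I understand them from the OCRmyPDF command line; checking `ocrmypdf --help` for the installed version is advisable.

```python
import shutil
import subprocess
from pathlib import Path


def ocrmypdf_command(src: Path, dst: Path, lang: str = "eng",
                     skip_text: bool = True) -> list:
    """Build an `ocrmypdf [options] INPUT OUTPUT` command line."""
    cmd = ["ocrmypdf", "-l", lang]
    if skip_text:
        # Leave pages that already contain text untouched.
        cmd.append("--skip-text")
    cmd += [str(src), str(dst)]
    return cmd


def add_text_layer(src: Path, dst: Path) -> None:
    """Run OCRmyPDF on one file; raises if the tool is missing or fails."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    subprocess.run(ocrmypdf_command(src, dst), check=True)
```

    For example, `add_text_layer(Path("scan.pdf"), Path("scan_ocr.pdf"))` would write a searchable copy next to a hypothetical `scan.pdf`, leaving the original file unchanged.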

  • Mayan EDMS

    Free Open Source Document Management System (mirror, no pull request or issues)

    This is the OCR engine used by Mayan EDMS[1], which I've used since 2018. The reliability has been top-notch.

    [1] https://www.mayan-edms.com/

  • local_adaptive_binarization

    Local adaptive image binarization

  • PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

  • EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

  • BoofCV

    Fast computer vision library for SFM, calibration, fiducials, tracking, image processing, and more.

    Image processing depends strongly on the image you want to use. Finding an "auto" approach that works for every image is nearly impossible...

    I once wrote a bookscanner app in Java (https://boofcv.org), where everything was done automatically (preprocessing, object detection / book extraction, skin detection / finger removal, deskewing, line-slope correction and so on). It was very difficult to adjust the parameters so that at least most of the books looked good.

  • Tesseract.js

    Pure Javascript OCR for more than 100 Languages 📖🎉🖥

    I used the wasm implementation and scanned 1 cereal box label. https://tesseract.projectnaptha.com/
