Tesseract.js – Pure JavaScript OCR

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Tesseract.js

    Pure JavaScript OCR for more than 100 languages 📖🎉🖥

  • cluttr

    I made a utility that cleans up your Mac desktop and uses Tesseract to extract text from screenshots. This makes it really easy to find screenshots by searching for a line of text you remember.

    https://gitlab.com/bearjaws/cluttr#readme

  • binaryen

    Compiler infrastructure and toolchain library for WebAssembly

    Emscripten can target both WebAssembly and JavaScript. The JavaScript option uses wasm2js - it compiles first to wasm, then compiles that to JS.

    https://github.com/WebAssembly/binaryen#wasm2js

    The emcc flag -sWASM=0 disables the wasm final output and emits JS instead.

  • tesseract.js-core

    Emscripten port of Tesseract C++ API

    It's annoying to find out, after looking through the entire thing, that the actual code that does the OCR is not in this repo. It's just a bunch of scheduling and worker logic, and for some reason the JS is written twice, once for the browser and once for Node.

    The actual code that does the OCR is wrapped and included via this package [0], which just wraps the original Tesseract in C++ [1] using wasm. Shameful title.

    [0] https://github.com/naptha/tesseract.js-core

    [1] https://github.com/jeromewu/tesseract

  • tesseract

    Tesseract Open Source OCR Engine (main repository) (by jeromewu)

  • EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

    I've had good results with EasyOCR, much better than Tesseract. I agree with you, Tesseract has performed very poorly in my experience.

    https://github.com/JaidedAI/EasyOCR
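
The cluttr entry above describes finding screenshots by searching for a remembered line of text. A minimal sketch of that idea follows, assuming the OCR step has already run and produced plain text per file. The `OcrRecord` shape, the `normalize` and `searchScreenshots` helpers, and the sample data are all hypothetical illustrations, not cluttr's actual code.

```typescript
// Hypothetical record: one entry per screenshot, holding text that an
// OCR engine (for example Tesseract) is assumed to have extracted already.
interface OcrRecord {
  file: string;
  text: string;
}

// Normalize for forgiving matching: lowercase and collapse whitespace runs,
// since OCR output often contains stray line breaks and double spaces.
function normalize(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

// Return the files whose extracted text contains the remembered phrase.
function searchScreenshots(records: OcrRecord[], query: string): string[] {
  const needle = normalize(query);
  if (needle === "") return []; // an empty query matches nothing
  return records
    .filter((r) => normalize(r.text).includes(needle))
    .map((r) => r.file);
}

// Made-up sample data standing in for real OCR output.
const records: OcrRecord[] = [
  { file: "shot-1.png", text: "Invoice  #1042\nTotal due: $30" },
  { file: "shot-2.png", text: "Meeting notes\nShip OCR demo" },
];

console.log(searchScreenshots(records, "total due"));
```

A plain substring match keeps the sketch short. A real tool would probably also need to tolerate OCR misreads, for instance with fuzzy matching.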

