Tesseract.js – Pure JavaScript OCR

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Tesseract.js

    Pure JavaScript OCR for more than 100 Languages 📖🎉🖥

  • cluttr

    I made a utility that cleans up your Mac desktop and uses Tesseract to extract text from screenshots. This makes it really easy to find screenshots by searching for a line of text you remember.

    https://gitlab.com/bearjaws/cluttr#readme

  • binaryen

    Optimizer and compiler/toolchain library for WebAssembly

    Emscripten can target both WebAssembly and JavaScript. The JavaScript option uses wasm2js: Emscripten compiles to wasm first, and wasm2js then translates that wasm to JS.

    https://github.com/WebAssembly/binaryen#wasm2js

    The emcc flag -sWASM=0 disables the final wasm output and emits JS instead.

  • tesseract.js-core

    Emscripten port of Tesseract C++ API

    It's annoying to find out, after looking through the entire thing, that the actual code that does the OCR is not in this repo. It's just a bunch of scheduling and worker logic, and for some reason the JS is written twice, once for the browser and once for Node.

    The actual code that does the OCR is wrapped and included via this package [0], which just wraps the original C++ Tesseract [1] using wasm. Shameful title.

    [0] https://github.com/naptha/tesseract.js-core

    [1] https://github.com/jeromewu/tesseract

  • tesseract

    Tesseract Open Source OCR Engine (main repository) (by jeromewu)

  • EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

    I've had good results with EasyOCR, much better than Tesseract. I agree with you, Tesseract has performed very poorly in my experience.

    https://github.com/JaidedAI/EasyOCR
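
The screenshot-search idea described under cluttr (extract text from each screenshot, then find a file by a remembered line of text) can be sketched roughly as follows. This is not cluttr's actual code, and every function name here is hypothetical. The OCR step assumes the tesseract.js npm package with a v5-style API, where createWorker accepts a language code; the recognizer is passed in as a parameter so another engine such as EasyOCR could be substituted behind the same interface.

```javascript
// Sketch only: not cluttr's actual code. All function names are hypothetical.

// OCR one image with Tesseract.js. Assumes the `tesseract.js` npm package
// with the v5-style API, where createWorker() accepts a language code.
async function recognizeWithTesseract(imagePath) {
  const { createWorker } = require('tesseract.js'); // loaded lazily, on first use
  const worker = await createWorker('eng');
  try {
    const { data } = await worker.recognize(imagePath);
    return data.text;
  } finally {
    await worker.terminate(); // release the worker even if OCR fails
  }
}

// Build an in-memory index mapping each file path to its lowercased OCR text.
// `recognize` is any async (path) => string function, so the OCR engine can
// be swapped out or stubbed.
async function buildIndex(imagePaths, recognize = recognizeWithTesseract) {
  const index = new Map();
  for (const path of imagePaths) {
    const text = await recognize(path);
    index.set(path, text.toLowerCase());
  }
  return index;
}

// Return the paths whose extracted text contains the query, ignoring case.
function search(index, query) {
  const needle = query.trim().toLowerCase();
  if (needle === '') return [];
  return [...index]
    .filter(([, text]) => text.includes(needle))
    .map(([path]) => path);
}
```

Under these assumptions, awaiting buildIndex on a list of screenshot paths and then calling search(index, 'invoice') would return the paths whose recognized text contains that word. A plain substring match keeps the sketch short; a real tool would likely persist the index and tolerate OCR misreads.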
