Top 23 optical-character-recognition Open-Source Projects

EasyOCR

38 21,953 3.6 Python

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Project mention: Leveraging GPT-4 for PDF Data Extraction: A Comprehensive Guide | dev.to | 2023-12-27

PyTesseract Module [GitHub], EasyOCR Module [GitHub], PaddlePaddle OCR [GitHub]

paperless-ngx

212 16,882 9.9 Python

A community-supported supercharged version of paperless: scan, index and archive all your physical documents

Project mention: I accidentally built a meme search engine | news.ycombinator.com | 2024-04-13

I steered a friend towards Paperless (and away from an LLM solution) as a way of searching/accessing GBs of architectural PDFs recently - so far, it’s apparently working well for them.
https://github.com/paperless-ngx/paperless-ngx

SwiftOCR

0 4,579 1.8 Swift

Fast and simple OCR library written in Swift
doctr

12 3,038 8.9 Python

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Project mention: Show HN: How do you OCR on a Mac using the CLI or just Python for free | news.ycombinator.com | 2024-01-02

https://github.com/mindee/doctr/issues/1049
I am looking for something this polished and reliable for handwriting, does anyone have any pointers? I want to integrate it in a workflow with my eink tablet I take notes on. A few years ago, I tried various models, but they performed poorly (around 80% accuracy) on my handwriting, which I can read almost 90% of the time.

tesserocr

17 1,930 7.1 Python

A Python wrapper for the tesseract-ocr API
J.A.R.V.I.S

2 786 5.5 Python

Personal assistant built using Python libraries. It does almost anything, including sending emails, optical text recognition, dynamic news reporting at any time with API integration, generating to-do lists, opening any website with just a voice command, playing music, Wikipedia searching, a dictionary with intelligent sensing (i.e. auto spell checking), weather reporting (i.e. temperature, wind speed, humidity), YouTube searching, Google Maps searching, YouTube downloading, etc.

Project mention: 🔥 600+ 🌟 and 140+ Forks to J.A.R.V.I.S 🚀, Added Dynamic Face Recognition to J.A.R.V.I.S 🤖 | dev.to | 2023-05-14

[GitHub Code](https://github.com/GauravSingh9356/J.A.R.V.I.S)

Tesseract4Android

1 651 6.7 C

Fork of tess-two rewritten from scratch to support the latest version of Tesseract OCR.
kraken

2 643 9.1 Python

OCR engine for all the languages (by mittagessen)
react-native-tesseract-ocr

1 547 0.0 Java

Tesseract OCR wrapper for React Native
parseq

1 500 6.7 Python

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

Project mention: need help for license plate number segmentation | /r/deeplearning | 2023-05-31

I really recommend using scene text recognition models. They are perfect for these types of use cases: https://github.com/baudm/parseq or check https://paperswithcode.com/task/scene-text-recognition (make sure to check the licenses). Good luck 👍🏻

signature_extractor

1 426 0.0 Python

A super lightweight image processing algorithm for detection and extraction of overlapped handwritten signatures on scanned documents using OpenCV and scikit-image.

Project mention: im trying to automate my simpleton job | /r/learnpython | 2023-05-03

It's an active field of research, e.g. here https://github.com/amaljoseph/Signature-Verification_System_using_YOLOv5-and-CycleGAN or here https://github.com/ahmetozlu/signature_extractor

edenai-apis

13 360 9.8 Python

Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines

Project mention: We're Building an Open-Source LLM/AI API Wrapper: Here's Why | news.ycombinator.com | 2023-08-28

HackerNoon featured our latest article in the "Future of AI" category
We explain how Eden AI contributes to the AI ecosystem in structuring AI and LLM APIs by creating the most accomplished Open-Source wrapper possible.
You can support us in reaching 1000 stars on Github here: https://github.com/edenai/edenai-apis

OS-Bot-COLOR

3 229 7.4 Python

A lightweight desktop client & toolkit for writing, controlling and monitoring color-based automation scripts.
ssocr

1 193 6.1 C

Seven Segment Optical Character Recognition
handprint

2 157 0.0 Python

Apply different text recognition services to images of handwritten documents.
Orchestra

1 96 3.4 Python

Orchestra is a sheet music reader (optical music recognition (OMR) system) that converts sheet music to a machine-readable version.
formkiq-core

50 91 6.6 Java

A full-featured Document Layer for your application, providing the functionality of a flexible document management system, including storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud. 🌟 Star to support our work!

Project mention: A Clutter-Free Life: Going Paperless with Paperless-Ngx | news.ycombinator.com | 2023-10-07

We may want to get in touch with each other. We have an Open Core document management platform that runs in AWS; I'm not sure about your roadmap, but there may be something there that's of use: https://github.com/formkiq/formkiq-core

Easter2

1 73 0.0 Jupyter Notebook

Easter2.0: Improving Convolutional Models for Handwritten Text Recognition
DocumentLab

1 69 10.0 C#

OCR using tesseract, ImageMagick, EmguCV, an advanced query language and a fluent query interface for C#

Project mention: I want to make a small scripting language for my graduation project. | /r/compsci | 2023-06-26

I've done a few languages in my time. Here's a simple one that translates C-like syntax into Selenium operations: https://github.com/karisigurd4/SeleniumScript
And a more novel query language for OCR'd document data: https://github.com/karisigurd4/DocumentLab

image-to-sound-python-

1 55 0.0 Python

A python project for converting an Image into audible sound using OCR and speech synthesis
EasyOCR-cpp

1 27 7.3 C++

Custom C++ implementation of deep learning based OCR

Project mention: [P] EasyOCR in C++! | /r/MachineLearning | 2023-12-02

OCR-PDF-Action

3 11 0.0 Shell

A GitHub action for turning scanned PDFs into searchable documents
Typewriter-OCR-TintypeText

5 10 0.0 Python

This typewriter OCR code can convert JPEG typewritten text images into RTF documents, while removing typos for you!

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

optical-character-recognition related posts

OCR at Edge on Cloudflare Constellation
3 projects | news.ycombinator.com | 3 Jul 2023

Tesserocr
1 project | /r/pycharm | 25 Jan 2023

New Eco-Friendly Indigo Typewriter Ink (Recipe Included!)
1 project | /r/typewriters | 30 Dec 2022

Digitalizing typewritten text
1 project | /r/typewriters | 5 Dec 2022

Python Testing 1
1 project | /r/Testing_MR_Bot | 9 Nov 2022

How to make Brilliant Blue FCF (blue food dye)-glycerine erasable typewriter ink
1 project | /r/typewriters | 6 May 2022

Make Your Own Gamebook
2 projects | /r/gamebooks | 8 Apr 2022

Index

What are some of the best open-source optical-character-recognition projects? This list will help you:

#	Project	Stars
1	EasyOCR	21,953
2	paperless-ngx	16,882
3	SwiftOCR	4,579
4	doctr	3,038
5	tesserocr	1,930
6	J.A.R.V.I.S	786
7	Tesseract4Android	651
8	kraken	643
9	react-native-tesseract-ocr	547
10	parseq	500
11	signature_extractor	426
12	edenai-apis	360
13	OS-Bot-COLOR	229
14	ssocr	193
15	handprint	157
16	Orchestra	96
17	formkiq-core	91
18	Easter2	73
19	DocumentLab	69
20	image-to-sound-python-	55
21	EasyOCR-cpp	27
22	OCR-PDF-Action	11
23	Typewriter-OCR-TintypeText	10
