Top 14 Jupyter Notebook OCR Projects

deep-text-recognition-benchmark

2 3,613 0.0 Jupyter Notebook

Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Pix2Text

6 1,277 9.2 Jupyter Notebook

Pix In, LaTeX & Text Out. Recognize Chinese and English text and math formulas from images. 80+ languages are supported.

Project mention: How do I solve this? | /r/LaTeX | 2023-06-11

Use this: https://p2t.behye.com/
tarsier

2 478 9.2 Jupyter Notebook

Vision utilities for web interaction agents 👀

Project mention: Control the browser using GPT-4 vision by AgentGPT team | news.ycombinator.com | 2023-11-12
PyMuPDF-Utilities

1 463 8.4 Jupyter Notebook

Demos, examples and utilities using PyMuPDF

Project mention: Anybody has code for a gui app to extract images from several pdfs at once? | /r/Python | 2023-05-13
deep-text-recognition-benchmark

1 275 2.2 Jupyter Notebook

PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR) (by roatienza)
Multi-Type-TD-TSR

4 236 0.0 Jupyter Notebook

Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition
ocrpy

6 218 0.0 Jupyter Notebook

OCR, Archive, Index and Search: Implementation agnostic OCR framework.
document-ai-samples

5 181 9.0 Jupyter Notebook

Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud

Project mention: When Will the GenAI Bubble Burst? | news.ycombinator.com | 2024-04-04

Thanks for the example. That sounds like really solid cost savings, and I definitely agree with the trend that it is here to stay.
For invoice parsing (various formats), are you just using GPT4V? When GPT4V initially came out, I benchmarked it against an out-of-the-box invoice parser from Google Cloud (https://cloud.google.com/document-ai) on 16 documents and it was much better accuracy-wise. For example, I'd get results parsing 10,100 as 101100 (no comma).
Curious if you saw problems like this in your pipeline or if it has gotten much better since?
Calliar

1 136 2.2 Jupyter Notebook

A dataset for online Arabic calligraphy. A collection of 2500 annotated calligraphic styles.
videocr-PaddleOCR

3 106 4.2 Jupyter Notebook

Extract hardcoded subtitles from videos using machine learning
tutorials

2 79 1.6 Jupyter Notebook

Git repo for articles on the Ergo Sum blog and the YouTube channel https://www.youtube.com/channel/UCiie9CN--dazA7iT2sry5FA (by rogerfitz)
Easter2

1 73 0.0 Jupyter Notebook

Easter2.0: Improving Convolutional Models for Handwritten Text Recognition
konfuzio-sdk

2 52 9.4 Jupyter Notebook

OCR, extract and classify documents. In addition, annotate documents and build your own NLP and Computer Vision models using Python by downloading the data. Find examples in our Colab Notebooks, e.g., how to fine-tune Flair.
docutron

2 16 5.8 Jupyter Notebook

Docutron Toolkit: detection and segmentation analysis for legal data extraction over documents.

Project mention: Docutron Toolkit: detection and segmentation analysis for legal data extraction over documents | /r/Python | 2023-10-24

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-04.

Jupyter Notebook OCR related posts

When Will the GenAI Bubble Burst?
1 project | news.ycombinator.com | 4 Apr 2024
Control the browser using GPT-4 vision by AgentGPT team
1 project | news.ycombinator.com | 12 Nov 2023
how to run this python program, total noob
2 projects | /r/learnpython | 6 Feb 2023
[R] Calliar: An Online Handwritten Dataset for Arabic Calligraphy
1 project | /r/MachineLearning | 22 Jun 2021

Index

What are some of the best open-source OCR projects in Jupyter Notebook? This list will help you:

#	Project	Stars
1	deep-text-recognition-benchmark	3,613
2	Pix2Text	1,277
3	tarsier	478
4	PyMuPDF-Utilities	463
5	deep-text-recognition-benchmark (by roatienza)	275
6	Multi-Type-TD-TSR	236
7	ocrpy	218
8	document-ai-samples	181
9	Calliar	136
10	videocr-PaddleOCR	106
11	tutorials	79
12	Easter2	73
13	konfuzio-sdk	52
14	docutron	16