Show HN: How do you OCR on a Mac using the CLI or just Python for free

doctr

12 3,038 8.9 Python

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Tesseract is widely known to be "meh" at this point.
If you look at RAG frameworks as one example, they'll typically use/support a variety of implementations. Tesseract is almost always supported, but it's rarely ideal, with projects like Unstructured[0] and DocTR[1] being preferred. By leveraging more-or-less SOTA vision models[2][3], they embarrass Tesseract.
I haven't compared them to the Apple Vision framework, but they're absolutely better than Tesseract and potentially even better than Apple Vision.
[0] - https://github.com/Unstructured-IO/unstructured-inference
[1] - https://github.com/mindee/doctr
[2] - https://github.com/mindee/doctr#models-architectures
[3] - https://github.com/Unstructured-IO/unstructured-inference#mo...
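For readers who want to try docTR from Python, here is a minimal sketch and not a definitive recipe. It assumes docTR is installed with a deep learning backend and that the exported result is a nested dict of pages, blocks, lines and words, each word carrying a "value" key. The helper names `export_to_lines` and `ocr_with_doctr` are hypothetical and only used for illustration.

```python
from typing import Any, Dict, List


def export_to_lines(export: Dict[str, Any]) -> List[str]:
    """Flatten a docTR-style export dict into one string per text line."""
    lines: List[str] = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [word["value"] for word in line.get("words", [])]
                if words:  # skip lines with no recognized words
                    lines.append(" ".join(words))
    return lines


def ocr_with_doctr(image_path: str) -> List[str]:
    """Run a pretrained docTR predictor on a single image (assumed usage)."""
    # Imported lazily so export_to_lines works without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)
    doc = DocumentFile.from_images(image_path)
    result = model(doc)
    return export_to_lines(result.export())
```

Calling `ocr_with_doctr("tmp/test.png")` should return the recognized lines in reading order; the first call will likely download pretrained weights.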

ocrmac

1 126 6.8 Jupyter Notebook

A Python wrapper to extract text from images on a Mac system. Uses the Vision framework from Apple.

Nice post, OP! I was super impressed with Apple's Vision framework. I used it on a personal project that involved OCRing tens of thousands of spreadsheet screenshots and ingesting them into a Postgres database.
I used a combination of RhetTbull's vision.py (for the actual implementation) [1] + ocrmac (for experimentation) [2] and was pleasantly surprised by the performance on my i7 6700k hackintosh.
I wouldn't call myself a programmer, but I can generally troubleshoot anything given enough time, and this did cost time.
[1]: https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac5...
[2]: https://github.com/straussmaximilian/ocrmac
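As a rough illustration of the experimentation step with ocrmac, the sketch below assumes that `ocrmac.OCR(path).recognize()` returns a list of `(text, confidence, bounding_box)` tuples, as the project's README suggests. The helpers `keep_confident_text` and `ocr_with_ocrmac` and the 0.5 threshold are hypothetical choices for this example; the OCR call itself only works on macOS.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed shape of one ocrmac annotation: (text, confidence, bounding box).
Annotation = Tuple[str, float, Sequence[float]]


def keep_confident_text(
    annotations: Iterable[Annotation], min_confidence: float = 0.5
) -> List[str]:
    """Return the recognized strings whose confidence meets the threshold."""
    return [
        text
        for text, confidence, _bbox in annotations
        if confidence >= min_confidence
    ]


def ocr_with_ocrmac(image_path: str, min_confidence: float = 0.5) -> List[str]:
    """OCR one image through Apple's Vision framework via ocrmac (macOS only)."""
    # Imported lazily so the filter above can be used on any platform.
    from ocrmac import ocrmac

    annotations = ocrmac.OCR(image_path).recognize()
    return keep_confident_text(annotations, min_confidence)
```

Filtering by confidence before ingestion is one simple way to keep low-quality reads out of a database; the right threshold depends on the screenshots.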

unstructured-inference

1 103 8.8 Python

aichat

16 2,804 9.6 Rust

All-in-one AI-Powered CLI Chat & Copilot that integrates 10+ AI platforms, including OpenAI, Azure-OpenAI, Gemini, VertexAI, Claude, Mistral, Cohere, Ollama, Ernie, Qianwen...

Use LLMs (gpt-4-vision or LLaVA) with aichat:
`aichat -f tmp/test.png -- output only text in the image`
https://github.com/sigoden/aichat
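To call the same command from Python, a minimal sketch could wrap it with `subprocess`. It assumes `aichat` is on the PATH and already configured with a vision-capable model; only the `-f` flag and the `--` separator shown in the command above are used. The function names `build_aichat_command` and `run_aichat` are hypothetical.

```python
import subprocess
from typing import List


def build_aichat_command(
    image_path: str, prompt: str = "output only text in the image"
) -> List[str]:
    """Build the argument list mirroring the shell command quoted above."""
    # The prompt is split into words, as the shell would do with unquoted text.
    return ["aichat", "-f", image_path, "--", *prompt.split()]


def run_aichat(image_path: str) -> str:
    """Run aichat on one image and return whatever it prints to stdout."""
    completed = subprocess.run(
        build_aichat_command(image_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout.strip()
```

Passing an argument list rather than a shell string avoids quoting problems with file names that contain spaces.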

Camelot

10 2,639 6.9 Python

A Python library to extract tabular data from PDFs

I have repeatedly had good success extracting tables from PDFs using Camelot (Python, https://github.com/camelot-dev/camelot)
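A minimal sketch of that kind of table extraction might look like the following. It assumes Camelot is installed and relies on `camelot.read_pdf` returning table objects that each expose a pandas DataFrame as `.df`. The wrapper `extract_tables` and its `reader` parameter are hypothetical, added only so the function can be exercised without a real PDF.

```python
from typing import Any, Callable, List, Optional


def extract_tables(
    pdf_path: str,
    pages: str = "1",
    reader: Optional[Callable[..., Any]] = None,
) -> List[Any]:
    """Return one DataFrame per table found on the requested pages."""
    if reader is None:
        # Imported lazily so a stand-in reader can be used without Camelot.
        import camelot

        reader = camelot.read_pdf
    tables = reader(pdf_path, pages=pages)
    return [table.df for table in tables]
```

With Camelot available, `extract_tables("report.pdf", pages="1-3")` would return the detected tables as DataFrames, ready for `to_csv` or further cleaning; `report.pdf` is a placeholder file name.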

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

LlamaCloud and LlamaParse
9 projects | news.ycombinator.com | 20 Feb 2024
Pdfsandwich
6 projects | news.ycombinator.com | 6 Nov 2021
exporting handwritten dataset as text, export it and use it as a csv
3 projects | /r/RemarkableTablet | 16 Sep 2021
Show HN: Cognita – open-source RAG framework for modular applications
3 projects | news.ycombinator.com | 27 Apr 2024
Machine Learning and AI Beyond the Basics Book
1 project | news.ycombinator.com | 16 Apr 2024

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Specific Formats Processing OCR AI PDF Deep Learning
Post date: 2 Jan 2024