Show HN: How do you OCR on a Mac using the CLI or just Python for free

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. doctr

    docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

    Tesseract is widely known to be "meh" at this point.

    If you look at RAG frameworks as one example, they typically support a variety of OCR implementations. Tesseract is almost always supported, but it's rarely ideal; projects like Unstructured[0] and DocTR[1] are preferred. By leveraging more-or-less SOTA vision models[2][3], they embarrass Tesseract.

    I haven't compared them to the Apple Vision framework, but they're absolutely better than Tesseract and potentially even better than Apple Vision.

    [0] - https://github.com/Unstructured-IO/unstructured-inference

    [1] - https://github.com/mindee/doctr

    [2] - https://github.com/mindee/doctr#models-architectures

    [3] - https://github.com/Unstructured-IO/unstructured-inference#mo...
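
    As a rough illustration of the docTR route recommended above, here is a minimal Python sketch. It assumes `python-doctr` is installed with a deep learning backend and that pretrained weights can be downloaded on first use. The helper names `confident_words` and `ocr_with_doctr` are hypothetical, and the nested pages/blocks/lines/words layout is what I understand `export()` to return, so treat it as an assumption to check against the docTR docs.

```python
from typing import List


def confident_words(exported: dict, min_confidence: float = 0.5) -> List[str]:
    """Collect word strings from an exported docTR result (assumed nested layout)."""
    words = []
    for page in exported["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    # Keep only words the model is reasonably sure about.
                    if word["confidence"] >= min_confidence:
                        words.append(word["value"])
    return words


def ocr_with_doctr(image_paths: List[str], min_confidence: float = 0.5) -> List[str]:
    """Run docTR's end-to-end predictor on image files and return confident words."""
    # Imported lazily so confident_words() stays usable without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)
    doc = DocumentFile.from_images(image_paths)
    result = model(doc)
    return confident_words(result.export(), min_confidence)
```

    If plain text is all you need, calling `render()` on the result instead of `export()` should be enough; the filter above only matters when low-confidence words need to be dropped.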

  3. ocrmac

    A Python wrapper to extract text from images on a Mac. Uses Apple's Vision framework.

    Nice post, OP! I was super impressed with Apple's Vision framework. I used it on a personal project that involved OCRing tens of thousands of spreadsheet screenshots and ingesting them into a Postgres database.

    I used a combination of RhetTbull's vision.py (for the actual implementation) [1] + ocrmac (for experimentation) [2] and was pleasantly surprised by the performance on my i7 6700k hackintosh.

    I wouldn't call myself a programmer, and although I can generally troubleshoot anything given enough time, this did cost me time.

    [1]: https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac5...

    [2]: https://github.com/straussmaximilian/ocrmac
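
    For the experimentation side, a minimal Python sketch around ocrmac could look like the following. It assumes `recognize()` returns a list of `(text, confidence, [x, y, width, height])` tuples with normalized coordinates measured from the bottom-left corner, which is my reading of the ocrmac README rather than something stated in the comment. The helper names `annotations_to_text` and `ocr_image` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed annotation shape: (text, confidence, [x, y, width, height]),
# with coordinates normalized to 0..1 and y measured from the bottom edge.
Annotation = Tuple[str, float, Sequence[float]]


def annotations_to_text(annotations: List[Annotation], min_confidence: float = 0.0) -> str:
    """Join recognized strings top to bottom, then left to right."""
    kept = [a for a in annotations if a[1] >= min_confidence]
    # A larger y is higher on the page under the bottom-left-origin assumption.
    kept.sort(key=lambda a: (-a[2][1], a[2][0]))
    return "\n".join(text for text, _confidence, _bbox in kept)


def ocr_image(path: str, min_confidence: float = 0.0) -> str:
    """Run Apple Vision OCR through ocrmac (macOS only)."""
    # Imported lazily so the pure helper above can be used on any platform.
    from ocrmac import ocrmac

    return annotations_to_text(ocrmac.OCR(path).recognize(), min_confidence)
```

    Sorting by bounding box is a crude stand-in for real layout analysis; for spreadsheet screenshots like the ones described above, grouping by row would likely need a tolerance on `y` rather than exact ordering.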

  4. unstructured-inference

    Recommended alongside docTR in the comment quoted under doctr above: https://github.com/Unstructured-IO/unstructured-inference

  5. aichat

    All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI Tools & Agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more.

    Use a vision-capable LLM (gpt-4-vision or LLaVA) with aichat:

    `aichat -f tmp/test.png -- output only text in the image`

    https://github.com/sigoden/aichat
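
    To call the same command from Python instead of the shell, one option is a thin `subprocess` wrapper. This is only a sketch: it assumes `aichat` is on `PATH` and already configured with a vision-capable model, and the function names `build_aichat_command` and `ocr_with_llm` are hypothetical.

```python
import subprocess
from typing import List

DEFAULT_PROMPT = "output only text in the image"


def build_aichat_command(image_path: str, prompt: str = DEFAULT_PROMPT) -> List[str]:
    """Build the argv list equivalent to the shell command quoted above."""
    # The shell passes each prompt word as its own argument after `--`.
    return ["aichat", "-f", image_path, "--", *prompt.split()]


def ocr_with_llm(image_path: str, prompt: str = DEFAULT_PROMPT) -> str:
    """Run aichat and return whatever text the model prints."""
    completed = subprocess.run(
        build_aichat_command(image_path, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if aichat exits non-zero
    )
    return completed.stdout.strip()
```

    Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.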

  6. Camelot

    A Python library to extract tabular data from PDFs

    I've had repeatedly good results extracting tables from PDFs with Camelot (Python, https://github.com/camelot-dev/camelot).
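
    A minimal Camelot sketch in Python, for reference. It assumes `camelot-py` and its dependencies are installed and that the PDF is text-based, since as far as I know Camelot does not OCR scanned pages. The helper names `page_spec` and `extract_tables` are hypothetical.

```python
from typing import Iterable, List


def page_spec(pages: Iterable[int]) -> str:
    """Format page numbers as the comma-separated string passed to `pages`."""
    return ",".join(str(page) for page in sorted(set(pages)))


def extract_tables(pdf_path: str, pages: Iterable[int] = (1,), flavor: str = "lattice") -> List:
    """Return one pandas DataFrame per table Camelot detects on the given pages."""
    # Imported lazily so page_spec() can be used without Camelot installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=page_spec(pages), flavor=flavor)
    return [table.df for table in tables]
```

    The `lattice` flavor expects tables with ruled lines; `flavor="stream"` is the usual alternative for tables separated only by whitespace.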

