Show HN: How do you OCR on a Mac using the CLI or just Python for free

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • doctr

    docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
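
    A minimal sketch of running docTR from Python. It assumes `python-doctr` is installed with a deep-learning backend and that pretrained weights can be downloaded on first use. `ocr_predictor`, `DocumentFile.from_images`, `DocumentFile.from_pdf` and `result.export()` are docTR entry points. The helpers `ocr_with_doctr` and `export_to_text` are hypothetical, and the flattening step assumes the pages → blocks → lines → words nesting of docTR's JSON export, with each word's text stored under `"value"`.

```python
from pathlib import Path
from typing import Any


def export_to_text(export: dict[str, Any]) -> str:
    """Flatten a docTR result.export() dict into plain text.

    Assumes the pages -> blocks -> lines -> words nesting, with each word's
    recognized string under "value". Words on a line are joined by spaces,
    lines by newlines, and blocks and pages by blank lines.
    """
    pages = []
    for page in export.get("pages", []):
        blocks = []
        for block in page.get("blocks", []):
            lines = [
                " ".join(word["value"] for word in line.get("words", []))
                for line in block.get("lines", [])
            ]
            blocks.append("\n".join(lines))
        pages.append("\n\n".join(blocks))
    return "\n\n".join(pages)


def ocr_with_doctr(path: str) -> str:
    """Run docTR's pretrained OCR pipeline on an image or PDF (hypothetical helper)."""
    # Imported lazily so export_to_text() works without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)  # downloads weights on first use
    if Path(path).suffix.lower() == ".pdf":
        doc = DocumentFile.from_pdf(path)
    else:
        doc = DocumentFile.from_images(path)
    result = model(doc)
    return export_to_text(result.export())
```

    Usage: `print(ocr_with_doctr("scan.pdf"))`. Creating the predictor is the expensive step, so for batches you would build it once and reuse it.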

  • Tesseract is widely known to be "meh" at this point.

    Take RAG frameworks as one example: they typically support a variety of OCR implementations. Tesseract is almost always supported, but it's rarely ideal; projects like Unstructured[0] and docTR[1] are preferred. By leveraging more-or-less SOTA vision models[2][3], they embarrass Tesseract.

    I haven't compared them to the Apple Vision framework, but they're absolutely better than Tesseract and potentially even better than Apple Vision.

    [0] - https://github.com/Unstructured-IO/unstructured-inference

    [1] - https://github.com/mindee/doctr

    [2] - https://github.com/mindee/doctr#models-architectures

    [3] - https://github.com/Unstructured-IO/unstructured-inference#mo...
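
    To check a comparison like this on your own documents, a Tesseract baseline helps. Here is a minimal sketch that shells out to the `tesseract` CLI. It assumes the CLI is installed and on `PATH`, for example via `brew install tesseract`. Passing `stdout` as the output base makes Tesseract print the text instead of writing a `.txt` file, `-l` selects the language pack, and `--psm` sets the page-segmentation mode. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_command(image_path: str, lang: str = "eng", psm: Optional[int] = None) -> list[str]:
    """Build a tesseract argv that prints recognized text to stdout."""
    # "stdout" as the output base tells tesseract to print rather than write a .txt file.
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def tesseract_ocr(image_path: str, lang: str = "eng", psm: Optional[int] = None) -> str:
    """Run the tesseract CLI on one image and return its text (hypothetical helper)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    proc = subprocess.run(
        tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return proc.stdout
```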

  • ocrmac

    A Python wrapper to extract text from images on macOS. Uses Apple's Vision framework.
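
    A minimal sketch of using ocrmac, assuming it is installed on a Mac (`pip install ocrmac`). Here `ocrmac.OCR(path).recognize()` is assumed to return a list of `(text, confidence, bounding_box)` tuples. The 0.5 confidence cut-off and the helper names are hypothetical choices for illustration, and the text is joined in the order ocrmac returns it.

```python
from typing import Sequence

# Each ocrmac annotation is assumed to be (text, confidence, bounding_box).
Annotation = tuple[str, float, Sequence[float]]


def join_annotations(annotations: list[Annotation], min_confidence: float = 0.5) -> str:
    """Keep annotations at or above min_confidence and join their text line by line."""
    return "\n".join(text for text, conf, _bbox in annotations if conf >= min_confidence)


def ocr_with_ocrmac(path: str, min_confidence: float = 0.5) -> str:
    """OCR one image with Apple's Vision framework via ocrmac (macOS only)."""
    from ocrmac import ocrmac  # lazy import: only usable on macOS

    annotations = ocrmac.OCR(path).recognize()
    return join_annotations(annotations, min_confidence)
```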

  • Nice post, OP! I was super impressed with Apple's Vision framework. I used it on a personal project that involved OCRing tens of thousands of spreadsheet screenshots and ingesting the text into a Postgres database.

    I used a combination of RhetTbull's vision.py [1] (for the actual implementation) and ocrmac [2] (for experimentation), and was pleasantly surprised by the performance on my i7-6700K Hackintosh.

    I wouldn't call myself a programmer, but I can generally troubleshoot anything given enough time; it did cost time, though.

    [1]: https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac5...

    [2]: https://github.com/straussmaximilian/ocrmac
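
    The workflow in this comment can be sketched as a batch loop: OCR a large folder of screenshots, then load the text into a database. This is not the commenter's code. The OCR step is passed in as a function, for example a small wrapper around ocrmac or vision.py. The standard-library `sqlite3` module stands in for Postgres so the sketch is self-contained; with a Postgres driver such as psycopg, the same loop would use `%s` placeholders instead of `?`. The table name `screenshots`, its columns, and the batch size of 100 are hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import Callable, Iterable

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tiff"}


def iter_images(folder: Path) -> Iterable[Path]:
    """Yield image files under folder in a stable (sorted) order."""
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


def ingest(folder: Path, ocr: Callable[[Path], str], conn: sqlite3.Connection) -> int:
    """OCR every image under folder and store (path, text) rows; returns rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS screenshots (path TEXT PRIMARY KEY, text TEXT NOT NULL)"
    )
    done = {row[0] for row in conn.execute("SELECT path FROM screenshots")}
    written = 0
    for path in iter_images(folder):
        if str(path) in done:
            continue  # skip files already ingested so an interrupted run can resume
        conn.execute(
            "INSERT INTO screenshots (path, text) VALUES (?, ?)", (str(path), ocr(path))
        )
        written += 1
        if written % 100 == 0:
            conn.commit()  # commit in batches so progress survives interruptions
    conn.commit()
    return written
```

    Skipping already-stored paths matters at the scale of tens of thousands of files, because a crash halfway through does not force a full re-run.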

  • unstructured-inference

  • aichat

    All-in-one AI-Powered CLI Chat & Copilot that integrates 10+ AI platforms, including OpenAI, Azure-OpenAI, Gemini, VertexAI, Claude, Mistral, Cohere, Ollama, Ernie, Qianwen...

  • Use LLMs (gpt-4-vision or LLaVA) with aichat:

    `aichat -f tmp/test.png -- output only text in the image`

    https://github.com/sigoden/aichat
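
    To run the aichat command above over many screenshots from Python, here is a minimal sketch that shells out once per image. It reuses the same `-f <file> -- <prompt>` form and assumes `aichat` is on `PATH` and already configured with a vision-capable model. The helper names and the timeout are hypothetical. LLM output is not deterministic, so transcriptions may differ between runs.

```python
import shutil
import subprocess
from pathlib import Path

PROMPT = "output only text in the image"


def aichat_command(image: Path, prompt: str = PROMPT) -> list[str]:
    """Build the argv for: aichat -f <image> -- <prompt>."""
    # No shell is involved, so the prompt is passed as a single argument after "--".
    return ["aichat", "-f", str(image), "--", prompt]


def ocr_with_aichat(image: Path, prompt: str = PROMPT, timeout: float = 120.0) -> str:
    """Ask the configured LLM to transcribe one image (hypothetical helper)."""
    if shutil.which("aichat") is None:
        raise RuntimeError("aichat not found on PATH")
    proc = subprocess.run(
        aichat_command(image, prompt),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,  # remote model calls can hang; fail instead of blocking forever
    )
    return proc.stdout.strip()
```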

  • Camelot

    A Python library to extract tabular data from PDFs

  • I've had repeated success extracting tables from PDFs using Camelot (Python, https://github.com/camelot-dev/camelot).
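
    A minimal sketch of the Camelot workflow, assuming `camelot-py` and its dependencies are installed. `camelot.read_pdf(path, pages=..., flavor=...)` returns a list-like collection of tables, each exposing a pandas DataFrame as `.df`. `flavor="lattice"` targets tables with ruling lines and `"stream"` targets whitespace-separated ones. Camelot's DataFrames use integer column labels, so the header row usually arrives as the first data row. The hypothetical `rows_to_records` helper promotes that row to keys.

```python
def rows_to_records(rows: list[list[str]]) -> list[dict[str, str]]:
    """Treat the first row as column names and return the remaining rows as dicts."""
    if not rows:
        return []
    header = [h.strip() for h in rows[0]]
    return [dict(zip(header, (cell.strip() for cell in row))) for row in rows[1:]]


def extract_tables(
    pdf_path: str, pages: str = "1", flavor: str = "lattice"
) -> list[list[dict[str, str]]]:
    """Extract every table on the given pages as a list of records per table."""
    import camelot  # lazy import; the PyPI package is camelot-py

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    return [rows_to_records(table.df.values.tolist()) for table in tables]
```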
