Llama-OCR: An Open-Source Llama 3.2 Based OCR Tool

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. zerox

    PDF to Markdown with vision models

    Looks awesome! Been doing a lot of OCR recently, and love the addition to the space. The reigning champion in the PDF -> Markdown space (AFAIK) is Facebook's Nougat[1], and I'm excited to hook this up to DSPy and see which works better for philosophy books. This repo links the Zerox[2] project by some startup, which also looks awesome, and certainly more smoothly advertised than Nougat. Would love corrections/advice from any actual experts passing by this comment section :)

    That said, I have a few questions if OP/anyone knows the answers:

    1. What is Together.ai, and is this model OSS? Their website sells them as a hosting service, and the "Custom Models" page[3] seems to be about custom finetuning, not, like, training new proprietary models in-house. They might have a HuggingFace profile, but it's hard to tell if it's them: https://huggingface.co/TogetherAI

    2. The GitHub says "hosted demo", but the hosting part is just the tiny (clean!) WebGUI, yes? It's implied that this functionality is and will always be available only through API calls?

    P.S. The header links are broken on my desktop browser -- no onClick triggered

    [1] https://facebookresearch.github.io/nougat/

    [2] https://github.com/getomni-ai/zerox

    [3] https://www.together.ai/products#custom-models

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. nougat

    Implementation of Nougat Neural Optical Understanding for Academic Documents

  4. paper-bidsheets

    System to create paper auction bidsheets with Google Sheets and scan them using Ollama

    I have recently used llama3.2-vision to handle some paper bidsheets for a charity auction and it is fairly accurate with some terrible handwriting. I hope to use it for my event next year.

    I do find it rather annoying not being able to get it to consistently output a CSV though. ChatGPT and Gemini seem better at doing that but I haven’t tried to automate it.

    The scale of my problem is about 100 pages of bidsheets, so some manual cleaning is OK. It is certainly better than burning volunteers' time.

    https://github.com/philips/paper-bidsheets
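The inconsistent CSV output described above can be handled defensively on the receiving side rather than in the prompt. The following is a minimal sketch, not code from the paper-bidsheets repository: the column names (`item`, `bidder`, `amount`) and the function name `extract_csv_rows` are hypothetical. It assumes the model reply may wrap the CSV in a fenced block and may contain chatter or malformed lines.

```python
import csv
import io
import re

# Hypothetical column layout; the original post does not specify one.
EXPECTED_COLUMNS = ["item", "bidder", "amount"]

# Build the fence marker programmatically so this example can itself
# live inside a fenced code block.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:csv)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_csv_rows(reply, expected_columns=EXPECTED_COLUMNS):
    """Return (good_rows, rejected_rows) parsed from a free-form model reply.

    good_rows is a list of dicts keyed by expected_columns.
    rejected_rows holds any line whose cell count does not match, so a
    human can review it instead of losing it silently.
    """
    match = FENCE_RE.search(reply)
    # Prefer the fenced block if there is one, otherwise parse the whole reply.
    body = match.group(1) if match else reply

    good, rejected = [], []
    for row in csv.reader(io.StringIO(body)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # blank line
        if [cell.lower() for cell in cells] == expected_columns:
            continue  # header row repeated by the model
        if len(cells) == len(expected_columns):
            good.append(dict(zip(expected_columns, cells)))
        else:
            rejected.append(cells)
    return good, rejected
```

At roughly 100 pages, the rejected rows can simply be printed and fixed by hand, which matches the "some manual cleaning is OK" trade-off in the comment.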

  5. wordninja

    Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies.

    WordNinja [0] is pretty good as a post-processing step on wrongly split/concatenated words.

    [0]: https://github.com/keredson/wordninja
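To illustrate the idea behind frequency-based word splitting, here is a small sketch of a dynamic-programming splitter. It is not WordNinja's implementation: the word counts in `TOY_COUNTS` are invented for the example (WordNinja derives its costs from English Wikipedia unigram frequencies), and the `unknown_cost` penalty is an arbitrary assumption.

```python
import math

# Hypothetical toy unigram counts, made up for this example only.
TOY_COUNTS = {
    "the": 500, "a": 400, "he": 150, "page": 60,
    "pages": 30, "scan": 25, "scanned": 20,
}


def split_words(text, counts=TOY_COUNTS, unknown_cost=25.0):
    """Split lower-case text into the cheapest sequence of known words.

    Each known word costs -log(relative frequency), so frequent words are
    cheap. Characters that fit no known word are emitted one at a time at
    a fixed, high unknown_cost.
    """
    total = sum(counts.values())
    max_len = max(len(word) for word in counts)
    cost = {word: -math.log(n / total) for word, n in counts.items()}

    # best[i] = (minimal cost of splitting text[:i], start of the last piece)
    best = [(0.0, 0)]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if piece in cost:
                candidates.append((best[j][0] + cost[piece], j))
            elif i - j == 1:
                # Fallback so every position stays reachable.
                candidates.append((best[j][0] + unknown_cost, j))
        best.append(min(candidates))

    # Walk the back-pointers from the end to recover the pieces.
    pieces = []
    i = len(text)
    while i > 0:
        j = best[i][1]
        pieces.append(text[j:i])
        i = j
    return pieces[::-1]


print(split_words("thescannedpages"))  # ['the', 'scanned', 'pages']
```

With the real library, the equivalent call is `wordninja.split(...)`, which applies the same kind of cost minimisation over a much larger vocabulary.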

  6. OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

    While I'm a fan of Tika, a lot of people get queasy from Java and XML; they might be better served by their preferred scripting language and https://github.com/ocrmypdf/OCRmyPDF, which has the same OCR engine.
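As a rough sketch of the "preferred scripting language" route, the snippet below drives the `ocrmypdf` command-line tool from Python. The helper names `build_ocrmypdf_command` and `ocr_pdf` are hypothetical. The `--language` and `--skip-text` options are standard OCRmyPDF flags, but the defaults chosen here are assumptions, not recommendations from the comment.

```python
import shutil
import subprocess


def build_ocrmypdf_command(src, dst, language="eng", skip_text=True):
    """Build the argument list for one ocrmypdf run.

    skip_text leaves pages that already carry a text layer untouched,
    which is convenient for mixed scanned/born-digital PDFs.
    """
    cmd = ["ocrmypdf", "--language", language]
    if skip_text:
        cmd.append("--skip-text")
    cmd += [str(src), str(dst)]
    return cmd


def ocr_pdf(src, dst, **options):
    """Run ocrmypdf on src, writing a searchable PDF to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    # check=True raises CalledProcessError if OCRmyPDF exits with an error.
    subprocess.run(build_ocrmypdf_command(src, dst, **options), check=True)
```

Keeping the command construction separate from the `subprocess` call makes the options easy to inspect or log before anything is executed.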

  7. llama-ocr

    Document to Markdown OCR library with Llama 3.2 vision

    Here's the prompt being used; tweaking that might help: https://github.com/Nutlope/llama-ocr/blob/main/src/index.ts#...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
