paper-bidsheets
nougat
Metric | paper-bidsheets | nougat
---|---|---
Mentions | 1 | 19
Stars | 7 | 9,351
Growth | - | 1.9%
Activity | 5.0 | 2.4
Last commit | 4 months ago | about 1 month ago
Language | Go | Python
License | - | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
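The exact formulas behind these numbers are not given here, so the following is only a sketch of how such metrics could be computed. It assumes growth is a plain percentage change in stars, that commit weights halve every 30 days (the `half_life_days` parameter is an invented choice), and that the activity number is a percentile rank scaled to 0-10. All function names are hypothetical.

```python
from datetime import date


def month_over_month_growth(stars_last_month, stars_now):
    """Percentage change in stars relative to last month's count."""
    if stars_last_month <= 0:
        raise ValueError("need a positive star count for last month")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def weighted_commit_score(commit_dates, today, half_life_days=30.0):
    """Sum of per-commit weights; a commit's weight halves every half_life_days.

    The exponential decay is an assumption. The text only says that recent
    commits weigh more than older ones.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_from_rank(score, all_scores):
    """Map a raw score onto a 0-10 scale by percentile rank.

    9.0 then means the project scores higher than 90% of tracked projects.
    """
    below = sum(1 for s in all_scores if s < score)
    return round(10.0 * below / len(all_scores), 1)
```

Under these assumptions, a project whose score beats 90 of 100 tracked projects gets an activity of 9.0, which matches the "top 10%" reading above.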
paper-bidsheets
- Llama-OCR: An Open-Source Llama 3.2 Based OCR Tool
I recently used llama3.2-vision to handle some paper bidsheets for a charity auction, and it is fairly accurate even with some terrible handwriting. I hope to use it for my event next year.
I do find it rather annoying that I can't get it to consistently output a CSV, though. ChatGPT and Gemini seem better at that, but I haven't tried to automate it.
The scale of my problem is about 100 pages of bidsheets, so some manual cleaning is OK. It is certainly better than burning volunteers' time.
https://github.com/philips/paper-bidsheets
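One way to cope with inconsistent CSV output from a vision model is to clean it up after the fact instead of relying on the prompt alone. The sketch below is not taken from the paper-bidsheets repository. The column names (`item`, `bidder`, `amount`) and the function name are hypothetical. It drops Markdown fences and repeated headers, keeps rows with the expected number of columns, and sets everything else aside for the manual pass.

```python
import csv
import io

# Hypothetical column layout for a bidsheet; adjust to the real sheets.
EXPECTED_COLUMNS = ["item", "bidder", "amount"]


def normalize_model_csv(raw, columns=EXPECTED_COLUMNS):
    """Return (clean_csv_text, rejected_lines) from raw model output."""
    good_rows = []
    rejected = []
    for line in raw.splitlines():
        stripped = line.strip()
        # Skip blank lines and Markdown code fences such as ```csv.
        if not stripped or stripped.startswith("```"):
            continue
        row = [cell.strip() for cell in
               next(csv.reader([stripped], skipinitialspace=True))]
        # Skip a header the model emitted itself; one is written below.
        if [cell.lower() for cell in row] == columns:
            continue
        # Chatty prose and malformed rows go to the manual-review pile.
        if len(row) != len(columns):
            rejected.append(stripped)
            continue
        good_rows.append(row)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(good_rows)
    return out.getvalue(), rejected
```

At roughly 100 pages, the rejected lines are a short list to fix by hand, which fits the "some manual cleaning is OK" constraint.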
nougat
- TIL: 3️⃣ ways I use Large Language Models to increase learning efficiency
Link: https://github.com/facebookresearch/nougat
- Llama-OCR: An Open-Source Llama 3.2 Based OCR Tool
Looks awesome! Been doing a lot of OCR recently, and love the addition to the space. The reigning champion in the PDF -> Markdown space (AFAIK) is Facebook's Nougat[1], and I'm excited to hook this up to DSPy and see which works better for philosophy books. This repo links the Zerox[2] project by some startup, which also looks awesome, and certainly more smoothly advertised than Nougat. Would love corrections/advice from any actual experts passing by this comment section :)
That said, I have a few questions if OP/anyone knows the answers:
1. What is Together.ai, and is this model OSS? Their website sells them as a hosting service, and the "Custom Models" page[3] seems to be about custom finetuning, not, like, training new proprietary models in-house. They might have a HuggingFace profile, but it's hard to tell if it's them: https://huggingface.co/TogetherAI
2. The GitHub says "hosted demo", but the hosting part is just the tiny (clean!) WebGUI, yes? It's implied that this functionality is and will always be available only through API calls?
P.S. The header links are broken on my desktop browser -- no onClick triggered
[1] https://facebookresearch.github.io/nougat/
[2] https://github.com/getomni-ai/zerox
[3] https://www.together.ai/products#custom-models
- Show HN: PDF to MD by LLMs – Extract Text/Tables/Image Descriptives by GPT4o
Does it work well on documents that aren't academic papers?
https://facebookresearch.github.io/nougat/
- Open-source tool helps you convert PDF documents, web pages, etc., into Markdown
Anyone know how this compares to GROBID [1]? I'm looking at alternatives to GROBID as I'm not super pleased with its outputs. GROBID has a lot of great features for journal papers (reference extraction / parsing), but I'm only interested in cleanly extracting the body. Also considering nougat [2] but I haven't tried it yet.
[1] https://github.com/kermitt2/grobid
[2] https://github.com/facebookresearch/nougat
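For the "only the body" use case in the comment above: GROBID returns its full-text result as TEI XML, so one option is to keep GROBID and discard everything outside the `<body>` element. The sketch below uses only the Python standard library. The function name is hypothetical, and it assumes the TEI string has already been obtained from GROBID.

```python
import xml.etree.ElementTree as ET

# TEI namespace used in GROBID's XML output.
TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}


def extract_body_paragraphs(tei_xml):
    """Return the body paragraphs of a TEI document as plain strings.

    Header metadata and the back matter (reference list) are ignored
    because only elements under <text>/<body> are visited.
    """
    root = ET.fromstring(tei_xml)
    body = root.find(".//tei:text/tei:body", TEI_NS)
    if body is None:
        return []
    paragraphs = []
    for p in body.iter("{%s}p" % TEI_URI):
        # Join nested text (e.g. inline citation markers) and collapse whitespace.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(text)
    return paragraphs
```

Section headings (`<head>`) are skipped here; including them would be a small change if the document structure matters.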
- Nougat – Pdf to Markdown
- Ask HN: What are you using to parse PDFs for RAG?
- Show HN: Talk to any ArXiv paper just by changing the URL
https://github.com/facebookresearch/nougat/tree/main
- FLaNK Stack for 04 December 2023
- Detexify LaTeX Handwriting Symbol Recognition
- Pix2tex: Using a ViT to convert images of equations into LaTeX code
If you're looking for more e2e math / LaTeX-aware OCR, check out https://github.com/facebookresearch/nougat
What are some alternatives?
llama-ocr - Document to Markdown OCR library with Llama 3.2 vision
marker - Convert PDF to markdown + JSON quickly with high accuracy
OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
MonkeyDev - CaptainHook Tweak, Logos Tweak and Command-line Tool, Patch iOS Apps, Without Jailbreak.
wordninja - Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies.
LIMoE-pytorch - PyTorch implementation of LIMoE