Other PDF SDKs promise a lot - then break. Laggy scrolling, poor mobile UX, tons of bugs, and lack of support cost you endless frustrations. Nutrient’s SDK handles billion-page workloads - so you don’t have to debug PDFs. Used by ~1 billion end users in more than 150 different countries. Learn more →
Nougat Alternatives
Similar projects and alternatives to nougat
-
-
Nutrient
Nutrient – The #1 PDF SDK Library, trusted by 10K+ developers. Other PDF SDKs promise a lot - then break. Laggy scrolling, poor mobile UX, tons of bugs, and lack of support cost you endless frustrations. Nutrient’s SDK handles billion-page workloads - so you don’t have to debug PDFs. Used by ~1 billion end users in more than 150 different countries.
-
-
-
-
-
FLiPStackWeekly
FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
-
-
CodeRabbit
CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.
-
PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
-
-
-
-
-
-
-
PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
-
-
-
voyager
🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability. (by spotify)
-
document-ai-samples
Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud
-
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
nougat discussion
nougat reviews and mentions
-
Llama-OCR: An Open-Source Llama 3.2 Based OCR Tool
Looks awesome! Been doing a lot of OCR recently, and love the addition to the space. The reigning champion in the PDF -> Markdown space (AFAIK) is Facebook's Nougat[1], and I'm excited to hook this up to DSPy and see which works better for philosophy books. This repo links the Zerox[2] project by some startup, which also looks awesome, and certainly more smoothly advertised than Nougat. Would love corrections/advice from any actual experts passing by this comment section :)
That said, I have a few questions if OP/anyone knows the answers:
1. What is Together.ai, and is this model OSS? Their website sells them as a hosting service, and the "Custom Models" page[3] seems to be about custom finetuning, not, like, training new proprietary models in-house. They might have a HuggingFace profile but it's hard to tell if it's them https://huggingface.co/TogetherAI
2. The GitHub says "hosted demo", but the hosting part is just the tiny (clean!) WebGUI, yes? It's implied that this functionality is and will always be available only through API calls?
P.S. The header links are broken on my desktop browser -- no onClick triggered
[1] https://facebookresearch.github.io/nougat/
[2] https://github.com/getomni-ai/zerox
[3] https://www.together.ai/products#custom-models
-
Show HN: PDF to MD by LLMs – Extract Text/Tables/Image Descriptives by GPT4o
Does it work well on documents that aren't academic papers?
https://facebookresearch.github.io/nougat/
-
Open-source tool helps you convert PDF documents, web pages, etc., into Markdown
Anyone know how this compares to GROBID [1]? I'm looking at alternatives to GROBID as I'm not super pleased with its outputs. GROBID has a lot of great features for journal papers (reference extraction / parsing), but I'm only interested in cleanly extracting the body. Also considering nougat [2] but I haven't tried it yet.
[1] https://github.com/kermitt2/grobid
[2] https://github.com/facebookresearch/nougat
- Nougat – Pdf to Markdown
- Ask HN: What are you using to parse PDFs for RAG?
-
Show HN: Talk to any ArXiv paper just by changing the URL
https://github.com/facebookresearch/nougat/tree/main
- FLaNK Stack for 04 December 2023
- Detexify LaTeX Handwriting Symbol Recognition
-
Pix2tex: Using a ViT to convert images of equations into LaTeX code
If you're looking for more e2e math / latex aware OCR checkout https://github.com/facebookresearch/nougat
- Nougat: Open-source LaTeX aware OCR for math-heavy books
-
A note from our sponsor - Nutrient
www.nutrient.io | 13 Feb 2025
Stats
facebookresearch/nougat is an open source project licensed under MIT License which is an OSI approved license.
The primary programming language of nougat is Python.