Metric | nougat | PyMuPDF
---|---|---
Mentions | 13 | 5
Stars | 8,155 | 4,157
Growth | 4.2% | 6.5%
Activity | 7.5 | 9.8
Last commit | 28 days ago | 3 days ago
Language | Python | Python
License | MIT License | GNU Affero General Public License v3.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
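As a rough illustration of how these two metrics can be read, the sketch below backs out last month's star count from the growth figure and converts an activity score into a "top N%" band. The linear mapping from activity to percentile is an assumption extrapolated from the single example above (9.0 corresponds to the top 10%); the exact formula is not given here, and both function names are hypothetical.

```python
def stars_last_month(stars_now: int, growth_pct: float) -> int:
    """Estimate last month's star count from month-over-month growth.

    Assumes growth = (stars_now - stars_prev) / stars_prev.
    """
    return round(stars_now / (1 + growth_pct / 100))


def activity_top_percent(activity: float) -> float:
    """Map a 0-10 activity score to a 'top N%' band.

    Assumption: the scale is linear, so 9.0 maps to the top 10%.
    """
    return round((10 - activity) * 10, 1)


# Figures taken from the comparison table above.
for name, stars, growth, activity in [
    ("nougat", 8155, 4.2, 7.5),
    ("PyMuPDF", 4157, 6.5, 9.8),
]:
    print(name, stars_last_month(stars, growth), activity_top_percent(activity))
```

Under these assumptions, nougat had roughly 7,826 stars a month earlier and sits in about the top 25% by activity, while PyMuPDF had roughly 3,903 stars and sits in about the top 2%.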
nougat
- Show HN: Talk to any ArXiv paper just by changing the URL
  https://github.com/facebookresearch/nougat/tree/main
- FLaNK Stack for 04 December 2023
- Detexify LaTeX Handwriting Symbol Recognition
- Pix2tex: Using a ViT to convert images of equations into LaTeX code
  If you're looking for more end-to-end, math- and LaTeX-aware OCR, check out https://github.com/facebookresearch/nougat
- Nougat: Open-source LaTeX aware OCR for math-heavy books
- Did anyone manage to get nougat running?
  git clone --recurse-submodules https://github.com/facebookresearch/nougat.git PyProject
- Nougat: Facebook Research PDF to .mdd Model
- Linear Book Scanner – The open-source automatic book scanner

  > For the scientific literature, we need a ChatGPT equivalent to reconstruct LaTeX source that can reproduce each page. (We really need a successor to LaTeX that isn't such an arcane language, and can author fixed and flowable text with equal ease.)

  Check out Nougat: OCRing scientific papers with a deep net trained end to end. It was released by Meta a few days ago.

  “PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents.”

  https://facebookresearch.github.io/nougat/
- Nougat: Neural Optical Understanding for Academic Documents
  The paper (and examples) as HTML: https://facebookresearch.github.io/nougat/
  Repo with code, including a CLI tool for converting a PDF to Mathpix Markdown: https://github.com/facebookresearch/nougat
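For readers who want to try the CLI tool mentioned in the last item, here is a minimal Python sketch that wraps it with `subprocess`. It assumes the `nougat` command is installed and on the PATH, that it accepts the form `nougat <pdf> -o <output directory>`, and that the result is written as `<output directory>/<pdf name>.mmd`; the helper names are hypothetical and not part of nougat itself.

```python
import subprocess
from pathlib import Path


def build_nougat_command(pdf_path: str, out_dir: str) -> list[str]:
    """Build the argument list for the CLI call: nougat <pdf> -o <dir>."""
    return ["nougat", str(pdf_path), "-o", str(out_dir)]


def expected_output_path(pdf_path: str, out_dir: str) -> Path:
    """Assumed output location: <out_dir>/<pdf name>.mmd."""
    return Path(out_dir) / Path(pdf_path).with_suffix(".mmd").name


def convert(pdf_path: str, out_dir: str) -> Path:
    """Run nougat on one PDF; raises CalledProcessError if the CLI fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_nougat_command(pdf_path, out_dir), check=True)
    return expected_output_path(pdf_path, out_dir)


# Example (not executed here, since it needs nougat and its model weights):
# mmd_file = convert("paper.pdf", "out")
```

Keeping the command construction separate from the call makes it easy to log or inspect the exact command before running a long conversion.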
PyMuPDF
- FLaNK Stack for 04 December 2023
- Converting markdown to pdf in Python
  This method is based on the libraries markdown-it-py (conversion from Markdown to HTML) and [PyMuPDF](https://github.com/pymupdf/PyMuPDF) (conversion from HTML to PDF). A small Python class links them together.
- Show HN: I am building a new Python library to read/write PDF files
  I think you might mean PyMuPDF (https://github.com/pymupdf/PyMuPDF), a Python library built on top of the MuPDF C library (https://mupdf.com/).
  PyMuPDF and MuPDF are both available under dual open source AGPL and commercial licenses. They have been around for many years and are under continual development.
  [Disclaimer: I work for Artifex, who wrote MuPDF and recently acquired PyMuPDF.]
- M1 Mac: myuPDF install (wheel?)
- legacy install error: PyMuPDF?
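The Markdown-to-PDF approach described in the list above (markdown-it-py for Markdown to HTML, PyMuPDF for HTML to PDF) might look roughly like the sketch below. This is not the class from the linked post: it is a single hypothetical function, `markdown_to_pdf`, that assumes a PyMuPDF version with the `Story` and `DocumentWriter` classes (1.21 or later) and uses an A4 page with 36 pt margins as arbitrary defaults.

```python
def markdown_to_pdf(md_text: str, out_path: str) -> int:
    """Render Markdown to a PDF file and return the number of pages written."""
    # Imported here so the module still loads when the optional
    # third-party packages are not installed.
    import fitz  # PyMuPDF
    from markdown_it import MarkdownIt

    html = MarkdownIt().render(md_text)      # Markdown -> HTML
    story = fitz.Story(html=html)            # HTML -> layoutable content
    mediabox = fitz.paper_rect("a4")         # full page rectangle
    where = mediabox + (36, 36, -36, -36)    # content area inside 36 pt margins

    writer = fitz.DocumentWriter(out_path)
    pages = 0
    more = True
    while more:
        device = writer.begin_page(mediabox)
        more, _filled = story.place(where)   # truthy while content remains
        story.draw(device)
        writer.end_page()
        pages += 1
    writer.close()
    return pages


# Example (requires `pip install pymupdf markdown-it-py`):
# markdown_to_pdf("# Title\n\nSome *Markdown* text.", "out.pdf")
```

The loop places as much content as fits on each page and starts a new page until the story reports that nothing is left, so long documents paginate automatically.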
What are some alternatives?
- LIMoE-pytorch - PyTorch implementation of LIMoE
- PyPDF2 - A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
- libcolorpicker - Color Picker Library For iOS
- ReportLab
- typst - A new markup-based typesetting system that is powerful and easy to learn.
- pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables.
- advanced-brightness-slider-tweak - iOS Tweak that manipulates the brightness slider in the control center so the display brightness and the white point intensity can be modified
- borb - borb is a library for reading, creating and manipulating PDF files in python.
- NotiBlock - An iOS jailbreak tweak to write custom filters to block notifications
- PDFMiner - Python PDF Parser (Not actively maintained). Check out pdfminer.six.
- LaTeX-OCR - pix2tex: Using a ViT to convert images of equations into LaTeX code.
- pdfquery - A fast and friendly PDF scraping library.