py-pdf-parser
A Python tool to help extracting information from structured PDFs. (by jstockwin)
layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis (by Layout-Parser)
Metric | py-pdf-parser | layout-parser
---|---|---
Mentions | 2 | 6
Stars | 337 | 4,476
Growth | - | 1.6%
Activity | 4.4 | 0.0
Latest commit | 3 days ago | about 2 months ago
Language | Python | Python
License | MIT License | Apache License 2.0
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
py-pdf-parser
Posts with mentions or reviews of py-pdf-parser.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-11-02.
- Need free/low-cost software that allows me to view the tags in a PDF.
Maybe look at this?
- Extract text from PDF
I'd recommend trying py-pdf-parser [0]. It allows you to fetch data from documents based on text "markers". E.g. you can easily find data located to the right of the "EMAIL FROM:" text.
[0] - https://github.com/jstockwin/py-pdf-parser
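The "marker" idea from the post above can be illustrated without any PDF library. The following is a minimal sketch, not py-pdf-parser's actual API: the `TextBox` class, the `to_the_right_of` and `value_for` helpers, and the sample page data are all hypothetical, and it assumes text boxes with PDF-style coordinates have already been extracted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    """A piece of text with its bounding box (hypothetical structure)."""
    text: str
    x0: float  # left edge
    y0: float  # bottom edge
    x1: float  # right edge
    y1: float  # top edge


def to_the_right_of(marker: TextBox, boxes: list[TextBox]) -> list[TextBox]:
    """Return boxes that start right of the marker and overlap it vertically, nearest first."""
    hits = [
        b for b in boxes
        if b is not marker
        and b.x0 >= marker.x1                       # starts after the marker ends
        and b.y0 < marker.y1 and b.y1 > marker.y0   # shares part of the marker's line
    ]
    return sorted(hits, key=lambda b: b.x0)


def value_for(label: str, boxes: list[TextBox]) -> str | None:
    """Find the box whose text equals `label` and return the nearest text to its right."""
    marker = next((b for b in boxes if b.text == label), None)
    if marker is None:
        return None
    hits = to_the_right_of(marker, boxes)
    return hits[0].text if hits else None


# Hypothetical page content; in practice a PDF library would supply these boxes.
page = [
    TextBox("EMAIL FROM:", 50, 700, 130, 712),
    TextBox("alice@example.com", 140, 700, 260, 712),
    TextBox("SUBJECT:", 50, 680, 110, 692),
    TextBox("Quarterly report", 140, 680, 240, 692),
]

print(value_for("EMAIL FROM:", page))
```

The sketch only looks at geometry: a value counts as "to the right" when it starts past the marker's right edge and overlaps the marker's vertical extent, so text on other lines is ignored.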
layout-parser
Posts with mentions or reviews of layout-parser.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-01-06.
- Crates for converting PDFs into Markdown
I built my own solution using a combination of Tesseract and OpenCV (in Python). But even though the source PDF content is computer generated, I still get sporadic OCR errors. After writing my solution, I came across https://github.com/Layout-Parser/layout-parser, which might be a better starting point for dealing with PDFs, but I haven't tried it yet.
- OCR help required
This sounds more like a layout parsing issue. Look at Layout Parser; it has helped me on many occasions when I was battling to extract info from PDF documents.
- Amateur programmer here. Will Rust be used in backend for software in the future?
- Extract text from PDF
One of the tools I'm excited about (but haven't used in production) is LayoutParser. It's open-source, and can do some document image analysis especially on non-generic docs.
- Document Classification
One project that I saw not too long ago which might be useful is this: https://github.com/Layout-Parser/layout-parser
- A Python Library for Document Layout Understanding
What are some alternatives?
When comparing py-pdf-parser and layout-parser you can also consider the following projects:
pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.