layout-parser vs pdf-to-csv-table-extactor

Project	layout-parser	pdf-to-csv-table-extactor
Mentions	6	1
Stars	4,453	108
Growth	3.3%	-
Activity	0.0	10.0
Latest Commit	about 2 months ago	over 5 years ago
Language	Python	Python
License	Apache License 2.0	Do What The F*ck You Want To Public License

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
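
The exact formula behind the activity number is not given here, so the following is only a rough sketch of how such an index could work. It assumes an exponential recency weight with a 30-day half-life and a simple percentile mapping onto a 0-10 scale; the function names, the half-life, and the mapping are all hypothetical and are not taken from the site.

```python
from datetime import date


def weighted_commit_score(commit_dates, today, half_life_days=30.0):
    """Sum per-commit weights so that recent commits count more.

    Assumption: a commit's weight halves every `half_life_days` days.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_index(raw_scores, project):
    """Map a raw score to a 0-10 index.

    Assumption: the index is 10 times the share of tracked projects
    whose raw score is strictly lower than this project's score.
    """
    below = sum(1 for s in raw_scores.values() if s < raw_scores[project])
    return round(10.0 * below / len(raw_scores), 1)
```

Under these assumptions, a project that outscores nine of ten tracked projects gets an index of 9.0, which lines up with the "top 10%" reading of an activity of 9.0 described above.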

layout-parser

Posts with mentions or reviews of layout-parser. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-01-06.

Crates for converting PDF's into Markdown
2 projects | /r/rust | 6 Jan 2023

I built my own solution using a combination of Tesseract and OpenCV (in Python). But even though the source PDF content is computer-generated, I still get sporadic OCR errors. After writing my solution, I came across this https://github.com/Layout-Parser/layout-parser which might be a better starting point for dealing with PDFs, but I haven't tried it yet.

OCR help required
1 project | /r/Python | 18 Oct 2022

This sounds more like a layout parsing issue. Look at Layout Parser; it has helped me on many occasions when I was battling to extract info from PDF documents.

Amateur programmer here. Will Rust be used in backend for software in the future?
2 projects | /r/rust | 27 May 2022

Extract text from PDF
7 projects | /r/Python | 2 Nov 2021

One of the tools I'm excited about (but haven't used in production) is LayoutParser. It's open-source, and can do some document image analysis, especially on non-generic docs.

Document Classification
2 projects | /r/computervision | 8 Jun 2021

One project that I saw not too long ago which might be useful is this: https://github.com/Layout-Parser/layout-parser

A Python Library for Document Layout Understanding
1 project | news.ycombinator.com | 8 Apr 2021

pdf-to-csv-table-extactor

Posts with mentions or reviews of pdf-to-csv-table-extactor. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-05-27.

Amateur programmer here. Will Rust be used in backend for software in the future?
2 projects | /r/rust | 27 May 2022

And there's one or two examples around that may work: https://github.com/vitali84/pdf-to-csv-table-extactor

What are some alternatives?

When comparing layout-parser and pdf-to-csv-table-extactor you can also consider the following projects:

EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

py-pdf-parser - A Python tool to help extract information from structured PDFs.

tika-python - Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

BCNet - Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers [CVPR 2021]

ssd_keras - A Keras port of Single Shot MultiBox Detector

simpletransformers - Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI

shabby-pages - ShabbyPages is a state-of-the-art corpus of born-digital document images with both ground truth and distorted versions appropriate for use in training models to reverse distortions and recover to original denoised documents.

pdf-extract - A Rust library for extracting content from PDFs

GDR-Net - GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation. (CVPR 2021)

DA-RetinaNet - Official Detectron2 implementation of DA-RetinaNet of our Image and Vision Computing 2021 work 'An unsupervised domain adaptation scheme for single-stage artwork recognition in cultural sites'

mexican-government-report - Text Mining on the 2019 Mexican Government Report, covering from extracting text from a PDF file to plotting the results.

FastestDet - A newly designed ultra-lightweight anchor-free target detection algorithm with only 250K parameters; it reduces time consumption by 10% compared with yolo-fastest, and its post-processing is simpler