Stream helps developers build engaging apps that scale to millions with performant and flexible Chat, Feeds, Moderation, and Video APIs and SDKs powered by a global edge network and enterprise-grade infrastructure. Learn more →
Layout-parser Alternatives
Similar projects and alternatives to layout-parser
-
-
Stream
Stream - Scalable APIs for Chat, Feeds, Moderation, & Video. Stream helps developers build engaging apps that scale to millions with performant and flexible Chat, Feeds, Moderation, and Video APIs and SDKs powered by a global edge network and enterprise-grade infrastructure.
-
PaddleOCR
Awesome multilingual OCR and Document Parsing toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
-
EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.
-
-
-
simpletransformers
Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
-
tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
-
InfluxDB
InfluxDB – Built for High-Performance Time Series Workloads. InfluxDB 3 OSS is now GA. Transform, enrich, and act on time series data directly in the database. Automate critical tasks and eliminate the need to move data externally. Download now.
-
-
PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
-
mexican-government-report
Text Mining on the 2019 Mexican Government Report, covering from extracting text from a PDF file to plotting the results.
-
document-ai-samples
Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud
-
wdoc
Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, etc
-
-
-
-
-
DA-RetinaNet
Official Detectron2 implementation of DA-RetinaNet, An unsupervised domain adaptation scheme for single-stage artwork recognition in cultural sites, Image and Vision Computing (IMAVIS) 2021
-
-
-
shabby-pages
Discontinued ShabbyPages is a state-of-the-art corpus of born-digital document images with both ground truth and distorted versions appropriate for use in training models to reverse distortions and recover to original denoised documents.
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
layout-parser discussion
layout-parser reviews and mentions
- Ask HN: What are you using to parse PDFs for RAG?
-
Crates for converting PDF's into Markdown
I built my own solution using a combination of Tesseract and OpenCV (in python). But even though the source PDF content is computer generated, I still get sporadic OCR errors. After writing my solution, I came across this https://github.com/Layout-Parser/layout-parser which might be a better starting point for dealing with PDFs but I haven't tried it yet.
-
OCR help required
This sound more like a layout parking issue. Look at Layout Parser, it has helped me on many occasions when I was battling to extract info from PDF documents.
- Amateur programmer here. Will Rust be used in backend for software in the future?
-
Extract text from PDF
One of the tools I'm excited about (but haven't used in production) is LayoutParser. It's open-source, and can do some document image analysis especially on non-generic docs.
-
Document Classification
One project that I saw not to long ago which might be useful is this: https://github.com/Layout-Parser/layout-parser
- A Python Library for Document Layout Understanding
-
A note from our sponsor - Stream
getstream.io | 15 Jul 2025
Stats
Layout-Parser/layout-parser is an open source project licensed under Apache License 2.0 which is an OSI approved license.
The primary programming language of layout-parser is Python.
Popular Comparisons
- layout-parser VS tika-python
- layout-parser VS simpletransformers
- layout-parser VS py-pdf-parser
- layout-parser VS BCNet
- layout-parser VS mexican-government-report
- layout-parser VS ssd_keras
- layout-parser VS DA-RetinaNet
- layout-parser VS EasyOCR
- layout-parser VS shabby-pages
- layout-parser VS pdf-extract