Yolo-doclaynet Alternatives
Similar projects and alternatives to yolo-doclaynet
vehicle_detection_tracker
🚗 VehicleDetectionTracker: Real-time vehicle detection and tracking powered by YOLO. 🚙🚕 A personal Proof of Concept (POC) aimed at exploring the capabilities of real-time vehicle tracking, precision, and adaptability in computer vision projects. This is a testbed for learning and experimentation with YOLO and vehicle detection techniques.
-
MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
-
AlcheMark
Your files ready for Gen AI ✨🚀 AlcheMark is a lightweight PDF to Markdown, alchemical-inspired toolkit that transmutes PDF documents into structured Markdown pages—complete with rich metadata and named‐entity annotations—empowering you to uncover insights page by page.
-
AS-One
Easy & Modular Computer Vision Detectors, Trackers & SAM - Run YOLOv9,v8,v7,v6,v5,R,X in under 10 lines of code.
yolo-doclaynet discussion
yolo-doclaynet reviews and mentions
-
PDF to Text, a Challenging Problem
I've worked on this in my day job: extracting _all_ relevant information from a financial services PDF for a BERT-based search engine.
The only way to solve that is with a segmentation model followed by a regular OCR model. VLMs aren't ready for prime time and won't be for a decade or more.
What worked was just using DocLayNet-trained YOLO models: https://github.com/DS4SD/DocLayNet. If you don't care about images or tables, you can feed the results into Tesseract (but for the love of god read the manual). Congratulations, you're done.
Here are some pre-trained models that work OK out of the box: https://github.com/ppaanngggg/yolo-doclaynet. I found that we needed to increase the horizontal resolution from ~700px to ~2100px for financial data segmentation.
VLMs, on the other hand, still choke on long text and hallucinate unpredictably. Worse, they can't understand nested data. If you give _any_ current model nothing harder than three nested rectangles with text under each, it will not extract the text correctly. Given that nested rectangles describe every table, no VLM can currently extract data from anything but the most straightforward of tables.
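As a rough illustration of the segment-then-OCR approach described in this comment, here is a minimal Python sketch. It is not the commenter's code. It assumes the `ultralytics`, `Pillow`, and `pytesseract` packages are installed, that `weights_path` points to one of the pre-trained yolo-doclaynet weight files, and that the model reports DocLayNet class names such as "Picture" and "Table". The `Region` class and the helper names are hypothetical, and the top-to-bottom ordering is a naive stand-in that ignores multi-column layouts.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    label: str   # layout class name reported by the model, e.g. "Text"
    box: tuple   # (left, top, right, bottom) in pixels


def upscale_size(width, height, target_width=2100):
    """Return (width, height) scaled so the page is target_width pixels wide."""
    scale = target_width / width
    return target_width, round(height * scale)


def reading_order(regions):
    """Sort regions top to bottom, then left to right (naive, single column)."""
    return sorted(regions, key=lambda r: (r.box[1], r.box[0]))


def extract_text(image_path, weights_path, target_width=2100,
                 skip=("Picture", "Table")):
    """Detect layout regions on a page image, then OCR each non-skipped region."""
    # Third-party imports stay local so the helpers above work without them.
    from PIL import Image
    from ultralytics import YOLO
    import pytesseract

    page = Image.open(image_path).convert("RGB")
    page = page.resize(upscale_size(page.width, page.height, target_width))

    model = YOLO(weights_path)
    # Round the inference size up to a multiple of 32, the usual YOLO stride.
    imgsz = math.ceil(target_width / 32) * 32
    result = model.predict(page, imgsz=imgsz)[0]

    regions = [
        Region(result.names[int(cls)], tuple(int(v) for v in xyxy))
        for xyxy, cls in zip(result.boxes.xyxy.tolist(),
                             result.boxes.cls.tolist())
    ]

    parts = []
    for region in reading_order(regions):
        if region.label in skip:
            continue  # images and tables need separate handling
        crop = page.crop(region.box)
        parts.append(pytesseract.image_to_string(crop).strip())
    return "\n\n".join(parts)
```

A call might look like `extract_text("page.png", "yolo-doclaynet.pt")`, where both file names are placeholders. Upscaling before detection mirrors the ~700px to ~2100px change mentioned above; whether that width suits other document types is something to check empirically.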
-
YOLOv12: The Next Evolution in Document Layout Analysis
The project uses my codebase yolo-doclaynet. You can find all free models on Hugging Face, while the largest model is available here (trained using rented GPU resources).
- YOLO models trained on DocLayNet that support intelligent document analysis
-
How to analyze document layout by YOLO
You can find my solution in yolo-doclaynet. After examining several models and datasets, I've chosen YOLO as the base model and DocLayNet as the training data. Let's delve into more details.
Stats
ppaanngggg/yolo-doclaynet is an open source project licensed under the GNU Affero General Public License v3.0, which is an OSI-approved license.
The primary programming language of yolo-doclaynet is Python.