CascadeTabNet vs deepdoctection
| | CascadeTabNet | deepdoctection |
---|---|---|
| Mentions | 1 | 8 |
| Stars | 1,397 | 2,209 |
| Growth | - | 6.2% |
| Activity | 0.0 | 9.2 |
| Latest commit | over 2 years ago | 2 days ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
CascadeTabNet
- [D] Getting super-level table extraction
Recently, I've been researching how to extract tables from document images. First I tried PDFs; however, data extraction libraries like Camelot are inconsistent. I then found a deep learning model called CascadeTabNet. Its detection results are okay, but its cell recognition is poor. I also found Multi-Type-TD-TSR for table extraction. It uses image processing techniques to find the grid lines and performs well on structured, bordered tables. However, it fails when cells are not properly aligned. Even when extraction succeeds, the post-processing step of merging multi-line cells is not straightforward.
deepdoctection
- Show HN: Beyond text splitting – improved file parsing for LLM's
https://github.com/deepdoctection/deepdoctection
Have you tried this?
- April 2023
DeepDoctection: Document extraction and analysis using deep learning models (https://github.com/deepdoctection/deepdoctection)
- DeepDoctection: Document extraction and analysis using deep learning models
- DeepDoctection
- [D] Can I use ML/AI to read the back panels of electronic components?
deepdoctection/deepdoctection: A Repo For Document AI
What are some alternatives?
table-transformer - Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
DocumentInformationExtraction - Key Information Extraction From Documents: Evaluation And Generator
donut - Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Flowise - Drag & drop UI to build your customized LLM flow
Multi-Type-TD-TSR - Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition
PentestGPT - A GPT-empowered penetration testing tool
bark - 🔊 Text-Prompted Generative Audio Model
Information-extraction-from-document - Graph Key Information Extraction: GKIE
JARVIS - JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
unstructured - Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
loopgpt - Modular Auto-GPT Framework