Top 6 Python table-extraction Projects
-
pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
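As a rough illustration of the "easily extract text and tables" claim, here is a minimal sketch using pdfplumber's `pdfplumber.open()` and `Page.extract_table()`, which returns the largest table on a page as a list of rows. The helper names `rows_to_records` and `extract_largest_table`, and the assumption that the first row is a header, are ours for illustration and are not part of pdfplumber.

```python
from typing import Dict, List, Optional

Row = List[Optional[str]]


def rows_to_records(rows: List[Row]) -> List[Dict[str, str]]:
    """Turn a header row plus data rows into a list of dicts.

    Hypothetical helper: assumes the first row holds column names.
    Empty cells come back as None, so they are mapped to "".
    """
    if not rows:
        return []
    header = [(cell or "").strip() for cell in rows[0]]
    records = []
    for row in rows[1:]:
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


def extract_largest_table(pdf_path: str, page_index: int = 0) -> List[Dict[str, str]]:
    """Extract the largest table pdfplumber detects on one page."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        table = page.extract_table()  # list of rows, or None if no table found
    return rows_to_records(table or [])
```

Calling `extract_largest_table("report.pdf")` (the file name is a placeholder) would return one dict per data row. For pages with several tables, `page.extract_tables()` returns all of them.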
Project mention: Running OCR against PDFs and images directly in the browser | news.ycombinator.com | 2024-03-30
-
PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
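For table extraction specifically, a minimal sketch with PyMuPDF might look like the following. It assumes PyMuPDF 1.23.0 or newer, where `Page.find_tables()` is available. The function names `tables_on_page` and `drop_empty_rows` are hypothetical helpers for this example, not part of the library.

```python
from typing import List, Optional

Row = List[Optional[str]]


def drop_empty_rows(rows: List[Row]) -> List[Row]:
    """Hypothetical cleanup step: discard rows whose cells are all None or blank."""
    return [row for row in rows if any((cell or "").strip() for cell in row)]


def tables_on_page(pdf_path: str, page_index: int = 0) -> List[List[Row]]:
    """Return every table detected on one page, each as a list of rows."""
    import fitz  # PyMuPDF; third-party: pip install pymupdf (assumes >= 1.23.0)

    with fitz.open(pdf_path) as doc:
        page = doc[page_index]
        finder = page.find_tables()  # detects tables on the page
        return [drop_empty_rows(table.extract()) for table in finder.tables]
```

Each detected table comes back as a list of rows, with `None` where a cell has no text. Detection quality depends on the PDF, so the output is worth checking by eye on a sample page.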
-
table-transformer
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
Saw this last time but never played with it https://github.com/microsoft/table-transformer
-
img2table
img2table is a table identification and extraction Python library for PDFs and images, based on OpenCV image processing.
-
-
parsee-pdf-reader
Parsee's PDF reader, specialized on the extraction of tables with numeric values and the accurate extraction and preservation of text-paragraphs. Full support for scans and images.
Project mention: Parsee.ai – a framework to easily extract complex structured data with LLMs | news.ycombinator.com | 2024-03-31
Yes, another LLM framework. This one is specialized on extracting structured data from various document types (mainly PDFs, images and HTML files).
Comes with a new (separate) PDF extraction library that is focused on the extraction of numeric tables (tables with numbers, so especially for the financial domain): https://github.com/parsee-ai/parsee-pdf-reader
Helps to easily set up a dataset to evaluate the performance of various LLMs on data extraction tasks, e.g. extracting revenue figures from financial reports: https://github.com/parsee-ai/parsee-datasets/tree/main/datas...
Python table-extraction related posts
- Data extraction from pdf
- Parsing dates with PDFminer
- [P] OCR + Table Extraction Advice
- How to Extract Data from Tables in a Public Record PDF
- How do you parse tables in PDF with langchain? Especially, the context which is few lines above and below the table.
- [D] Unimpressive improvement in training speed after upgrading from GTX 980 Ti to RTX 4090
- Code to extract text from pdf to excel
Index
What are some of the best open-source table-extraction projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | pdfplumber | 5,468 |
2 | PyMuPDF | 3,969 |
3 | table-transformer | 1,758 |
4 | img2table | 366 |
5 | ExtractTable-py | 236 |
6 | parsee-pdf-reader | 18 |