Top 3 Python pdf-document Projects
-
parsee-pdf-reader
Parsee's PDF reader, specialized in extracting tables with numeric values and in accurately extracting and preserving text paragraphs. Full support for scans and images.
-
document-barcodes
Docbarcodes extracts 1D and 2D barcodes from scanned PDF documents or images. It can be used to automate the extraction and processing of all kinds of documents.
Project mention: Parsee.ai – a framework to easily extract complex structured data with LLMs | news.ycombinator.com | 2024-03-31
Yes, another LLM framework. This one specializes in extracting structured data from various document types (mainly PDFs, images and HTML files).
It comes with a new, separate PDF extraction library focused on numeric tables (tables of numbers, so especially useful in the financial domain): https://github.com/parsee-ai/parsee-pdf-reader
It also helps set up a dataset for evaluating how well various LLMs perform on data extraction tasks, e.g. extracting revenue figures from financial reports: https://github.com/parsee-ai/parsee-datasets/tree/main/datas...
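To make the evaluation idea in the post concrete, here is a minimal sketch of how extracted figures could be scored against labeled values. It does not use Parsee's API or its dataset format. The `Example` class, the `is_match` and `accuracy` helpers, the tolerance and the sample data are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One labeled item: a document ID and the expected numeric answer."""
    doc_id: str
    expected: float


def is_match(predicted, expected, rel_tol=0.001):
    """Return True when a prediction is within rel_tol of the expected value."""
    if predicted is None:
        return False  # a missing prediction counts as a miss
    return abs(predicted - expected) <= rel_tol * abs(expected)


def accuracy(examples, predictions, rel_tol=0.001):
    """Fraction of examples whose prediction matches the label."""
    if not examples:
        return 0.0
    hits = sum(
        is_match(predictions.get(ex.doc_id), ex.expected, rel_tol)
        for ex in examples
    )
    return hits / len(examples)


# Hypothetical labels and model outputs, for illustration only.
labels = [
    Example("report_a", 1200.0),
    Example("report_b", 530.5),
    Example("report_c", 98.0),
]
model_outputs = {"report_a": 1200.0, "report_b": 535.0}  # report_c is missing

score = accuracy(labels, model_outputs)
print(f"accuracy: {score:.2f}")
```

Running the same `accuracy` call over the outputs of several models gives a simple side-by-side comparison. A relative tolerance is one possible design choice; an exact match or a unit-aware comparison may suit financial figures better.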
Index
What are some of the best open-source pdf-document projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | PyPDFForm | 84 |
2 | parsee-pdf-reader | 18 |
3 | document-barcodes | 4 |