Top 4 Python pdf-parsing Projects
PyPDF2
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
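As a rough illustration of the "splitting" use case, here is a minimal sketch. It assumes PyPDF2 2.0 or later, where the classes are named `PdfReader` and `PdfWriter`. The helper names `page_chunks` and `split_pdf`, and the output file naming, are hypothetical and not part of the library.

```python
from pathlib import Path


def page_chunks(n_pages, chunk_size):
    """Return (start, stop) page-index pairs that cover n_pages in order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [(start, min(start + chunk_size, n_pages))
            for start in range(0, n_pages, chunk_size)]


def split_pdf(src, out_dir, chunk_size=1):
    """Write src as several smaller PDFs of chunk_size pages each.

    Hypothetical helper; assumes PyPDF2 >= 2.0 is installed.
    """
    # Imported lazily so page_chunks can be used without the library.
    from PyPDF2 import PdfReader, PdfWriter

    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    reader = PdfReader(str(src))
    written = []
    for start, stop in page_chunks(len(reader.pages), chunk_size):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        # Output name uses 1-based page numbers, e.g. report_1-2.pdf
        target = out_dir / f"{src.stem}_{start + 1}-{stop}.pdf"
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

Calling `split_pdf("report.pdf", "parts", chunk_size=2)` (file names are placeholders) would write one file per pair of pages. Merging works the other way around: add pages from several readers to a single writer.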
pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Project mention: PDF Extraction: Retrieving Text and Tables together using Python🐍 | dev.to | 2024-09-22
Extracting both text and tables can be challenging when working with PDF files due to their complex structure. However, the “pdfplumber” library offers a powerful solution. This article explores an effective method for combining text and table extraction from PDFs using pdfplumber. Special thanks to Karl Genockey a.k.a. cmdlineuser and other contributors for their brilliant approach discussed here.
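For orientation, a minimal sketch of pulling text and tables from each page with pdfplumber follows. It is not the method from the linked article, which is not reproduced here. It relies on `pdfplumber.open`, `page.extract_text()` and `page.extract_tables()`; the helper names `table_to_records` and `extract_pages` are hypothetical, and treating the first row of each table as its header is an assumption.

```python
def table_to_records(table):
    """Convert one extracted table (a list of rows) into a list of dicts.

    Assumes the first row is the header. Empty cells, which pdfplumber
    reports as None, become empty strings.
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    return [
        dict(zip(header, [(cell or "").strip() for cell in row]))
        for row in table[1:]
    ]


def extract_pages(path):
    """Return the text and tables of every page in the PDF at path.

    Hypothetical helper; assumes pdfplumber is installed.
    """
    # Imported lazily so table_to_records can be used without the library.
    import pdfplumber

    results = []
    with pdfplumber.open(path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            results.append({
                "page": number,
                "text": page.extract_text() or "",
                "tables": [table_to_records(t) for t in page.extract_tables()],
            })
    return results
```

Note that `extract_text()` generally also returns the characters inside tables, so this sketch yields text and tables side by side rather than interleaved in reading order.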
pdf-to-markdown
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.
Python pdf-parsing related posts
- PDF Extraction: Retrieving Text and Tables together using Python🐍
- Running OCR against PDFs and images directly in the browser
- Parsing dates with PDFminer
- How to Extract Data from Tables in a Public Record PDF
- Code to extract text from pdf to excel
- I need to parse unstructured tables from a pdf into a json, what can I do
- Advanced PDF to Excel with documents and example code
Index
What are some of the best open-source pdf-parsing projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | PyPDF2 | 9,363 |
| 2 | pdfplumber | 8,209 |
| 3 | py-pdf-parser | 412 |
| 4 | pdf-to-markdown | 92 |