I recently published a story based on my analysis of a report I obtained from the Department of Behavioral Health and Developmental Services in Virginia. I wanted to share a quick walkthrough of how I extracted the data from the tables in that PDF using a Python module called pdfplumber. I also uploaded a video to YouTube if you prefer to watch instead.
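As a minimal sketch of the extraction step, assume pdfplumber is installed (`pip install pdfplumber`) and the report's tables have ruling lines that its default table detection can find. The code below pulls every table from every page and writes each one to its own CSV file. `pdfplumber.open`, `pdf.pages`, and `page.extract_tables()` are the library's own calls; `extract_tables` and `write_tables_to_csv` are hypothetical helper names, not the code from the original walkthrough.

```python
import csv


def extract_tables(pdf_path):
    """Return every table pdfplumber detects in the PDF, page by page."""
    # Imported here (an assumption of this sketch) so the CSV helper below
    # still works on machines without pdfplumber installed.
    import pdfplumber

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # Each table is a list of rows; each cell is a string, or None if empty.
            tables.extend(page.extract_tables())
    return tables


def write_tables_to_csv(tables, prefix="table"):
    """Write each table to <prefix>_<n>.csv and return the paths written."""
    paths = []
    for i, table in enumerate(tables):
        path = f"{prefix}_{i}.csv"
        with open(path, "w", newline="") as f:
            # csv.writer writes None cells as empty fields.
            csv.writer(f).writerows(table)
        paths.append(path)
    return paths
```

A typical call would be `write_tables_to_csv(extract_tables("report.pdf"))`, where `"report.pdf"` is a placeholder path. Tables without visible borders may need custom `table_settings`, for example the text-based `vertical_strategy` and `horizontal_strategy` options.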
That's it! I wrote some additional code to pull the values from the table and clean the data before creating the final visualization.
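The original cleanup code isn't shown, so the following is only a hypothetical sketch of what that step can involve. It assumes the first row of each extracted table holds the column names. It collapses the line breaks pdfplumber keeps inside wrapped cells, turns blank cells into `None`, converts strings like `"1,234"` into numbers, and drops rows that are entirely empty. `clean_cell` and `table_to_records` are illustrative names.

```python
import re

# Matches plain numbers with optional thousands separators and decimals.
_NUMBER = re.compile(r"-?\d[\d,]*(\.\d+)?")


def clean_cell(cell):
    """Normalize one raw cell: None/blank -> None, numbers -> int/float, else tidy text."""
    if cell is None:
        return None
    # Wrapped text inside a PDF cell often comes back with embedded newlines.
    text = " ".join(cell.split())
    if not text:
        return None
    if _NUMBER.fullmatch(text):
        number = text.replace(",", "")
        return float(number) if "." in number else int(number)
    return text


def table_to_records(table):
    """Treat the first row as the header (an assumption) and return one dict per data row."""
    header = [clean_cell(h) for h in table[0]]
    records = []
    for row in table[1:]:
        values = [clean_cell(c) for c in row]
        # Skip rows that are entirely empty (spacer rows, page artifacts).
        if all(v is None for v in values):
            continue
        records.append(dict(zip(header, values)))
    return records
```

Returning a list of dicts keeps the result easy to hand to a plotting library or to `pandas.DataFrame` before building a visualization.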