Our great sponsors
-
PyPDF2
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
yaralyzer
Visually inspect and force decode YARA and regex matches found in both binary and text data. With Colors.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
This tool was built to fill a gap in the PDF assessment landscape. Didier Stevens's pdfid.py and pdf-parser.py are still the best game in town when it comes to PDF analysis tools but they lack in the visualization department and also don't give you much to work with as far as giving you a data model you can write your own code around. Peepdf seemed promising but turned out to be in a buggy, out of date, and more or less unfixable state. And neither of them offered much in the way of tooling for embedded binary analysis. Thus I felt the world might be slightly improved if I strung together a couple of more stable/well known/actively maintained open source projects (AnyTree, PyPDF2, and Rich) into this tool.
The Pdfalyzer is a command line tool (paralyze) as well as a library for working with, visualizing, and scanning the contents of a PDF. Motivation for the project was personal: I got hacked by a PDF that turned out to be hiding its maleficent instructions inside the font binary where it was missed by modern malware scanners (twitter thread) (more details)
This tool was built to fill a gap in the PDF assessment landscape. Didier Stevens's pdfid.py and pdf-parser.py are still the best game in town when it comes to PDF analysis tools but they lack in the visualization department and also don't give you much to work with as far as giving you a data model you can write your own code around. Peepdf seemed promising but turned out to be in a buggy, out of date, and more or less unfixable state. And neither of them offered much in the way of tooling for embedded binary analysis. Thus I felt the world might be slightly improved if I strung together a couple of more stable/well known/actively maintained open source projects (AnyTree, PyPDF2, and Rich) into this tool.
This tool was built to fill a gap in the PDF assessment landscape. Didier Stevens's pdfid.py and pdf-parser.py are still the best game in town when it comes to PDF analysis tools but they lack in the visualization department and also don't give you much to work with as far as giving you a data model you can write your own code around. Peepdf seemed promising but turned out to be in a buggy, out of date, and more or less unfixable state. And neither of them offered much in the way of tooling for embedded binary analysis. Thus I felt the world might be slightly improved if I strung together a couple of more stable/well known/actively maintained open source projects (AnyTree, PyPDF2, and Rich) into this tool.
This tool was built to fill a gap in the PDF assessment landscape. Didier Stevens's pdfid.py and pdf-parser.py are still the best game in town when it comes to PDF analysis tools but they lack in the visualization department and also don't give you much to work with as far as giving you a data model you can write your own code around. Peepdf seemed promising but turned out to be in a buggy, out of date, and more or less unfixable state. And neither of them offered much in the way of tooling for embedded binary analysis. Thus I felt the world might be slightly improved if I strung together a couple of more stable/well known/actively maintained open source projects (AnyTree, PyPDF2, and Rich) into this tool.
This tool was built to fill a gap in the PDF assessment landscape. Didier Stevens's pdfid.py and pdf-parser.py are still the best game in town when it comes to PDF analysis tools but they lack in the visualization department and also don't give you much to work with as far as giving you a data model you can write your own code around. Peepdf seemed promising but turned out to be in a buggy, out of date, and more or less unfixable state. And neither of them offered much in the way of tooling for embedded binary analysis. Thus I felt the world might be slightly improved if I strung together a couple of more stable/well known/actively maintained open source projects (AnyTree, PyPDF2, and Rich) into this tool.
for the ultra low level the Didier Stevens tools mentioned in the OP are rock solid, but for anything sort of in the middle zone - allowing you to work with the logical structure, having a consistent API, etc. etc. - yeah there's not much out there, which is why I ended up making The Pdfalyzer (and The Yaralyzer, which was basically just a side effect).