The Pdfalyzer is a tool for visualizing the inner tree structure of a PDF in large and colorful diagrams as well as scanning its internals for suspicious content

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews

Our great sponsors

PyPDF2

30 7,396 9.5 Python

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

This tool was built to fill a gap in the PDF assessment landscape. Didier Stevens's pdfid.py and pdf-parser.py are still the best game in town when it comes to PDF analysis tools but they lack in the visualization department and also don't give you much to work with as far as giving you a data model you can write your own code around. Peepdf seemed promising but turned out to be in a buggy, out of date, and more or less unfixable state. And neither of them offered much in the way of tooling for embedded binary analysis. Thus I felt the world might be slightly improved if I strung together a couple of more stable/well known/actively maintained open source projects (AnyTree, PyPDF2, and Rich) into this tool.

pdfalyzer

8 220 8.3 Python

Analyze PDFs. With colors. And Yara.

The Pdfalyzer is a command line tool (paralyze) as well as a library for working with, visualizing, and scanning the contents of a PDF. Motivation for the project was personal: I got hacked by a PDF that turned out to be hiding its maleficent instructions inside the font binary where it was missed by modern malware scanners (twitter thread) (more details)

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
DidierStevensSuite

7 1,830 5.6 Python

Please no pull requests for this repository. Thanks!

This tool was built to fill a gap in the PDF assessment landscape. Didier Stevens's pdfid.py and pdf-parser.py are still the best game in town when it comes to PDF analysis tools but they lack in the visualization department and also don't give you much to work with as far as giving you a data model you can write your own code around. Peepdf seemed promising but turned out to be in a buggy, out of date, and more or less unfixable state. And neither of them offered much in the way of tooling for embedded binary analysis. Thus I felt the world might be slightly improved if I strung together a couple of more stable/well known/actively maintained open source projects (AnyTree, PyPDF2, and Rich) into this tool.

peepdf

5 1,195 0.0 Python

Powerful Python tool to analyze PDF documents

This tool was built to fill a gap in the PDF assessment landscape. Didier Stevens's pdfid.py and pdf-parser.py are still the best game in town when it comes to PDF analysis tools but they lack in the visualization department and also don't give you much to work with as far as giving you a data model you can write your own code around. Peepdf seemed promising but turned out to be in a buggy, out of date, and more or less unfixable state. And neither of them offered much in the way of tooling for embedded binary analysis. Thus I felt the world might be slightly improved if I strung together a couple of more stable/well known/actively maintained open source projects (AnyTree, PyPDF2, and Rich) into this tool.

anytree

2 900 7.7 Python

Python tree data library

This tool was built to fill a gap in the PDF assessment landscape. Didier Stevens's pdfid.py and pdf-parser.py are still the best game in town when it comes to PDF analysis tools but they lack in the visualization department and also don't give you much to work with as far as giving you a data model you can write your own code around. Peepdf seemed promising but turned out to be in a buggy, out of date, and more or less unfixable state. And neither of them offered much in the way of tooling for embedded binary analysis. Thus I felt the world might be slightly improved if I strung together a couple of more stable/well known/actively maintained open source projects (AnyTree, PyPDF2, and Rich) into this tool.

rich

148 47,088 8.0 Python

Rich is a Python library for rich text and beautiful formatting in the terminal.

This tool was built to fill a gap in the PDF assessment landscape. Didier Stevens's pdfid.py and pdf-parser.py are still the best game in town when it comes to PDF analysis tools but they lack in the visualization department and also don't give you much to work with as far as giving you a data model you can write your own code around. Peepdf seemed promising but turned out to be in a buggy, out of date, and more or less unfixable state. And neither of them offered much in the way of tooling for embedded binary analysis. Thus I felt the world might be slightly improved if I strung together a couple of more stable/well known/actively maintained open source projects (AnyTree, PyPDF2, and Rich) into this tool.

yaralyzer

4 100 5.1 Python

Visually inspect and force decode YARA and regex matches found in both binary and text data. With Colors.

for the ultra low level the Didier Stevens tools mentioned in the OP are rock solid, but for anything sort of in the middle zone - allowing you to work with the logical structure, having a consistent API, etc. etc. - yeah there's not much out there, which is why I ended up making The Pdfalyzer (and The Yaralyzer, which was basically just a side effect).

WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project