Python PDF

Open-source Python projects categorized as PDF

Top 23 Python PDF Projects

  • OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

    Project mention: Cost aside, what is the most accurate OCR app? | 2022-12-22

    I don't know what's the best, but I've had good luck with the command line tool ocrmypdf.

  • WeasyPrint

    The awesome document factory

    Project mention: QuestPDF: Modern .NET library for PDF document generation | 2023-01-18

    The Paged Media spec on counters and counter-resets paints implementations into a corner: they cannot both comply with the spec and reset the page count on page breaks. This has been a known issue with the spec since 2013[1][2] and has been a thorn in the side of implementations ever since.[3]

  • PyPDF2

    A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

    Project mention: How to convert SVGs containing text to a PDF? | 2023-01-24

    I still haven't needed to do that part in Rust yet, unfortunately. My mother is still using the pypdf-based concatenator I wrote for her years ago.

  • PDFMiner

    Python PDF parser (not actively maintained; check out pdfminer.six instead).

    Project mention: Creating a Python class for organizing courses I took in my education | 2022-10-15

    Technically this information is on my transcript, so I will try to use pdfminer to extract that data, if there is a way to use a class you recommend with that code.

  • pdfminer.six

    Community-maintained fork of pdfminer - we fathom PDF

    Project mention: Extracting text from PDFs using pdfminer | 2023-01-23

  • pdfplumber

    Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

    Project mention: Extracting particular things from pdf program? | 2023-01-21

    To handle machine generated one, a possible package is pdfplumber.

  • borb

    borb is a library for reading, creating and manipulating PDF files in Python.

    Project mention: Generating PDF from some sort of template (jinja2) with headers, footers, images, not just a printed HTML document. | 2022-10-28

    Have you looked at borb? I’m not sure if it’s exactly what you need, but I found it useful when doing something similar to you.

  • pdftabextract

    A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

  • malicious-pdf

    💀 Generate a bunch of malicious PDF files with phone-home functionality. Can be used with Burp Collaborator or

    Project mention: Malicious PDF Generator: malicious PDF generator for #pentesters and #redteaming ♦️ | 2022-08-11

  • Papermerge

    Open Source Document Management System for Digital Archives (Scanned Documents)

    Project mention: How to Sort Files with a Search Bar | 2023-01-23

    Sounds like you want a Document Management System (DMS), something like Alfresco (Bitnami packaged app), PaperMerge, or OpenDocMan

  • pdfarranger

    A small python-gtk application that helps the user merge or split PDF documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface.

    Project mention: New Browser-based PDF Editor (github link) | 2023-01-30

  • Camelot

    A Python library to extract tabular data from PDFs

    Project mention: Camelot: DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead. | 2022-12-29

    Here is the corresponding bug report in git:

  • tabula-py

    Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame

    Project mention: What is the best way to extract tables from scanned pdf's? | 2022-11-10

    I haven't tried it myself, but it worked okay for some people.

  • pikepdf

    A Python library for reading and writing PDF, powered by qpdf

    Project mention: This Week in Python | 2022-06-17

    pikepdf – A Python library for reading and writing PDF, powered by qpdf

  • pdf2image

    A Python module that wraps the pdftoppm utility to convert PDF pages to PIL Image objects

  • CairoSVG

    Convert your vector images

    Project mention: How to dynamically generate graphics and PDFs using Python and jinja | 2023-01-08

    cairosvg: Provides the SVG-to-PDF Converter

  • fpdf2

    Simple PDF generation for Python

    Project mention: New fpdf2 release - 2.6.1 - PDF encryption - skewing - markdown hyperlinks - Python 3.11 | 2023-01-13

    Python wrapper for the arXiv API

  • pypandoc

    Thin wrapper for "pandoc" (MIT)

    Project mention: Converting multiple docx to multiple txt files | 2022-11-07

    Use Pypandoc

  • rst2pdf

    Use a text editor. Make a PDF.

    Project mention: Rst2pdf: Use a text editor. Make a PDF | 2022-07-01

  • latex-online

    Online LaTeX compiler. You give it a link; it gives you a PDF.

    Project mention: Website to compile LaTeX from GitHub repository | 2023-01-23

    I use Overleaf for everything LaTeX/Beamer related, and back up my projects onto GitHub. There used to be a website ( which would generate a static link for you that would compile a PDF from a specified GitHub repository. It was a great way to share presentations and reports while they were still in development.

  • Mayan EDMS

    Free Open Source Document Management System (mirror; no pull requests or issues)

    Project mention: Hermes, an Open Source Document Management System | 2023-01-31

    There's also Mayan EDMS [1]. I have no experience with it, but it looks sensible from the outside.

  • pdfsyntax

    A Python PDF parsing library and tool built on top to browse the internal structure of a PDF file

    Project mention: Show HN: I am building a new Python library to read/write PDF files | 2022-11-19
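    Browsing a PDF's internal structure starts from its indirect objects ("12 0 obj ... endobj") and the trailer's /Root entry, which points at the document catalog. The sketch below is a naive standard-library illustration of that idea under stated assumptions; it is not pdfsyntax's API, and the names describe_objects and trailer_root are hypothetical. It scans the raw bytes with regular expressions instead of following the cross-reference table, so objects inside compressed object streams are missed and binary stream data can produce false matches.

```python
import re
from typing import List, Optional, Tuple

# "12 0 obj": object number, generation number, keyword.
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# "/Type /Page" inside an object's dictionary.
TYPE_ENTRY = re.compile(rb"/Type\s*/(\w+)")
# "/Root 1 0 R" in a trailer (or cross-reference stream) dictionary.
ROOT_REF = re.compile(rb"/Root\s+(\d+)\s+\d+\s+R")


def describe_objects(data: bytes) -> List[Tuple[int, int, Optional[str]]]:
    """List (object number, generation, /Type name or None) in file order."""
    results = []
    for match in OBJ_HEADER.finditer(data):
        end = data.find(b"endobj", match.end())
        body = data[match.end():end if end != -1 else len(data)]
        type_match = TYPE_ENTRY.search(body)
        type_name = type_match.group(1).decode("latin-1") if type_match else None
        results.append((int(match.group(1)), int(match.group(2)), type_name))
    return results


def trailer_root(data: bytes) -> Optional[int]:
    """Return the object number of the document catalog, or None.

    Takes the last /Root reference, since incremental updates append
    newer trailers to the end of the file.
    """
    refs = ROOT_REF.findall(data)
    return int(refs[-1]) if refs else None
```

    Typical use is describe_objects(open("file.pdf", "rb").read()) on a hypothetical file.pdf; a real parser would instead read startxref, walk the cross-reference table or stream, and decompress object streams.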

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-01-31.

What are some of the best open-source PDF projects in Python? This list will help you:

#   Project          Stars
1   OCRmyPDF         8,056
2   WeasyPrint       5,442
3   PyPDF2           5,123
4   PDFMiner         4,939
5   pdfminer.six     4,112
6   pdfplumber       3,383
7   borb             2,897
8   pdftabextract    2,037
9   malicious-pdf    1,924
10  Papermerge       1,865
11  pdfarranger      1,838
12  Camelot          1,793
13  tabula-py        1,755
14  pikepdf          1,636
15  pdf2image        1,155
16  CairoSVG           616
17  fpdf2              602
18                     602
19  pypandoc           567
20  rst2pdf            490
21  latex-online       456
22  Mayan EDMS         409
23  pdfsyntax          396