Top 23 Python PDF Projects
-
I don't know what's the best, but I've had good luck with the command line tool ocrmypdf. https://github.com/ocrmypdf/OCRmyPDF
-
Project mention: QuestPDF: Modern .NET library for PDF document generation | news.ycombinator.com | 2023-01-18
The Paged Media spec on counters and counter-resets paints implementations into a corner. They can't both comply with the spec and implement page count resets on page breaks. This has been a known issue with the spec since 2013[1][2] and been a thorn in implementations since.[3]
1: https://www.w3.org/Style/CSS/Tracker/issues/334
2: https://github.com/w3c/csswg-drafts/issues/4760
3: https://github.com/Kozea/WeasyPrint/issues/93#issuecomment-4...
-
PyPDF2
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
I still haven't needed to do that part in Rust yet, unfortunately. My mother is still using the pypdf-based concatenator I wrote for her years ago.
-
Project mention: Creating a python class for organizing courses I took in my education | reddit.com/r/learnpython | 2022-10-15
Technically this information is on my transcript, so I will try to use pdfminer to extract that data, if there is a class you recommend using with that code: https://github.com/pdfminer/pdfminer.six
-
pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Project mention: Extracting particular things from pdf program? | reddit.com/r/learnpython | 2023-01-21
To handle machine-generated ones, a possible package is pdfplumber.
-
Project mention: Generating PDF from some sort of template (jinja2) with headers, footers, images, not just a printed HTML document. | reddit.com/r/learnpython | 2022-10-28
Have you looked at borb? I’m not sure if it’s exactly what you need, but I found it useful when doing something similar.
-
pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
-
malicious-pdf
💀 Generate a bunch of malicious pdf files with phone-home functionality. Can be used with Burp Collaborator or Interact.sh
Project mention: Malicious PDF Generator: Generador de PDF malicioso para #pentesters y #redteaming ♦️ | reddit.com/r/u_esgeeks | 2022-08-11
-
Sounds like you want a Document Management System (DMS), something like Alfresco (Bitnami packaged app), PaperMerge, or OpenDocMan
-
pdfarranger
Small python-gtk application, which helps the user to merge or split PDF documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface.
-
Project mention: Camelot: DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead. | reddit.com/r/learnpython | 2022-12-29
here is the corresponding bug report in git: https://github.com/camelot-dev/camelot/issues/339
-
Project mention: What is the best way to extract tables from scanned pdf's? | reddit.com/r/learnpython | 2022-11-10
I haven't tried myself but https://github.com/chezou/tabula-py worked okay for some people
-
pikepdf – A Python library for reading and writing PDF, powered by qpdf
-
Project mention: How to dynamically generate graphics and PDFs using Python an jinja | dev.to | 2023-01-08
cairosvg: Provides the SVG-to-PDF Converter
-
Project mention: New fpdf2 release - 2.6.1 - PDF encryption - skewing - markdown hyperlinks - Python 3.11 | reddit.com/r/pythonnews | 2023-01-13
-
Project mention: Converting multiple docx to multiple txt filed | reddit.com/r/learnpython | 2022-11-07
Use Pypandoc
-
I use Overleaf for everything LaTeX/Beamer related, and back up my projects onto GitHub. There used to be a website (https://latexonline.cc/) which would generate a static link for you that would compile a PDF from a specified GitHub repository. It was a great way to share developing presentations and reports.
-
Project mention: Hermes, an Open Source Document Management System | news.ycombinator.com | 2023-01-31
There's also Mayan EDMS [1]. I have no experience with it, but looks sensible from the outside.
-
pdfsyntax
A Python PDF parsing library and tool built on top to browse the internal structure of a PDF file
Project mention: Show HN: I am building a new Python library to read/write PDF files | reddit.com/r/patient_hackernews | 2022-11-19
Python PDF related posts
- Need free/low-cost software that allows me to view the tags in a PDF.
- New Browser-based PDF Editor (github link)
- Breaking up oversized pdfs into multiple sheets.
- How to convert SVGs containing text to a PDF?
- Website to compile latex from github repository
- Extracting text from PDFs using pdfminer
- Looking for free software to only extract pages from PDFs
Index
What are some of the best open-source PDF projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | OCRmyPDF | 8,056 |
2 | WeasyPrint | 5,442 |
3 | PyPDF2 | 5,123 |
4 | PDFMiner | 4,939 |
5 | pdfminer.six | 4,112 |
6 | pdfplumber | 3,383 |
7 | borb | 2,897 |
8 | pdftabextract | 2,037 |
9 | malicious-pdf | 1,924 |
10 | Papermerge | 1,865 |
11 | pdfarranger | 1,838 |
12 | Camelot | 1,793 |
13 | tabula-py | 1,755 |
14 | pikepdf | 1,636 |
15 | pdf2image | 1,155 |
16 | CairoSVG | 616 |
17 | fpdf2 | 602 |
18 | arxiv.py | 602 |
19 | pypandoc | 567 |
20 | rst2pdf | 490 |
21 | latex-online | 456 |
22 | Mayan EDMS | 409 |
23 | pdfsyntax | 396 |