WeasyPrint
PyPDF2
Metric | WeasyPrint | PyPDF2
---|---|---
Mentions | 8 | 15
Stars | 4,961 | 4,327
Growth | 1.9% | 5.8%
Activity | 9.7 | 0.0
Last commit | 6 days ago | 4 days ago
Language | Python | Python
License | BSD 3-clause "New" or "Revised" License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
WeasyPrint
- QuestPDF 2021.10 - a new version of the open-source, MIT-licensed, C# library for generating PDF documents with fluent API, now with extended text capabilities. Please help me make it popular :)
I’d recommend WeasyPrint (.NET Core wrapper) instead of wkhtmltopdf. It supports CSS Paged Media, which is pretty much required for everything but the simplest HTML-to-PDF conversions.
- Is there a way to publish a PDF report in a Jenkins job?
You can convert HTML to PDF using WeasyPrint: https://github.com/Kozea/WeasyPrint
- Beautiful PDFs from HTML
Yeah, in the Python world there's WeasyPrint for PDF out in the wild as well. It's quite slick, but it's a harder sell because of Python, which corporate types seem to think is bad hacker central.
- WeasyPrint - The awesome document factory
- WeasyPrint – Convert web documents to PDF
- wkhtmltopdf - Convert HTML to PDF
Another free CLI tool to consider is WeasyPrint (GitHub).
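The posts above recommend WeasyPrint for HTML-to-PDF conversion and single out its CSS Paged Media support. A minimal Python sketch of both ideas follows. The helper names (`paged_css`, `build_report_html`, `write_pdf`) and the report layout are hypothetical, and the rendering step assumes the `weasyprint` package is installed; the `@page` rule, its margin box, and the page counters are standard CSS Paged Media features.

```python
import html


def paged_css(size="A4", margin="2cm"):
    """Return a stylesheet that sets the page size and numbers the pages.

    Hypothetical helper: it only assembles standard CSS Paged Media rules.
    """
    return (
        "@page {\n"
        f"  size: {size};\n"
        f"  margin: {margin};\n"
        "  @bottom-center {\n"
        '    content: "Page " counter(page) " of " counter(pages);\n'
        "  }\n"
        "}\n"
    )


def build_report_html(title, rows):
    """Build a small HTML report; rows is a list of (name, status) pairs."""
    body = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(status)}</td></tr>"
        for name, status in rows
    )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title></head>"
        f"<body><h1>{html.escape(title)}</h1><table>{body}</table></body></html>"
    )


def write_pdf(html_text, css_text, pdf_path):
    """Render the HTML with the given stylesheet to a PDF file."""
    # Imported lazily so the two helpers above work without WeasyPrint.
    from weasyprint import CSS, HTML

    HTML(string=html_text).write_pdf(pdf_path, stylesheets=[CSS(string=css_text)])
```

A call such as `write_pdf(build_report_html("Nightly build", [("unit tests", "passed")]), paged_css(), "report.pdf")` would produce a one-page report; in a Jenkins job, the resulting file could then be archived as a build artifact.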
PyPDF2
- How do I re-arrange a pdf/docx?
- This Week in Python
PyPDF2 – A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
- Program that takes text from pdf
I prefer PyPDF2, but it hasn't been maintained for several years. One of these days I'll overcome inertia and convert projects to use PyMuPDF instead, as someone else mentioned.
- How do I get python to read .pdf's in a directory?
I've had some success reading and writing PDFs using this library (PyPDF2). Not sure if it will help you, but it's worth a shot.
- Need help with indexing pdf tables in python
Maybe PyPDF2: https://pythonhosted.org/PyPDF2/
- Could you give examples of types of NLP projects you worked on at work in real business scenarios?
Many of our PDFs were buggy, and many Python libraries for working with PDFs are buggy. The library PyPDF2 works well on many PDFs, but hangs in this loop on others. Attempts to use most Python PDF libraries resulted in endless log files of:
- Bring back account statement csv, please.
I write my code in Python and found the PDF reader routines (https://pythonhosted.org/PyPDF2/). With them, I am able to convert my PDF statement into a huge string that can be parsed. Not there yet, maybe 85%, but it looks like I can make an exact replacement of the CSV. By the time the routine is polished up and tested with multiple statements, it'll probably be a solid week of effort.
- Find titles in PDFs without opening
Great, you should be able to use something like PyPDF2 to open the file, extract the text from the front page, and use that as a variable to rename the file. It may be a bit tricky to actually extract the desired title.
- Read text from pdf-file with textract?
What text are you expecting from that pdf? AFAIK, textract just wraps parsers for various file types. Can you try something like PyPDF2 and see if you get the same/similar result?
- Comparing PDF pages
Are they scanned or computer-prepared? If they're computer-prepared, you could use PyPDF2, and compare the contents of each page to all the others. Open the file, open each page and compare it to all the pages that follow it using things like extractText() and getContents().
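Several of the posts above describe the same workflow: find the PDFs in a directory, extract the text of each page, then use that text (as a candidate title, or to compare pages against each other). The following Python sketch illustrates it under some assumptions. All function names are hypothetical, "title" is naively taken to be the first non-empty line of the first page, and the extraction step assumes a PyPDF2 release that provides `PdfReader` and `extract_text()`, the newer names for the `PdfFileReader` and `extractText()` calls mentioned above.

```python
from itertools import combinations
from pathlib import Path


def find_pdfs(directory):
    """Return the PDF files directly inside directory, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def load_page_texts(pdf_path):
    """Extract the text of each page with PyPDF2 (assumed installed)."""
    # Imported lazily so the pure helpers work without PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(str(pdf_path))
    # Treat a page with no extractable text as an empty string.
    return [page.extract_text() or "" for page in reader.pages]


def title_from_text(page_text, max_length=80):
    """Guess a title: the first non-empty line, with whitespace collapsed."""
    for line in page_text.splitlines():
        candidate = " ".join(line.split())
        if candidate:
            return candidate[:max_length]
    return None


def matching_pages(page_texts):
    """Return (i, j) index pairs, i < j, whose normalized text is identical.

    Each page is compared with every page that follows it. Pages with no
    text are skipped so that blank pages do not all match each other.
    """
    cleaned = [" ".join(text.split()) for text in page_texts]
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(cleaned), 2)
        if a and a == b
    ]
```

For each path returned by `find_pdfs`, `load_page_texts` gives the per-page text, `title_from_text` can be applied to the first page, and `matching_pages` reports duplicate pages. This only helps for computer-prepared PDFs: scanned pages usually contain no extractable text and would need OCR first.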
What are some alternatives?
PDFMiner - Python PDF Parser (Not actively maintained). Check out pdfminer.six.
ReportLab
pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Camelot - A Python library to extract tabular data from PDFs
WKHTMLToPDF - Convert HTML to PDF using Webkit (QtWebKit)
textract - extract text from any document. no muss. no fuss.
borb - borb is a library for reading, creating and manipulating PDF files in python.
pdftabextract - A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
pymorphy2 - Morphological analyzer / inflection engine for Russian and Ukrainian languages.
pdfminer.six - Community maintained fork of pdfminer - we fathom PDF