pdfsyntax
pdfquery
| | pdfsyntax | pdfquery |
|---|---|---|
| Mentions | 8 | 3 |
| Stars | 420 | 753 |
| Growth | - | - |
| Activity | 8.5 | 0.0 |
| Last commit | 8 days ago | 7 months ago |
| Language | Python | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pdfsyntax
- Show HN: PDFSyntax, a Python library to inspect and transform PDF files
- Show HN: I am building a new Python library to read/write PDF files
  I never knew about the J number suffix in Python: https://docs.python.org/3/reference/lexical_analysis.html#im... which, it would appear, is used to represent references: https://github.com/desgeeko/pdfsyntax/blob/main/tests/test_p...
  I wish you good luck; this file format has tripped up many, many a developer. It blew up on a PDF I had lying around:
  ValueError: could not convert string to float: b'5.0.0'
- This Week In Python
  pdfsyntax – a Python PDF parsing library to browse the internal structure of a PDF file
- Show HN: Browse the internal structure of a PDF file
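
The `J` suffix mentioned in the Show HN comment above is Python's imaginary-literal syntax: `5j` is a `complex` value whose real part is zero. The sketch below shows how such a literal could stand in for a PDF indirect reference, and it reproduces the `ValueError` quoted in that comment. The helper names (`make_ref`, `is_ref`, `ref_number`, `parse_number`) are hypothetical and are not pdfsyntax functions; the discussion does not say how pdfsyntax actually encodes references (for example, generation numbers), so treat this as an illustration only.

```python
# Hypothetical helpers for illustration only; not part of pdfsyntax.

def make_ref(obj_num: int) -> complex:
    """Encode an object number as a pure imaginary number, e.g. 5 -> 5j."""
    return complex(0, obj_num)


def is_ref(value) -> bool:
    """Treat a complex value with a zero real part as a reference."""
    return isinstance(value, complex) and value.real == 0


def ref_number(ref: complex) -> int:
    """Recover the object number stored in the imaginary part."""
    return int(ref.imag)


def parse_number(token: bytes):
    """Return the token as a float, or None if it is not a valid number.

    float() accepts bytes; a token such as b'5.0.0' raises
    "ValueError: could not convert string to float: b'5.0.0'".
    """
    try:
        return float(token)
    except ValueError:
        return None


# A dictionary resembling a PDF page object, with references written as literals.
page = {"Type": "Page", "Parent": 2j, "Contents": make_ref(5)}

for key, value in page.items():
    if is_ref(value):
        print(f"{key} -> object {ref_number(value)}")
```

Catching the `ValueError` and returning `None` is one possible design choice; a parser might equally re-raise with the byte offset of the bad token so that malformed files are easier to diagnose.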
pdfquery
- Show HN: I am building a new Python library to read/write PDF files
  That makes sense, as "pdfquery" uses pdfminer.six as a dep: https://github.com/jcushman/pdfquery/blob/master/requirement...
- Heatmap of age group wise daily deaths in Chennai [OC]
  Source: Scraped death certificates from the GCC website. Scraped the PDFs using the pdfquery library for Python (shout out to techies Madhan and Atom for helping me get started and fixing code whenever I got stuck).
What are some alternatives?
PyMuPDF - PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
sunfish - Sunfish: a Python Chess Engine in 111 lines of code
WeasyPrint - The awesome document factory
djot - A light markup language
pdf-issues - Industry-based resolutions for issues and errata reported against any PDF-related specification
mupdf - mirrored from git://git.ghostscript.com/mupdf.git
i7j-rups - RUPS is an acronym for Reading and Updating PDF Syntax. RUPS is a tool built on top of iText® that allows you to look inside a PDF document and browse the different PDF objects and content streams.
kaitai_struct_formats - Kaitai Struct: library of binary file formats (.ksy)