pdfquery
kaitai_struct_formats
| | pdfquery | kaitai_struct_formats |
|---|---|---|
| Mentions | 3 | 3 |
| Stars | 753 | 682 |
| Growth | - | 0.1% |
| Activity | 0.0 | 6.3 |
| Last commit | 7 months ago | 17 days ago |
| Language | Python | Kaitai Struct |
| License | MIT License | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pdfquery
- Show HN: I am building a new Python library to read/write PDF files
That makes sense, as "pdfquery" uses pdfminer.six as a dep: https://github.com/jcushman/pdfquery/blob/master/requirement...
- Heatmap of age group wise daily deaths in Chennai [OC]
Source: Scraped death certificates from the GCC website. Scraped the PDFs using the pdfquery library for Python (shout out to techies Madhan and Atom for helping me get started and fixing code whenever I got stuck).
kaitai_struct_formats
- Magika: AI powered fast and efficient file type identification
- Fq: Jq for Binary Formats
Kaitai has a repository of binary formats[1] that can be used in visualizers or to auto-generate parsers.
[1] https://formats.kaitai.io/
- Show HN: I am building a new Python library to read/write PDF files
This is tangential to your submission, but PDF is the file format I use for exercising any library that claims to be a declarative file format (à la https://github.com/kaitai-io/kaitai_struct_formats#readme )
What are some alternatives?
PyMuPDF - PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
cutter - Free and Open Source Reverse Engineering Platform powered by rizin
WeasyPrint - The awesome document factory
jqjq - jq implementation of jq
pdf-issues - Industry-based resolutions for issues and errata reported against any PDF-related specification
i7j-rups - RUPS is an acronym for Reading and Updating PDF Syntax. RUPS is a tool built on top of iText® that allows you to look inside a PDF document and browse the different PDF objects and content streams.
djot - A light markup language
bericht - Incremental HTML to PDF converter.
pdfsyntax - A Python library to inspect and modify the internal structure of a PDF file