kaitai_struct_formats
i7j-rups
Metric | kaitai_struct_formats | i7j-rups
---|---|---
Mentions | 3 | 3
Stars | 682 | 248
Growth | 0.7% | 1.6%
Activity | 6.3 | 5.3
Last commit | 13 days ago | 5 days ago
Language | Kaitai Struct | Java
License | - | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
kaitai_struct_formats
- Magika: AI powered fast and efficient file type identification
- Fq: Jq for Binary Formats
Kaitai has a repository of binary formats[1] that can be used in visualizers or to auto-generate parsers.
[1] https://formats.kaitai.io/
- Show HN: I am building a new Python library to read/write PDF files
This is tangential to your submission, but PDF is the file format I use for exercising any library that claims to be a declarative file format (à la https://github.com/kaitai-io/kaitai_struct_formats#readme)
i7j-rups
- So you want to modify the text of a PDF by hand
Great post. I've spent a lot of time reading through the PDF specification over the last ~5 years while building DocSpring [1], and I still feel like I've barely scratched the surface. qpdf is a great tool. One of my other favorites is RUPS [2], which really lets you dig into the structure of a PDF.
[1] https://docspring.com
[2] https://github.com/itext/i7j-rups
- Show HN: I am building a new Python library to read/write PDF files
> find a version of iText RUPS application from somewhere on the internet
You mean this, right? https://github.com/itext/i7j-rups#readme
- Any decent free online tool which can give me a breakdown of pdf contents including relative sizes of assets such as images, fonts, etc?
It's not an online tool, but it's free nonetheless: https://github.com/itext/i7j-rups
What are some alternatives?
PyMuPDF - PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
pdfquery - A fast and friendly PDF scraping library.
pdfsyntax - A Python library to inspect and modify the internal structure of a PDF file
cutter - Free and Open Source Reverse Engineering Platform powered by rizin
djot - A light markup language
jqjq - jq implementation of jq
annotated-pdf-spec - Collection of useful hints for implementing a PDF library
pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
bericht - Incremental HTML to PDF converter.
polyfile - A pure Python cleanroom implementation of libmagic, with instrumented parsing from Kaitai struct and an interactive hex viewer