Show HN: Pystitcher – A Declarative Alternative to Pdftk

pystitcher

3 388 0.0 Python

pystitcher stitches your PDF files together and generates customizable bookmarks for you, using a declarative Markdown file as input.
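
To make the declarative idea concrete, here is a minimal sketch of how a Markdown spec could be turned into bookmark positions for a merged PDF. The list-of-links spec format, the names `parse_spec` and `bookmark_offsets`, the file names, and the page counts are all assumptions made up for illustration; this is not pystitcher's actual input format or parser.

```python
import re
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical spec format: one Markdown list item per input PDF,
# written as "- [Bookmark title](file.pdf)".
LINK = re.compile(r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<path>[^)]+\.pdf)\)\s*$")


class Entry(NamedTuple):
    title: str
    path: str


def parse_spec(markdown: str) -> List[Entry]:
    """Collect (title, path) pairs from the list items, in document order."""
    entries = []
    for line in markdown.splitlines():
        match = LINK.match(line)
        if match:
            entries.append(Entry(match["title"], match["path"]))
    return entries


def bookmark_offsets(
    entries: List[Entry], page_counts: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Return (title, zero-based first page) for each entry in the merged file."""
    offsets = []
    start = 0
    for entry in entries:
        offsets.append((entry.title, start))
        start += page_counts[entry.path]  # the next file starts after this one
    return offsets


# Made-up example spec; headings and other lines are ignored by the parser.
SPEC = """\
# My Book
- [Preface](preface.pdf)
- [Chapter 1](ch1.pdf)
- [Appendix](appendix.pdf)
"""

if __name__ == "__main__":
    counts = {"preface.pdf": 2, "ch1.pdf": 10, "appendix.pdf": 3}
    for title, page in bookmark_offsets(parse_spec(SPEC), counts):
        print(f"{title}: starts at page index {page}")
```

In a real tool the page counts would come from the PDFs themselves and the offsets would be handed to a PDF library to write the outline; the sketch only shows the bookkeeping that a declarative spec makes possible.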

pdp-book

1 2 0.0 HTML

I've tested it against an 800-page compilation and didn't face any issues: https://github.com/captn3m0/pdp-book
It's obviously not fast enough for live recompilation of very large projects, but for smaller projects I've run it under entr[0], and it was pretty good.
[0]: https://github.com/clibs/entr
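
The rebuild-on-change loop described in the comment can be approximated in plain Python. This is a rough sketch that polls modification times, which is a simpler technique than the kqueue/inotify events entr uses; the function names, the watched file names, and the placeholder command are assumptions for illustration only.

```python
import os
import subprocess
import time
from typing import Dict, Iterable, List


def snapshot(paths: Iterable[str]) -> Dict[str, float]:
    """Map each existing path to its last-modified time."""
    return {p: os.stat(p).st_mtime for p in paths if os.path.exists(p)}


def changed(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Paths that were added, removed, or modified between two snapshots."""
    return sorted(
        p for p in before.keys() | after.keys() if before.get(p) != after.get(p)
    )


def watch(paths: Iterable[str], command: List[str], interval: float = 1.0) -> None:
    """Re-run `command` whenever a watched file changes; loops until interrupted."""
    paths = list(paths)
    before = snapshot(paths)
    while True:
        time.sleep(interval)
        after = snapshot(paths)
        if changed(before, after):
            subprocess.run(command, check=False)  # keep watching even if the build fails
            before = after


if __name__ == "__main__":
    # Hypothetical inputs: replace the file list and the command with your own build step.
    watch(["book.md", "ch1.pdf"], ["echo", "rebuild"])
```

Polling keeps the sketch portable and dependency-free, at the cost of a delay of up to `interval` seconds and some wasted `stat` calls; an event-based tool avoids both.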

entr

5 591 5.6 C

A utility for running arbitrary commands when files change. Uses kqueue(2) or inotify(7) to avoid polling. entr responds to file system events by executing command line arguments or by writing to a FIFO. entr was written to make rapid feedback and automated testing natural and completely ordinary. (by clibs)

pikepdf

4 2,019 9.5 Python

A Python library for reading and writing PDF, powered by QPDF

i recently transitioned a large PDF processing pipeline[1] away from PyPDF3 to use the pikepdf[2] library instead. pikepdf is based on the C++ `qpdf` library, and this switch has cut the necessary special-case manual checking + error handling in half compared to pypdf for these sorts of tasks.
[1]: i'm helping out with the CV Open Access archive, https://openaccess.thecvf.com/menu. broadly, our pipeline needs to ingest tons of author-provided PDFs of varying quality and output a canonical PDF with corrected page numbers, proper metadata, and a banner stamped on the first page. it's a lot of work, and it's not uncommon for pypdf to simply fail for slightly invalid input or give garbled results. this isn't optimal since we can't review 2,000 papers per conference release lol. pypdf has been nice, but pikepdf has handled everything i've thrown at it.
[2]: https://pikepdf.readthedocs.io

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
