Show HN: Pystitcher – A Declarative Alternative to Pdftk

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • pystitcher

    pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative markdown file as input

  • pdp-book

  • I've tested it against an 800-page compilation and didn't face any issues: https://github.com/captn3m0/pdp-book

    It's obviously not fast enough for live-recompilations with very large projects, but for smaller projects I've run it against entr[0], and it was pretty good.

    [0]: https://github.com/clibs/entr

  • entr

    A utility for running arbitrary commands when files change. Uses kqueue(2) or inotify(7) to avoid polling. entr responds to file system events by executing command line arguments or by writing to a FIFO. entr was written to make rapid feedback and automated testing natural and completely ordinary. (by clibs)

  • pikepdf

    A Python library for reading and writing PDF, powered by QPDF

  • i recently transitioned a large PDF processing pipeline[1] away from PyPDF3 to use the pikepdf[2] library instead. pikepdf is based on the C++ `qpdf` library, and this switch has cut the necessary special-case manual checking + error handling in half compared to pypdf for these sorts of tasks.

    [1]: i'm helping out with the CV Open Access archive, https://openaccess.thecvf.com/menu. broadly, our pipeline needs to ingest tons of author-provided PDFs of varying quality and output a canonical PDF with corrected page numbers, proper metadata, and a banner stamped on the first page. it's a lot of work, and it's not uncommon for pypdf to simply fail for slightly invalid input or give garbled results. this isn't optimal since we can't review 2,000 papers per conference release lol. pypdf has been nice, but pikepdf has handled everything i've thrown at it.

    [2]: https://pikepdf.readthedocs.io
