-
pystitcher
pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative markdown file as input
-
entr
A utility for running arbitrary commands when files change. Uses kqueue(2) or inotify(7) to avoid polling. entr responds to file system events by executing command line arguments or by writing to a FIFO. entr was written to make rapid feedback and automated testing natural and completely ordinary. (by clibs)
I've tested it against an 800-page compilation and didn't face any issues: https://github.com/captn3m0/pdp-book
It's obviously not fast enough for live recompilation of very large projects, but for smaller projects I've run it under entr[0], and it was pretty good.
[0]: https://github.com/clibs/entr
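For anyone wanting to try the entr setup described above, here is a minimal sketch in Python. It assumes both `entr` and `pystitcher` are on the PATH and that pystitcher is invoked as `pystitcher SPEC.md OUTPUT.pdf`; the helper names `entr_command` and `watch` are made up for illustration and are not part of either tool.

```python
import subprocess
from pathlib import Path


def entr_command(spec, output):
    """Build the command line: entr reruns everything after its own name."""
    # Assumed pystitcher CLI form: pystitcher SPEC.md OUTPUT.pdf
    return ["entr", "pystitcher", str(spec), str(output)]


def watch(spec, output, watch_dir="."):
    """Rebuild `output` whenever the spec or an input PDF changes."""
    out_name = Path(output).name
    # entr reads the list of files to watch from stdin, one path per line.
    # The output PDF is left out so a rebuild does not retrigger itself.
    files = sorted(
        str(p)
        for p in Path(watch_dir).rglob("*")
        if p.suffix in {".md", ".pdf"} and p.name != out_name
    )
    # Blocks until entr is interrupted (for example with Ctrl-C).
    subprocess.run(entr_command(spec, output), input="\n".join(files), text=True)
```

Calling `watch("book.md", "book.pdf")` (hypothetical file names) from the project directory should then rebuild the PDF on every save. Rebuild time still depends on the size of the inputs, which is why this suits smaller projects.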
i recently transitioned a large PDF processing pipeline[1] away from PyPDF3 to use the pikepdf[2] library instead. pikepdf is based on the C++ `qpdf` library, and this switch has cut the necessary special-case manual checking + error handling in half compared to pypdf for these sorts of tasks.
[1]: i'm helping out with the CVF Open Access archive, https://openaccess.thecvf.com/menu. broadly, our pipeline needs to ingest tons of author-provided PDFs of varying quality and output a canonical PDF with corrected page numbers, proper metadata, and a banner stamped on the first page. it's a lot of work, and it's not uncommon for pypdf to simply fail for slightly invalid input or give garbled results. this isn't optimal since we can't review 2,000 papers per conference release lol. pypdf has been nice, but pikepdf has handled everything i've thrown at it.
[2]: https://pikepdf.readthedocs.io
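To make the metadata step of such a pipeline concrete, here is a rough sketch using pikepdf. It is not the archive's actual code: the function names, the choice of XMP fields, and the "return None for manual review" policy are all assumptions for illustration, and page renumbering and banner stamping are left out.

```python
def xmp_fields(title, authors):
    """Map paper details to Dublin Core XMP keys (hypothetical helper)."""
    # pikepdf expects dc:creator to be a list of strings.
    return {"dc:title": title, "dc:creator": list(authors)}


def canonicalize(src, dst, title, authors):
    """Write a copy of `src` with fresh metadata; return its page count.

    Returns None when qpdf cannot read the file at all, so the caller
    can queue that paper for manual review (an assumed policy).
    """
    # Third-party package (pip install pikepdf), imported here so the
    # helper above stays usable without it.
    import pikepdf

    try:
        with pikepdf.open(src) as pdf:
            with pdf.open_metadata() as meta:
                for key, value in xmp_fields(title, authors).items():
                    meta[key] = value
            pdf.save(dst)
            return len(pdf.pages)
    except pikepdf.PdfError:
        return None
```

A call might look like `canonicalize("submitted.pdf", "canonical.pdf", "Some Title", ["A. Author"])`, with all file names and values hypothetical.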