Apache PDFBox
pandoc
Metric | Apache PDFBox | pandoc
---|---|---
Mentions | 26 | 417
Stars | 2,357 | 32,051
Growth | 2.4% | -
Activity | 9.7 | 9.8
Last commit | 4 days ago | 2 days ago
Language | Java | Haskell
License | Apache License 2.0 | GNU General Public License v2.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Apache PDFBox
-
PDF rendering server-side using HTML 5 + CSS 3
Are you looking for a way to render PDFs or produce them? If you want to produce PDFs, I've used https://pdfbox.apache.org/ successfully as well as https://itextpdf.com/ (potentially costs money).
-
So you want to modify the text of a PDF by hand
If you don't mind using Java, you can use the open-source Apache PDFBox library.
It's relatively performant, and it's a mature, supported codebase that can accomplish most PDF tasks.
- best pdf library to use in 2023?
-
How to crop, split, remove pages from PDFs with Java and PDFBox
Then, open the pdf_utils/pom.xml file and add a dependency to PDFBox, in the dependencies section:
- Does no one use PDF files anymore?? In need of a PDF generator package...
- Thoughts on Birt Report for pdf reports
-
How I archived 100 million PDF documents... (Part 1)
So, when I started to view the documents, a lot of them simply failed to open. I had to look around for a library that could verify PDF documents. I had some experience with PDFBox in the past, so it seemed to be a good go-to solution. It had no way to verify documents by default, but it could open and parse them, and that was enough to filter out the incorrect ones. It felt a little bit strange to read the whole PDF into memory just to verify whether it is correct, but hey, I needed a simple fix for now and it worked really well.
- Best FOSS (ideally Docker) that can split PDF files ?
-
PDF processing and analysis with open-source tools
PDFBox can do this. It’s not part of the CLI but it wouldn’t be too hard to add:
https://github.com/apache/pdfbox/blob/5b00807463279f1002e245...
-
I am looking to automate a process at work...
You'll find libraries in most languages for parsing content out of PDF files; I did this most recently at work in Java using PDFBox.
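The archiving post quoted above filters out broken PDFs by fully opening and parsing each one with PDFBox. As a rough illustration of a cheaper first pass (a swapped-in technique, not what that author did, and not a PDFBox API), the sketch below only looks for the `%PDF-` header near the start of the file and the `%%EOF` marker near the end. The function name `looks_like_pdf` and the 1024-byte windows are assumptions made for this example. A file that passes can still be corrupt, so a real parser remains the authoritative check.

```python
from pathlib import Path


def looks_like_pdf(path, window=1024):
    """Cheap structural pre-check, not a full validation.

    Returns True when the first `window` bytes contain the "%PDF-" header
    and the last `window` bytes contain the "%%EOF" marker.
    The window size is an assumption chosen for this sketch.
    """
    p = Path(path)
    size = p.stat().st_size
    with p.open("rb") as f:
        head = f.read(window)            # start of the file
        f.seek(max(0, size - window))    # jump close to the end
        tail = f.read()                  # remaining bytes up to EOF
    return b"%PDF-" in head and b"%%EOF" in tail
```

Because only two small slices are read, the check never loads the whole document into memory; files that fail it can be skipped before handing the rest to a full parser.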
pandoc
-
📓 Versioning and building the eBook of your annual performance review on Git(Hub)
pandoc toolchain to build a comfortable/printable version during the working phase (ePub, pdf, docx, html)
-
Launch HN: Onedoc (YC W24) – A better way to create PDFs
Congrats on the launch, I guess, but there are so many free options that I can't think of a situation where paying $0.25 per document would be justified...? Just to name a few:
Back in the day, I used to use XSL-FO [0] and it was okay. It was not very precise but it rarely if ever broke, and was perfectly integrated with an XML/XSLT solution. Yeah, this was a long time ago.
Last month I used html-to-pdfmake [1] and it's also not very precise and more fragile, but very efficient and fast.
Yet another approach would be to programmatically generate .rtf files (for example) and use Pandoc [2] to produce PDFs (I have not tried this in production but don't see why it wouldn't work).
[0] https://en.wikipedia.org/wiki/XSL_Formatting_Objects
-
Ask HN: Looking for lightweight personal blogging platform
Others have mentioned static site generators. I like Hakyll [1] because it can tightly integrate with Pandoc [2] and allows you to develop custom solutions if your needs ever grow.
[1]: https://jaspervdj.be/hakyll/
[2]: https://pandoc.org/
-
Show HN: CLI for generating beautiful PDF for offline reading
Have you compared it with a conversion by pandoc (https://pandoc.org/)?
-
Pandoc
I have used it to kickstart a blogging project that I wish to come back to soon. The Lua inter-op for custom readers, writers and filters is great but I wish there was more editor integration and even perhaps an official IDE/editor with built-in debugging features (probably something already do-able with Emacs but I haven't checked). The only blocker for my project is no support for "ChunkedDoc" for Lua filters [1] which forces me to write more code and a complicated Makefile.
- I don't always use LaTeX, but when I do, I compile to HTML (2013)
-
Running Quarto Markdown in Docker
Until recently, I'd been using pandoc but, having taken the time to look around Quarto, it's a hell of a lot more powerful.
- ArXiv now offers papers in HTML format
-
A doctoral dissertation build system
On the technically advanced end of the spectrum you'll find John MacFarlane [1], professor of philosophy at Berkeley and creator of pandoc [2]. Some people are just amazing.
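One of the comments quoted above suggests generating .rtf files programmatically and handing them to Pandoc to produce PDFs. A minimal sketch of that idea follows, under a few assumptions: the helper names (`rtf_escape`, `build_rtf`, `pandoc_command`, `render_pdf`) are hypothetical, the RTF produced is deliberately bare (one font, plain paragraphs), the installed pandoc is recent enough to include the RTF reader, and a PDF engine such as a LaTeX distribution is available for the final step.

```python
import subprocess
from pathlib import Path


def rtf_escape(text):
    """Escape text for RTF: backslash and braces are escaped,
    newlines become \\par, non-ASCII characters become \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # RTF \uN expects signed 16-bit UTF-16 code units.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                signed = unit - 65536 if unit > 32767 else unit
                out.append(f"\\u{signed}?")
    return "".join(out)


def build_rtf(paragraphs):
    """Wrap a list of plain-text paragraphs in a minimal RTF document."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Helvetica;}}\\f0\\fs24 "
    body = "".join(rtf_escape(p) + "\\par\n" for p in paragraphs)
    return header + body + "}"


def pandoc_command(src, dst):
    """Build the pandoc argument list for an RTF-to-PDF conversion."""
    return ["pandoc", "--from", "rtf", str(src), "--output", str(dst)]


def render_pdf(paragraphs, rtf_path, pdf_path):
    """Write the RTF file, then invoke pandoc (must be installed) on it."""
    Path(rtf_path).write_text(build_rtf(paragraphs), encoding="ascii")
    subprocess.run(pandoc_command(rtf_path, pdf_path), check=True)
```

Calling `render_pdf(["First paragraph", "Second paragraph"], "report.rtf", "report.pdf")` would write the intermediate file and shell out to pandoc. Building the argument list in its own function keeps the conversion step easy to inspect or swap for another output format.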
What are some alternatives?
iText - [DEPRECATED] Core Java Library + PDF/A, xtra and XML Worker. Only security fixes will be added — please use iText 7
OpenPDF - OpenPDF is a free Java library for creating and editing PDF files, with a LGPL and MPL open source license. OpenPDF is based on a fork of iText. We welcome contributions from other developers. Please feel free to submit pull-requests and bugreports to this GitHub repository.
Apache FOP - Apache XML Graphics FOP
flyingsaucer - XML/XHTML and CSS 2.1 renderer in pure Java
Apache POI - Mirror of Apache POI
pandoc-highlighting-extensions - Extensions to Pandoc syntax highlighting
Dynamic Jasper - Dynamic Reports using Jasper Reports
obsidian-html - A simple tool to convert an Obsidian vault into a static directory of HTML files.
Open HTML to PDF - An HTML to PDF library for the JVM. Based on Flying Saucer and Apache PDF-BOX 2. With SVG image support. Now also with accessible PDF support (WCAG, Section 508, PDF/UA)!
obsidian-export - Rust library and CLI to export an Obsidian vault to regular Markdown
boxable - Boxable is a library that can be used to easily create tables in pdf documents.
Obsidian-MD-To-PDF - A command line python script to convert Obsidian md files to a pdf