Ask HN: Why is the PDF format so inaccessible?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • PDFKit

    A JavaScript PDF generation library for Node and the browser

  • As for libraries, it seems PDFKit is the dominant one.

    https://github.com/foliojs/pdfkit

    As to why it’s so inaccessible: Adobe created this monstrosity to do just about everything. Text, fonts, vector graphics, raster graphics, forms, color spaces, JavaScript, encryption, signatures, 3D artwork, video, audio, Flash, and probably more. It’s bonkers what it can include, and it was developed in a very different era.

  • OpenPDF

    OpenPDF is a free Java library for creating and editing PDF files, with an LGPL and MPL open source license. OpenPDF is based on a fork of iText. We welcome contributions from other developers. Please feel free to submit pull requests and bug reports to this GitHub repository.

  • pandoc

    Universal markup converter

  • resume

    Program that generates my résumé from scratch, in PDF format. (by jchv)

  • I did this once. Maybe my small journal will be useful.

    https://github.com/jchv/resume/blob/master/journal.md
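
    For a sense of what generating a PDF "from scratch" involves, here is a minimal sketch that is not taken from the linked journal. It assumes ASCII text, a single US Letter page, and the built-in Helvetica font; the function name `build_minimal_pdf` is hypothetical. Even this tiny file needs a cross-reference table of exact byte offsets, which hints at why hand-written PDF output is fiddly.

```python
from typing import List


def build_minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF that shows `text` in 24 pt Helvetica (ASCII input assumed)."""
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    # Object bodies, numbered 1..5 in this order.
    objects: List[bytes] = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets: List[int] = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes long.
    xref_offset = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()

    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_offset}\n%%EOF\n"
    ).encode()
    return bytes(out)
```

    Writing the result with `open("hello.pdf", "wb").write(build_minimal_pdf("Hello"))` should give a file that common viewers can open, though this sketch covers none of the tagging, font embedding, or metadata that real documents need.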

  • TCPDF

    Official clone of PHP library to generate PDF documents and barcodes

    I had to recreate some PDFs at work that were created by "iText by Lowagie", which must have been a Java library at the time.

    I redid it with the FPDF library for PHP, and it worked out fine. I tried some new features of TCPDF, and it wasn't much work to convert.

    Using Inkscape to make an EPS out of an SVG was also challenging.

    I know that PostScript and PDF are based on a Forth-like stack machine, if I really had to get that low.

    http://fpdf.org/

    https://tcpdf.org/

    https://wiki.c2.com/?ForthPostscriptRelationship
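
    The stack-machine idea mentioned above can be illustrated with a toy postfix evaluator. This is only a sketch: it borrows a handful of PostScript operator names (`add`, `sub`, `mul`, `div`, `dup`, `exch`) and does not model the real PostScript interpreter or PDF content streams. The function name `eval_postfix` is hypothetical.

```python
from typing import Callable, Dict, List

# Tiny operator table for illustration; real PostScript defines far more operators.
BINARY_OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}


def eval_postfix(program: str) -> List[float]:
    """Evaluate whitespace-separated postfix tokens and return the final stack."""
    stack: List[float] = []
    for token in program.split():
        if token in BINARY_OPS:
            # Operands are popped in reverse order: the top of the stack is the right operand.
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY_OPS[token](a, b))
        elif token == "dup":
            stack.append(stack[-1])  # duplicate the top element
        elif token == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]  # swap the top two elements
        else:
            stack.append(float(token))  # anything else is treated as a number
    return stack
```

    For example, `eval_postfix("3 4 add 2 mul")` pushes 3 and 4, replaces them with their sum, then multiplies by 2: operands always come first and operators consume them from the stack.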

  • sparclur

    PDF Analyzer and Render Comparer

  • It's certainly true that there are a lot of PDF renderers out there with subtle incompatibilities, and a lot of PDF files with subtle nonconformances. However, that doesn't seem like a good reason to not write a new PDF generator!

    Instead, write a conformant one. Better, write one that not only conforms, but also isn't affected by any of the bugs in popular PDF renderers, by testing against all of them. Shawn Davis at LevelUp Research, working on the same DARPA project I'm currently on, has written SPARCLUR https://youtu.be/6I6E1N3CJzQ https://github.com/levelupresearch/sparclur https://pypi.org/project/sparclur/ which will feed your test PDF to Ghostscript, MuPDF, PDFium, PDFMiner, Poppler, QPDF, Xpdf, and some other PDF engines, and compare the results. That way you can see not only whether any of them produce errors and warnings, but also whether they render it differently or extract different text from it. SPARCLUR is Apache-licensed, written in Python, and very well integrated with Jupyter.

    (We've developed some other tools for this as well, but they're not as accessible.)
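
    The general idea of comparing what several engines extract from the same file can be sketched as follows. This is not SPARCLUR's API: the helper names `normalize` and `find_outliers` are hypothetical, and the sketch assumes you have already collected one extracted-text string per engine by whatever means.

```python
from collections import Counter
from typing import Dict, List


def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivial layout differences are ignored."""
    return " ".join(text.split())


def find_outliers(extracted: Dict[str, str]) -> List[str]:
    """Return the engines whose normalized text differs from the most common result.

    On a tie, the result that was seen first counts as the reference.
    """
    if not extracted:
        return []
    normalized = {name: normalize(text) for name, text in extracted.items()}
    reference, _count = Counter(normalized.values()).most_common(1)[0]
    return sorted(name for name, text in normalized.items() if text != reference)
```

    An empty result means every engine agreed after whitespace normalization; any names returned are candidates for a closer look at either the file or that engine.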

  • qpdf

    QPDF: A content-preserving PDF document transformer

  • If you're comfortable handling the (typo)graphical aspects of the PDF yourself and have the ability to consume a C++ library, I've had good experiences using the Apache-licensed qpdf[1] library to handle the low-level structural aspects of the PDF standard. It's particularly convenient when your application requires structure-preserving integration of existing PDF content.

    Two simple example applications, each completed in 2–3 days, both in C# using C++/CLI to integrate libqpdf:

    1. Overlaying fixed-format text on pre-existing blank PDF form pages, ensuring the content of each distinct form page is embedded exactly once, and that all necessary assets (fonts, images, etc.) from the blank form PDF pages are included in the output PDF.

    2. Losslessly combining a sequence of PDF, TIFF, and JPEG images into a single PDF with bookmarks pointing to the first page of each source file and existing image compression maintained where possible. In this application, only the source TIFFs were anything other than arbitrary (i.e., the TIFFs were more-or-less baseline images coming from a small number of scanning systems, but the JPEGs and PDFs came from all sorts of different applications).

    [1] https://github.com/qpdf/qpdf
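
    The applications above used the libqpdf C++ API. As a swapped-in, simpler alternative, the sketch below drives the `qpdf` command-line tool instead, assuming it is installed and on the PATH. It covers only PDF inputs: it does not convert TIFF or JPEG files and does not create bookmarks. The helper names are hypothetical; the `--empty`, `--pages`, and `--overlay` options are part of the qpdf CLI.

```python
import subprocess
from typing import List, Sequence


def merge_command(inputs: Sequence[str], output: str) -> List[str]:
    """Build a qpdf command that concatenates all pages of `inputs` into `output`."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    # --empty starts from a blank document; --pages ... -- lists the source files.
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


def overlay_command(base: str, overlay: str, output: str) -> List[str]:
    """Build a qpdf command that draws pages of `overlay` on top of pages of `base`."""
    return ["qpdf", base, "--overlay", overlay, "--", output]


def run(command: List[str]) -> None:
    """Run a prepared command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(command, check=True)
```

    Building the argument list separately from running it keeps the commands easy to inspect and log, for example `run(merge_command(["a.pdf", "b.pdf"], "out.pdf"))`.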
