Help Extracting Data from PDF Files

This page summarizes the projects mentioned and recommended in the original post on /r/techsupport.

  • Apache PDFBox

    Mirror of Apache PDFBox

    If you can program in Java, Apache PDFBox is an excellent, high-quality library for reading (and writing) PDFs.

  • mupdf

    mirrored from git://git.ghostscript.com/mupdf.git (by ccxvii)

    Another possibility is MuPDF. There are bindings for many platforms and languages.

