Extract Data from PDF

This page summarizes the projects mentioned and recommended in the original post on /r/golang.

  • pdfcpu

    A PDF processor written in Go.

  • Try https://github.com/pdfcpu/pdfcpu

  • grumpy

    Grumpy is a Python to Go source code transcompiler and runtime. (by grumpyhome)

  • So if that tool can read it, why not use it for conversion (calling from Go if you prefer)? Or have a look at the source to determine what it does to make the text readable. See also https://github.com/grumpyhome/grumpy

  • qpdf

    QPDF: A content-preserving PDF document transformer

  • UPDATE: We tried repairing the PDF in question and, lo and behold, it worked. We used qpdf (https://github.com/qpdf/qpdf/releases) for the repair; after that, the ledongthuc/pdf library had no trouble reading the data.
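
The repair step from the update can be driven from Go by shelling out to qpdf. The sketch below is only an illustration, not what the original poster ran: the post does not say which qpdf options were used, so it assumes the plain `qpdf <input> <output>` form (which rewrites the file) and assumes the `qpdf` binary is on PATH. The helper names `qpdfRepairCmd` and `repairPDF` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// qpdfRepairCmd builds (but does not run) a qpdf invocation that rewrites
// the file at in to out. Hypothetical helper; assumes qpdf is on PATH.
func qpdfRepairCmd(in, out string) *exec.Cmd {
	return exec.Command("qpdf", in, out)
}

// repairPDF runs qpdf. qpdf exits with status 3 when it only emitted
// warnings (common for damaged files it managed to recover), so that
// status is treated as success here.
func repairPDF(in, out string) error {
	err := qpdfRepairCmd(in, out).Run()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) && exitErr.ExitCode() == 3 {
		return nil
	}
	return err
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: repair <damaged.pdf> <repaired.pdf>")
		return
	}
	if err := repairPDF(os.Args[1], os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, "qpdf failed:", err)
		os.Exit(1)
	}
}
```

The repaired output file is what would then be handed to a reader library such as ledongthuc/pdf. Treating exit status 3 as success is a design choice: a damaged file that qpdf recovers will often produce warnings, and failing on those would defeat the purpose of the repair pass.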

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

