Extract Data from PDF

This page summarizes the projects mentioned and recommended in the original post on /r/golang.

  • pdfcpu

    A PDF processor written in Go.

    Try https://github.com/pdfcpu/pdfcpu

  • grumpy

    Grumpy is a Python to Go source code transcompiler and runtime. (by grumpyhome)

    So if that tool can read it, why not use it for conversion (calling from Go if you prefer)? Or have a look at the source to determine what it does to make the text readable. See also https://github.com/grumpyhome/grumpy

  • qpdf

    QPDF: A content-preserving PDF document transformer

    UPDATE: We tried repairing the PDF in question and, lo and behold, we got a result. We used qpdf (https://github.com/qpdf/qpdf/releases) for the repair; after that, the ledongthuc/pdf library had no trouble reading the data.
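
The repair step from the update above can be called from Go by shelling out to the qpdf binary. What follows is a minimal sketch, not the poster's actual code: it assumes qpdf is installed and on PATH, and the helper names (repairArgs, qpdfSucceeded, repairPDF) are made up for illustration. Running qpdf with just an input and an output path rewrites the file, and qpdf exits with status 3 when it finishes with warnings, which is common for a file that needed recovery, so the sketch treats 3 as success as well as 0.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// repairArgs builds the qpdf argument list (hypothetical helper).
// "qpdf in.pdf out.pdf" rewrites the input into a fresh output file.
func repairArgs(in, out string) []string {
	return []string{in, out}
}

// qpdfSucceeded reports whether a qpdf exit code means an output file
// was produced: 0 is a clean run, 3 means it completed with warnings.
func qpdfSucceeded(code int) bool {
	return code == 0 || code == 3
}

// repairPDF runs qpdf on the given paths. It assumes the qpdf binary
// is available on PATH.
func repairPDF(in, out string) error {
	err := exec.Command("qpdf", repairArgs(in, out)...).Run()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) && qpdfSucceeded(exitErr.ExitCode()) {
		return nil // finished with warnings; the output was still written
	}
	return err
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: repair <broken.pdf> <repaired.pdf>")
		return
	}
	if err := repairPDF(os.Args[1], os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, "qpdf failed:", err)
		os.Exit(1)
	}
	fmt.Println("repaired copy written to", os.Args[2])
}
```

The repaired copy can then be handed to whichever Go PDF reader is in use; in the thread that was the ledongthuc/pdf library.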

