How do I automate extracting data from two PDFs and entering it into an Excel sheet according to an order number?

This page summarizes the projects mentioned and recommended in the original post on /r/learnpython

  • pdfminer.six

    Community maintained fork of pdfminer - we fathom PDF

  • Entering things in Excel is very easy. Extracting things from a PDF is a pain. pdfminer.six (https://github.com/pdfminer/pdfminer.six) gets pretty close to what you need, but it may be easier to use it just to convert the entire PDF to text and then parse that text to extract the info you need.

  • pdfplumber

    Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

  • pdfplumber is also pretty good. It can help segment text a bit better than pdfminer can alone.
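
The suggestions above (convert each PDF to text, parse the text, then write rows to Excel) could be sketched roughly as follows. This is a minimal sketch, not the original poster's solution. It assumes each PDF describes one order as "Label: value" lines and that both PDFs carry a line such as "Order Number: A1001"; that layout, the function names, and the file names are hypothetical. openpyxl is also an assumption, since the post does not name an Excel library.

```python
def pdf_to_text(path):
    """Convert a whole PDF to plain text with pdfminer.six."""
    from pdfminer.high_level import extract_text  # pip install pdfminer.six
    return extract_text(path)


def pdf_to_text_plumber(path):
    """Alternative: extract text page by page with pdfplumber."""
    import pdfplumber  # pip install pdfplumber
    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without text
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def parse_fields(text):
    """Collect 'Label: value' pairs (an assumed layout) into a dict."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")  # split at the first colon only
        if sep and label.strip() and value.strip():
            fields[label.strip()] = value.strip()
    return fields


def merge_by_order(records_a, records_b, key="Order Number"):
    """Merge records from both PDFs into one dict per order number."""
    merged = {}
    for record in list(records_a) + list(records_b):
        order = record.get(key)
        if order is None:
            continue  # skip records with no order number
        merged.setdefault(order, {}).update(record)
    return merged


def write_excel(merged, out_path, key="Order Number"):
    """Write one row per order; openpyxl is an assumed choice of library."""
    from openpyxl import Workbook  # pip install openpyxl
    other = sorted({c for row in merged.values() for c in row} - {key})
    columns = [key] + other
    wb = Workbook()
    ws = wb.active
    ws.append(columns)  # header row
    for order in sorted(merged):
        ws.append([merged[order].get(c, "") for c in columns])
    wb.save(out_path)


# Hypothetical usage (file names are placeholders):
# first = parse_fields(pdf_to_text("first.pdf"))
# second = parse_fields(pdf_to_text_plumber("second.pdf"))
# write_excel(merge_by_order([first], [second]), "orders.xlsx")
```

Real PDFs rarely come out as clean "Label: value" lines, so `parse_fields` is the part most likely to need replacing, for example with regular expressions tuned to the actual documents.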

