Pdfgrep – a command-line utility to search text in PDF files

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ripgrep-all

    rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

  • pdf-keywords-extractor

  • Tangential:

    Some time ago I built a tool [1] that checks whether the given PDFs contain the specified keywords and outputs the result as a CSV file.

    It is similar to PDFGrep and probably much slower, but potentially more convenient for people who prefer GUIs.

    [1] https://github.com/bendersej/pdf-keywords-extractor
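
The keyword-to-CSV idea described above can be sketched in a few lines. This is only an illustration, not the linked project's code: it assumes the text of each PDF has already been extracted by some other tool, and the function name `keyword_report` and the sample data are hypothetical.

```python
import csv
import io


def keyword_report(texts, keywords):
    """Return CSV text with one row per document and one column per keyword."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", *keywords])
    for name, text in texts.items():
        lowered = text.lower()
        # Case-insensitive substring match; "yes"/"no" marks presence.
        marks = ["yes" if k.lower() in lowered else "no" for k in keywords]
        writer.writerow([name, *marks])
    return buffer.getvalue()


# Hypothetical input: text already extracted from each PDF by another tool.
sample = {
    "invoice.pdf": "Invoice total due on receipt",
    "memo.pdf": "Meeting notes",
}
print(keyword_report(sample, ["invoice", "notes"]))
```

Substring matching keeps the sketch short; a real tool would likely want word-boundary or regex matching to avoid partial-word hits.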

  • pdfgrep

  • Looking at the list of dependencies, it seems like they use poppler-cpp to render the PDFs.

    https://gitlab.com/pdfgrep/pdfgrep#dependencies
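
Once a PDF library such as poppler has produced the text of each page, searching reduces to running a regular expression over those page strings. The sketch below illustrates that step only; it is not pdfgrep's implementation, and the function name `grep_pages` and the sample page texts are hypothetical.

```python
import re


def grep_pages(pages, pattern, ignore_case=False):
    """Return (page_number, line) pairs for every line matching the regex."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    # Pages are numbered from 1, as a reader would expect.
    for page_number, text in enumerate(pages, start=1):
        for line in text.splitlines():
            if regex.search(line):
                hits.append((page_number, line.strip()))
    return hits


# Hypothetical page texts standing in for what a PDF library would return.
pages = [
    "Introduction\nThis report covers Q1.",
    "Results\nRevenue grew in Q1 and Q2.",
]
for page_number, line in grep_pages(pages, r"Q\d"):
    print(f"{page_number}: {line}")
```

Matching line by line mirrors how grep reports results; text that wraps across lines in the PDF would need extra handling.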

  • docquery

    An easy way to extract information from documents

  • DocQuery (https://github.com/impira/docquery), a project I work on, allows you to do something similar, but search over semantic information in the PDF files (using a large language model that is pre-trained to query business documents).

    For example:

      $ docquery scan "What is the due date?" /my/invoices/

  • pdfgrep

    PDFGrep is a GNU Emacs module providing grep-like facilities for PDF files

  • For Emacs users there is also https://github.com/jeremy-compostella/pdfgrep which lets you browse the results and open the original docs highlighting the selected match.

  • looqs

    FTS desktop file search with previews

  • I am working on looqs, which can do that (and will also render the page immediately): https://github.com/quitesimpleorg/looqs

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number indicates a more popular project.
