Checkbox Extraction from PDFs - A Tutorial

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  1. llmwhisperer-pdf-checkbox-processing

    Demonstration and companion repo that shows how to process PDF form elements such as checkboxes and radio buttons with LLMWhisperer

    The source code for the project is available on GitHub. To run the extraction script, you’ll need two API keys: one for LLMWhisperer and one for the OpenAI API. Read the GitHub project’s README for OS and other dependency requirements. You can sign up for LLMWhisperer, get your API key, and process up to 100 pages per day free of charge.

  2. langchain

    🦜🔗 Build context-aware reasoning applications

    The system that extracts raw text from the PDF needs to both detect form elements such as checkboxes and radio buttons and render them in a way that LLMs can understand. In this example, we’ll use LLMWhisperer to extract raw PDF text that represents checkboxes and radio buttons. LLMWhisperer is free for processing up to 100 pages per day. To structure the output from LLMWhisperer, we’ll use GPT-3.5 Turbo, with LangChain and Pydantic to simplify the job.

  3. pydantic

    Data validation using Python type hints

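To make "rendering checkboxes in a way that LLMs can understand" more concrete, here is a minimal sketch. It assumes the extracted text marks a ticked box as `[X]` and an empty one as `[ ]`. That marker format, the sample text, and the names `CheckboxOption` and `parse_checkboxes` are assumptions made up for illustration, not LLMWhisperer's documented output or API. It also swaps in a plain regex and a dataclass for the GPT-3.5 Turbo, LangChain, and Pydantic step described in the post, so it runs without API keys.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed marker format: "[X]" for a ticked box, "[ ]" for an empty one.
# The label is whatever follows, up to the next "[" or the end of the line.
CHECKBOX = re.compile(r"\[(?P<mark>[xX ])\]\s*(?P<label>[^\[\n]+)")


@dataclass
class CheckboxOption:
    label: str
    checked: bool


def parse_checkboxes(raw_text: str) -> List[CheckboxOption]:
    """Return every checkbox-style option found in the extracted text."""
    options = []
    for match in CHECKBOX.finditer(raw_text):
        label = match.group("label").strip()
        checked = match.group("mark").lower() == "x"
        options.append(CheckboxOption(label=label, checked=checked))
    return options


# Hypothetical extracted text, invented for this example.
SAMPLE = """Marital status: [X] Single  [ ] Married  [ ] Divorced
US citizen:     [ ] Yes     [X] No"""

if __name__ == "__main__":
    for option in parse_checkboxes(SAMPLE):
        state = "checked" if option.checked else "unchecked"
        print(f"{option.label}: {state}")
```

A regex like this only copes with very regular layouts. Real forms vary far more, which is presumably why the post hands the structuring step to an LLM with a Pydantic schema to validate the result.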



Related posts

  • Dict Unpacking in Python

    4 projects | news.ycombinator.com | 8 Jul 2025
  • A Practical Guide on Structuring LLM Outputs with Pydantic

    2 projects | dev.to | 12 Jun 2025
  • Advanced Pydantic: Generic Models, Custom Types, and Performance Tricks

    1 project | dev.to | 5 May 2025
  • Advanced RAG with guided generation

    2 projects | dev.to | 18 Apr 2024
  • Pydantic v2 ruined the elegance of Pydantic v1

    1 project | news.ycombinator.com | 28 Jan 2024
