Is it possible (and how) to extract highlighted text from a PDF (via API)?

This page summarizes the projects mentioned and recommended in the original post on /r/CSEducation

  • pdftotext

    Simple PDF text extraction

  • I’ve used https://github.com/jalan/pdftotext for automated PDF text parsing before, but I wasn’t doing anything with highlighting, and I’m not sure what the formatting for that looks like, or whether it would be simple or even present in this library’s output. You could give it a test run on one of your PDFs and see, though. Best of luck; PDF text extraction is a nightmare, especially if the files don’t all come from the exact same source and generation system, since PDFs aren’t really text documents.
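The test run suggested above could look roughly like the following sketch. It assumes the pdftotext Python package (and the poppler libraries it depends on) is installed. The function names `extract_pages` and `pages_containing` are hypothetical helpers made up for this illustration, not part of the library. The sketch only checks whether a phrase you know is highlighted appears in the extracted text at all; it does not detect the highlighting itself, and whether the output carries any highlight information is exactly what the test run would tell you.

```python
from typing import List


def extract_pages(pdf_path: str) -> List[str]:
    """Return the extracted text of each page, one string per page."""
    # Imported lazily so the search helper below can be used (and tested)
    # even on a machine where pdftotext/poppler is not installed.
    import pdftotext

    with open(pdf_path, "rb") as f:
        pdf = pdftotext.PDF(f)  # iterating a PDF object yields page strings
        return list(pdf)


def pages_containing(pages: List[str], phrase: str) -> List[int]:
    """Return the 1-based numbers of pages whose text contains the phrase.

    Whitespace is collapsed and case is ignored, because extracted text
    often breaks lines in the middle of a sentence.
    """
    needle = " ".join(phrase.split()).lower()
    hits = []
    for number, text in enumerate(pages, start=1):
        haystack = " ".join(text.split()).lower()
        if needle in haystack:
            hits.append(number)
    return hits
```

To try it, call `extract_pages` on one of your own files and pass the result to `pages_containing` together with a phrase you highlighted. An empty result suggests the text did not survive extraction in a recognizable form for that document.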

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.
