pdf2htmlEX Alternatives
Similar projects and alternatives to pdf2htmlEX
-
koreader
An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices
-
PaddleOCR
Multilingual OCR toolkit based on PaddlePaddle (a practical, ultra-lightweight OCR system that recognizes 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded and IoT devices)
-
wdoc
Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, etc
-
PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
-
document-ai-samples
Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud
-
markdown-themeable-pdf
Discontinued (archived, not maintained). Themeable Markdown converter (print to PDF, HTML, JPEG or PNG)
-
url-to-pdf-api
Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.
-
AI-in-a-Box
AI-in-a-Box leverages the expertise of Microsoft across the globe to develop and provide AI and ML solutions to the technical community. Our intent is to present a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction.
pdf2htmlEX discussion
pdf2htmlEX reviews and mentions
-
Ask HN: What are you using to parse PDFs for RAG?
I have previously used https://github.com/pdf2htmlEX/pdf2htmlEX to convert PDF to HTML at scale; a second stage could then parse the output HTML into Markdown.
- Suggestion for analyzing the Blackvault files with ChatGPT
- Show HN: Paper to HTML Converter
-
Changing PDF to word with the same layout
Check out pdf2htmlEX
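The two-stage idea from the Ask HN mention above (PDF to HTML first, then HTML to Markdown) could look roughly like the sketch below for the second stage. It uses only the Python standard library. It assumes that each line of text in the converted HTML sits in a `<div>` whose class list contains `t`; that class name, along with `TextLineExtractor` and `html_to_markdown_paragraphs`, is an illustrative assumption and should be checked against real converter output.

```python
from html.parser import HTMLParser


class TextLineExtractor(HTMLParser):
    """Collect the text of each <div> whose class list contains `line_class`."""

    def __init__(self, line_class="t"):
        super().__init__()
        self.line_class = line_class  # assumed class name for text lines
        self.lines = []
        self._depth = 0  # div nesting depth inside the current text line (0 = outside)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # A nested div inside a text line: track it so we close correctly.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.line_class in classes:
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces and collapse runs of whitespace.
                text = " ".join("".join(self._buf).split())
                if text:
                    self.lines.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def html_to_markdown_paragraphs(html):
    """Return the extracted text lines as blank-line-separated Markdown paragraphs."""
    parser = TextLineExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)
```

This only recovers plain paragraphs. Headings, tables and reading order would need extra heuristics based on font size and position, which the sketch does not attempt.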
Stats
pdf2htmlEX/pdf2htmlEX is an open source project licensed under the GNU General Public License v3.0 or later, which is an OSI-approved license.
The primary programming language of the pdf2htmlEX repository is listed as HTML, although the converter itself is written in C++.