Top 13 Python pdf-converter Projects
-
MinerU
A high-quality, one-stop, open-source data extraction tool for converting PDF to Markdown and JSON.
The systems they tested are mostly used as part of a larger system. A fairer comparison would be to use something like MinerU [1] and a proper benchmark like OHR Bench and Reducto's table bench. This paper is really bad...
[1]: https://github.com/opendatalab/MinerU
-
Project mention: Mutool – all purpose tool for dealing with PDF files | news.ycombinator.com | 2025-02-02
While surfing around their org I found this which has an impressive looking sample: https://github.com/ArtifexSoftware/pdf2docx#sample
Also, strictly speaking, their GitHub is labeled as a mirror. They didn't say "mirror of what", but I believe it's this: https://cgit.ghostscript.com/cgi-bin/cgit.cgi/mupdf.git/
-
remarks
Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, SVG
-
stapler
A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk
-
pdf2epub
Convert PDF files to nicely structured Markdown and EPUB format with intelligent layout detection using AI. (by overcuriousity)
Project mention: Convert PDF files to nicely structured Markdown and ePub with layout detection | news.ycombinator.com | 2025-02-18
-
Project mention: Show HN: Pdf2csv – Convert PDF Tables to CSV with CLI and Python API | news.ycombinator.com | 2025-01-06
-
comicreader
Comicreader with Comicvine API integration, WEBP-converter, CBZ/CBR and PDF support (by billylarsson)
-
Python pdf-converter discussion
Python pdf-converter related posts
-
Mutool – all purpose tool for dealing with PDF files
-
How is the PDF reading experience after 3.4 update?
-
What is your workflow for annotating PDFs on your Remarkable?
-
Caffè Italia * 30/04/23
-
Saving highlights from ebooks
-
Borb: the open source PDF engine
-
Tensorflow PDF Extraction
-
Index
What are some of the best open-source pdf-converter projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | MinerU | 31,413 |
2 | borb | 3,468 |
3 | pdf2docx | 2,885 |
4 | xhtml2pdf | 2,299 |
5 | spacy-layout | 554 |
6 | pdfCropMargins | 381 |
7 | remarks | 370 |
8 | stapler | 288 |
9 | pdf2epub | 35 |
10 | pdf2csv | 26 |
11 | PDFtoTXT | 6 |
12 | comicreader | 5 |
13 | FastPDF Service API (Python) | 0 |