Python pdf-converter

Open-source Python projects categorized as pdf-converter

Top 13 Python pdf-converter Projects

  1. MinerU

    A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.

    Project mention: Gemini beats everyone on new OCR benchmark | news.ycombinator.com | 2025-02-14

    The systems they tested are mostly used as part of a larger system. A fairer comparison would be to use something like MinerU [1] and a proper benchmark like OHR Bench and Reducto's table bench. This paper is really bad...

    [1]: https://github.com/opendatalab/MinerU

  2. borb

    borb is a library for reading, creating and manipulating PDF files in Python.

  3. pdf2docx

    Open source Python library for converting PDF to DOCX.

    Project mention: Mutool – all purpose tool for dealing with PDF files | news.ycombinator.com | 2025-02-02

    While surfing around their org I found this, which has an impressive-looking sample: https://github.com/ArtifexSoftware/pdf2docx#sample

    Also, strictly speaking, their GitHub is labeled as a mirror. They didn't say "mirror of what", but I believe it's this: https://cgit.ghostscript.com/cgi-bin/cgit.cgi/mupdf.git/

  4. xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

  5. spacy-layout

    📚 Process PDFs, Word documents and more with spaCy

    Project mention: AI and All Data Weekly for 09 Dec 2024 | dev.to | 2024-12-09

    ❄️ Apache Polaris + Iceberg Quickstart
    ⚡️ How to extract tables from pdfs
    🚀 Microsoft 1bit LLM BitNet
    🐿️ Verifying Kafka Transactions Entry 2
    🐿️ FLUSS: Streaming Storage
    🐿️ Fluss -> Flow for Flink Real Time Analytics
    🌐 TableFlow - iceberg / kafka
    ❄️ Snowflake Cortex AI + Slack
    🐿️❄️ Door dash flink, kafka, snowflake
    🧠 Prompt Stack -- all in one
    🔌 SpaCY Layout for PDF
    📱 Responsible AI Pathways
    📼 Megaparse documents python
    🔌 Time Series LLM
    ❄️ Generate Synthetic Data in Snowflake
    🐿️ LLMs and GenAI - When to use them
    🐿️ Flink Observability with Prometheus
    📡 New SQL GUI
    🍫 TDD for GenAI
    🕵️ 🎁 Open Source Agent Framework for Production
    💻 Cedit command line editor
    🏭 ServiceNow AgentLab
    🎤 Snowflake Lessons Learned in Replication
    🎄 Privastead
    🔌 Backup Icloud with nodejs on linux
    🔌 Backup Google with nodejs on linux
    🎄 HuggingFace macos chat source code
    🎁 Ollama working with structured output
    🎁 dspy ai how to
    🔌 Piazza updater
    🔌 Building a financial report with langgraph
    ColPali Notebook with QWEN 2 VL

  6. pdfCropMargins

    pdfCropMargins -- a program to crop the margins of PDF files

  7. remarks

    Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, SVG

  8. stapler

    A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk

  9. pdf2epub

    Convert PDF files to nicely structured Markdown and EPUB format with intelligent layout detection using AI. (by overcuriousity)

    Project mention: Convert PDF files to nicely structured Markdown and ePub with layout detection | news.ycombinator.com | 2025-02-18

  10. pdf2csv

    A Python library and CLI tool to convert PDF files to CSV files.

    Project mention: Show HN: Pdf2csv – Convert PDF Tables to CSV with CLI and Python API | news.ycombinator.com | 2025-01-06

  11. PDFtoTXT

    Converts any PDF file from one language into your language

  12. comicreader

    Comicreader with Comicvine API integration, WEBP-converter, CBZ/CBR and PDF support (by billylarsson)

  13. FastPDF Service API (Python)

    Python SDK for Fast PDF Service

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
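The pdfCropMargins entry describes cropping the margins of PDF pages. As a rough illustration of the arithmetic such a tool performs, the sketch below shrinks a page box toward the bounding box of its content while keeping a fixed percentage of each margin. The `Box` class, the `crop_box` function, the `percent_retain` parameter and its 10% default are hypothetical names and values chosen for this example; they are not pdfCropMargins' actual code or command-line interface. Finding the content bounding box (the hard part in practice) is assumed to have been done already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in PDF user-space units (points)."""
    left: float
    bottom: float
    right: float
    top: float


def crop_box(page: Box, content: Box, percent_retain: float = 10.0) -> Box:
    """Shrink `page` toward `content`, keeping `percent_retain`% of each margin.

    Hypothetical helper for illustration only; not pdfCropMargins' API.
    Assumes `content` lies inside `page`.
    """
    if not 0.0 <= percent_retain <= 100.0:
        raise ValueError("percent_retain must be between 0 and 100")
    keep = percent_retain / 100.0
    # Each margin is the gap between a page edge and the matching content edge;
    # the new edge is placed `keep` of that gap outside the content.
    return Box(
        left=content.left - keep * (content.left - page.left),
        bottom=content.bottom - keep * (content.bottom - page.bottom),
        right=content.right + keep * (page.right - content.right),
        top=content.top + keep * (page.top - content.top),
    )


if __name__ == "__main__":
    letter = Box(0, 0, 612, 792)       # US Letter page in points
    text_area = Box(72, 72, 540, 720)  # content with 1-inch margins all round
    print(crop_box(letter, text_area))
```

With `percent_retain=0` the result is exactly the content box, and with `100` it is the original page box, which keeps the parameter easy to reason about.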

Python pdf-converter related posts

  • Mutool – all purpose tool for dealing with PDF files

    2 projects | news.ycombinator.com | 2 Feb 2025
  • How is the PDF reading experience after 3.4 update?

    1 project | /r/RemarkableTablet | 8 Jun 2023
  • What is your workflow for annotating PDFs on your Remarkable?

    1 project | /r/RemarkableTablet | 18 May 2023
  • Caffè Italia * 30/04/23

    1 project | /r/italy | 30 Apr 2023
  • Saving highlights from ebooks

    1 project | /r/RemarkableTablet | 6 Apr 2023
  • Borb: the open source PDF engine

    1 project | /r/SideProject | 1 Apr 2023
  • Tensorflow PDF Extraction

    1 project | /r/tensorflow | 26 Feb 2023

Index

What are some of the best open-source pdf-converter projects in Python? This list will help you:

#   Project                        Stars
1   MinerU                        31,413
2   borb                           3,468
3   pdf2docx                       2,885
4   xhtml2pdf                      2,299
5   spacy-layout                     554
6   pdfCropMargins                   381
7   remarks                          370
8   stapler                          288
9   pdf2epub                          35
10  pdf2csv                           26
11  PDFtoTXT                           6
12  comicreader                        5
13  FastPDF Service API (Python)       0
