Top 23 Python Specific Formats Processing Projects
-
PyPDF2
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
-
Project mention: Quarkdown: A modern Markdown-based typesetting system | news.ycombinator.com | 2025-06-03
-
PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
-
csvkit
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
Project mention: Sqawk: A fusion of SQL and Awk: Applying SQL to text-based data files | news.ycombinator.com | 2025-05-26
I wonder how this compares to csvkit [1].
[1]: https://csvkit.readthedocs.io/
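For readers unfamiliar with the idea behind that comparison, here is a minimal sketch of running SQL over CSV text using only the Python standard library. It is not how Sqawk or csvkit work internally (csvkit ships its own `csvsql` command for this purpose); the sample data, the `query_csv` helper, and the `data` table name are all assumptions made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; any CSV text with a header row would do.
SAMPLE_CSV = """name,stars
PyPDF2,9138
csvkit,6198
tablib,4700
"""


def query_csv(csv_text, sql, table="data"):
    """Load CSV text into an in-memory SQLite table and run one SQL query."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    try:
        # Columns are created without types, so every value is stored as text.
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', records)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Values are text, so cast before comparing numerically.
result = query_csv(
    SAMPLE_CSV,
    "SELECT name FROM data WHERE CAST(stars AS INTEGER) > 5000 ORDER BY name",
)
print(result)
```

Loading everything into an in-memory database keeps the sketch short, but it would not suit files larger than available memory.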
-
Project mention: Rust solves the problem of incomplete Kernel Linux API docs | news.ycombinator.com | 2024-08-31
-
Project mention: Show HN: (Python) Markdown Exec, execute code blocks and render their output | news.ycombinator.com | 2024-06-15
Hey everyone, here's an extension I made for Python-Markdown (https://github.com/Python-Markdown/markdown). It builds on top of PyMDown Extensions' SuperFences (https://facelessuser.github.io/pymdown-extensions/extensions...), and allows Markdown writers to execute their Markdown code blocks to render the execution output in place of / in addition to the code blocks.
Languages supported:
- python/pycon
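The general idea of executing a code block and rendering its output in place can be sketched with the standard library alone. This is not the Markdown Exec extension or the Python-Markdown API; `run_block`, `render`, and the regular expression are hypothetical names and choices for illustration, and only blocks tagged `python` are handled.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built programmatically so the sample nests cleanly here

# Matches a fenced block tagged "python" and captures its body.
BLOCK_RE = re.compile(
    rf"^{FENCE}python\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def run_block(source):
    """Execute one code block and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # never run untrusted Markdown this way
    return buffer.getvalue()


def render(markdown_text):
    """Replace each python block with a plain fenced block holding its output."""
    def replace(match):
        output = run_block(match.group(1))
        if output and not output.endswith("\n"):
            output += "\n"  # keep the closing fence on its own line
        return f"{FENCE}\n{output}{FENCE}"
    return BLOCK_RE.sub(replace, markdown_text)


sample = f"Total:\n\n{FENCE}python\nprint(1 + 2)\n{FENCE}\n"
rendered = render(sample)
print(rendered)
```

Each block runs in a fresh namespace here, so variables do not carry over between blocks; a real tool may choose differently.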
-
XlsxWriter is a Python library for creating Excel 2007 xlsx files. It is particularly well-suited for writing complex formulas and creating sophisticated charts.
-
xlwings
xlwings is a Python library that makes it easy to call Python from Excel and vice versa. It works with Excel on Windows and macOS as well as with Google Sheets and Excel on the web.
-
pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
-
Project mention: Show HN: AutoDocument – Multi-Source Document Generation | news.ycombinator.com | 2024-08-08
Hi there, this post introduces AutoDocument, a free and open-source document-generating web app that connects spreadsheets, databases and user forms into documents such as Microsoft Word and PDFs. It's based on fantastic open-source libraries like https://github.com/elapouya/python-docx-template and headless LibreOffice.
Mail Merge is a pain because it:
- Only converts from Excel to Word
-
kreuzberg – text extraction library supporting PDFs, images, office documents and more
Python Specific Formats Processing related posts
-
Sqawk: A fusion of SQL and Awk: Applying SQL to text-based data files
-
Using Docling’s OCR features with RapidOCR
-
Show HN: Kreuzberg v3.0 – Modern Python Document Extraction
-
Interest in a pgvector-based RAG system library?
-
Converting Plotly charts into images in parallel
-
Liberate tabular data from scanned documents
-
Using Pandoc and Typst to Produce PDFs
Index
What are some of the best open-source Specific Formats Processing projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | PyPDF2 | 9,138 |
2 | WeasyPrint | 7,789 |
3 | PyMuPDF | 7,326 |
4 | csvkit | 6,198 |
5 | python-docx | 5,058 |
6 | tablib | 4,700 |
7 | Python-Markdown | 4,001 |
8 | XlsxWriter | 3,790 |
9 | borb | 3,475 |
10 | Camelot | 3,323 |
11 | xlwings | 3,161 |
12 | python-pptx | 2,805 |
13 | Mistune | 2,800 |
14 | markdown2 | 2,772 |
15 | pdftabextract | 2,232 |
16 | docxtpl | 2,216 |
17 | kreuzberg | 1,849 |
18 | pyexcel | 1,245 |
19 | pymorphy2 | 1,152 |
20 | mistletoe | 926 |
21 | unp | 430 |
22 | vcspull | 207 |
23 | Marmir | 173 |