Python Specific Formats Processing

Open-source Python projects categorized as Specific Formats Processing

Top 23 Python Specific Formats Processing Projects

  1. PyPDF2

    A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

    Project mention: Using Docling’s OCR features with RapidOCR | dev.to | 2025-04-03
  3. WeasyPrint

    The awesome document factory

    Project mention: Quarkdown: A modern Markdown-based typesetting system | news.ycombinator.com | 2025-06-03
  4. PyMuPDF

    PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

    Project mention: Using Docling’s OCR features with RapidOCR | dev.to | 2025-04-03
  5. csvkit

    A suite of utilities for converting to and working with CSV, the king of tabular file formats.

    Project mention: Sqawk: A fusion of SQL and Awk: Applying SQL to text-based data files | news.ycombinator.com | 2025-05-26

    I wonder how this compares to csvkit [1].

    [1]: https://csvkit.readthedocs.io/

  6. python-docx

    Create and modify Word documents with Python

  7. tablib

    Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.

    Project mention: Rust solves the problem of incomplete Kernel Linux API docs | news.ycombinator.com | 2024-08-31
  8. Python-Markdown

    A Python implementation of John Gruber’s Markdown with Extension support.

    Project mention: Show HN: (Python) Markdown Exec, execute code blocks and render their output | news.ycombinator.com | 2024-06-15

    Hey everyone, here's an extension I made for Python-Markdown (https://github.com/Python-Markdown/markdown). It builds on top of PyMDown Extensions' SuperFences (https://facelessuser.github.io/pymdown-extensions/extensions...), and allows Markdown writers to execute their Markdown code blocks to render the execution output in place of / in addition to the code blocks.

    Languages supported:

    - python/pycon

  10. XlsxWriter

    A Python module for creating Excel XLSX files.

    Project mention: 7 Python Excel Libraries: In-Depth Review for Developers | dev.to | 2024-07-18

    XlsxWriter is a Python library for creating Excel 2007 xlsx files. It is particularly well-suited for writing complex formulas and creating sophisticated charts.

  11. borb

    borb is a library for reading, creating and manipulating PDF files in python.

  12. Camelot

    A Python library to extract tabular data from PDFs

  13. xlwings

    xlwings is a Python library that makes it easy to call Python from Excel and vice versa. It works with Excel on Windows and macOS as well as with Google Sheets and Excel on the web.

  14. python-pptx

    Create Open XML PowerPoint documents in Python

  15. Mistune

    A fast yet powerful Python Markdown parser with renderers and plugins.

  16. markdown2

    markdown2: A fast and complete implementation of Markdown in Python

  17. pdftabextract

    A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

    Project mention: Liberate tabular data from scanned documents | news.ycombinator.com | 2024-12-18
  18. docxtpl

    Use a docx as a jinja2 template

    Project mention: Show HN: AutoDocument – Multi-Source Document Generation | news.ycombinator.com | 2024-08-08

    Hi there, this post is introducing AutoDocument, a free and open-source document generating web app that connects spreadsheets, databases and user forms into documents such as Microsoft Word and PDFs. It's based on fantastic open-source libraries like https://github.com/elapouya/python-docx-template and headless LibreOffice.

    Mail Merge is a pain because it:

    - Only converts from Excel to Word

  19. kreuzberg

    A text extraction library supporting PDFs, images, office documents and more

    Project mention: This Week In Python | dev.to | 2025-03-28

    kreuzberg – text extraction library supporting PDFs, images, office documents and more

  20. pyexcel

    Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files

  21. pymorphy2

    Morphological analyzer / inflection engine for Russian and Ukrainian languages.

  22. mistletoe

    A fast, extensible and spec-compliant Markdown parser in pure Python.

  23. unp

    Unpacks things.

  24. vcspull

    🔄 Synchronize projects via yaml/json manifest. Built using `libvcs`.

  25. Marmir

    Python powered spreadsheets (by brianray)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Specific Formats Processing related posts

  • Sqawk: A fusion of SQL and Awk: Applying SQL to text-based data files

    2 projects | news.ycombinator.com | 26 May 2025
  • Using Docling’s OCR features with RapidOCR

    9 projects | dev.to | 3 Apr 2025
  • Show HN: Kreuzberg v3.0 – Modern Python Document Extraction

    1 project | news.ycombinator.com | 24 Mar 2025
  • Interest in a pgvector-based RAG system library?

    1 project | news.ycombinator.com | 15 Mar 2025
  • Converting Plotly charts into images in parallel

    2 projects | dev.to | 2 Jan 2025
  • Liberate tabular data from scanned documents

    1 project | news.ycombinator.com | 18 Dec 2024
  • Using Pandoc and Typst to Produce PDFs

    1 project | news.ycombinator.com | 29 Nov 2024

Index

What are some of the best open-source Specific Formats Processing projects in Python? This list will help you:

 #  Project          Stars
 1  PyPDF2           9,138
 2  WeasyPrint       7,789
 3  PyMuPDF          7,326
 4  csvkit           6,198
 5  python-docx      5,058
 6  tablib           4,700
 7  Python-Markdown  4,001
 8  XlsxWriter       3,790
 9  borb             3,475
10  Camelot          3,323
11  xlwings          3,161
12  python-pptx      2,805
13  Mistune          2,800
14  markdown2        2,772
15  pdftabextract    2,232
16  docxtpl          2,216
17  kreuzberg        1,849
18  pyexcel          1,245
19  pymorphy2        1,152
20  mistletoe          926
21  unp                430
22  vcspull            207
23  Marmir             173
