Python Specific Formats Processing

Open-source Python projects categorized as Specific Formats Processing

Top 23 Python Specific Formats Processing Projects

  • PyPDF2

    A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

  • Project mention: Yara scanning PDF files | /r/computerforensics | 2023-06-01
  • WeasyPrint

    The awesome document factory

  • Project mention: Launch HN: Onedoc (YC W24) – A better way to create PDFs | news.ycombinator.com | 2024-03-11

    Is there a reason you didn't consider something like Weasyprint?

    https://weasyprint.org

    I've gone through a number of systems to convert CVs, business cards, and other docs and it hasn't let me down yet.

  • csvkit

    A suite of utilities for converting to and working with CSV, the king of tabular file formats.

  • PDFMiner

    Python PDF Parser (Not actively maintained). Check out pdfminer.six.

  • tablib

    Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.

  • python-docx

    Create and modify Word documents with Python

  • Project mention: What Would Go in Your Dream Documentation Solution? | /r/technicalwriting | 2023-12-09

    So, what I'd like to do is write a documentation package in Python to recreate what I've lost. I plan to build upon the fantastic python-docx and docxtpl packages, and I'll probably rely on pandas for much of the tabular stuff. Here are the features I intend to include:

  • PyMuPDF

    PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

  • Project mention: FLaNK Stack for 04 December 2023 | dev.to | 2023-12-04
  • Python-Markdown

    A Python implementation of John Gruber’s Markdown with Extension support.

  • Project mention: Introducing AutoPyTabs: Automatically generate code examples for different Python versions in MkDocs or Sphinx based documentations | /r/Python | 2023-04-30

    AutoPyTabs allows you to write code examples in your documentation targeting a single version of Python and then generates examples targeting higher Python versions on the fly, presenting them in tabs using popular tabs extensions. It all comes packaged as a Markdown extension, an MkDocs plugin and a Sphinx extension, so it can easily be integrated into your documentation workflow.
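
    A minimal sketch of Python-Markdown's `markdown.markdown()` with and without one of its bundled extensions (`tables`). Extensions are enabled by listing their names in `extensions`. The sample text is illustrative.

```python
import markdown

text = """# Formats

| Library    | Format |
|------------|--------|
| csvkit     | CSV    |
| XlsxWriter | XLSX   |

Plain *emphasis* needs no extension.
"""

# Core syntax only: the pipe table is not recognised as a table.
plain_html = markdown.markdown(text)

# With the bundled "tables" extension the pipe table becomes an HTML <table>.
html = markdown.markdown(text, extensions=["tables"])

print("<table>" in plain_html)  # False
print("<table>" in html)  # True
```
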

  • XlsxWriter

    A Python module for creating Excel XLSX files.
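
    A minimal sketch of XlsxWriter's write-only API: a bold header row, some data and a formula. XlsxWriter can write to an `io.BytesIO` when given the `in_memory` option. Because an .xlsx file is a ZIP archive, the standard `zipfile` module is used here only to inspect the result. XlsxWriter does not calculate formula results; Excel recalculates them when the file is opened.

```python
import io
import zipfile

import xlsxwriter

output = io.BytesIO()
workbook = xlsxwriter.Workbook(output, {"in_memory": True})
worksheet = workbook.add_worksheet("Stars")
bold = workbook.add_format({"bold": True})

# Row/column indices are zero-based; A1-style references also work.
worksheet.write_row(0, 0, ["Project", "Stars"], bold)
worksheet.write_row(1, 0, ["XlsxWriter", 3487])
worksheet.write_row(2, 0, ["xlwings", 2834])
worksheet.write_formula("B4", "=SUM(B2:B3)")
worksheet.set_column(0, 0, 15)  # widen column A

workbook.close()  # the file is only complete after close()

with zipfile.ZipFile(io.BytesIO(output.getvalue())) as zf:
    print("xl/worksheets/sheet1.xml" in zf.namelist())  # True
```
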

  • borb

    borb is a library for reading, creating and manipulating PDF files in Python.

  • Project mention: Caffè Italia * 30/04/23 | /r/italy | 2023-04-30
  • xlwings

    xlwings is a Python library that makes it easy to call Python from Excel and vice versa. It works with Excel on Windows and macOS as well as with Google Sheets and Excel on the web.

  • Project mention: Python in Excel: Combining the Power of Python and the Flexibility of Excel | news.ycombinator.com | 2023-08-23

    Reading the headline, I initially thought that Microsoft had bought the company behind xlwings [1], which also lets you use Python directly within Excel, even locally. I'm not affiliated with that company in any way; I just used it in the past.

    [1] https://www.xlwings.org/

  • Camelot

    A Python library to extract tabular data from PDFs

  • Project mention: Show HN: How do you OCR on a Mac using the CLI or just Python for free | news.ycombinator.com | 2024-01-02

    I had good repeated success extracting tables from PDFs using Camelot (Python, https://github.com/camelot-dev/camelot)

  • markdown2

    markdown2: A fast and complete implementation of Markdown in Python

  • unoconv

    Universal Office Converter - Convert between any document format supported by LibreOffice/OpenOffice.

  • Mistune

    A fast yet powerful Python Markdown parser with renderers and plugins.

  • pdftabextract

    A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

  • python-pptx

    Create Open XML PowerPoint documents in Python

  • docxtpl

    Use a docx as a jinja2 template

  • pyexcel

    Single API for reading, manipulating and writing data in CSV, ODS, XLS, XLSX and XLSM files

  • Project mention: Advice on ETL and Data Sharing work process | /r/ETL | 2023-11-07

    You could try and write some simple python using the pyexcel and pandas libraries. I created a tool as a consultant with these packages that parsed spreadsheets with data from factories from all around the world. They did not lock down the Excel files used to submit data and it made it so much harder. If you go this route, I would recommend starting by putting your data into a SQLite database. Once you have your data in a database, you unlock the power of SQL for pulling reports. Also, you can port the data into a proper database if you ever need to. ChatGPT can probably get you a good chunk of the way there.

  • pymorphy2

    Morphological analyzer / inflection engine for Russian and Ukrainian languages.

  • Project mention: Determine russian sentence parts. | /r/russian | 2023-05-11
  • mistletoe

    A fast, extensible and spec-compliant Markdown parser in pure Python.

  • unp

    Unpacks things.

  • vcspull

    🔄 Synchronize projects via a YAML/JSON manifest. Built using `libvcs`.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Specific Formats Processing projects in Python? This list will help you:

#   Project          GitHub stars
1   PyPDF2           7,359
2   WeasyPrint       6,635
3   csvkit           5,808
4   PDFMiner         5,179
5   tablib           4,524
6   python-docx      4,179
7   PyMuPDF          4,002
8   Python-Markdown  3,578
9   XlsxWriter       3,487
10  borb             3,283
11  xlwings          2,834
12  Camelot          2,631
13  markdown2        2,583
14  unoconv          2,514
15  Mistune          2,442
16  pdftabextract    2,152
17  python-pptx      2,150
18  docxtpl          1,848
19  pyexcel          1,173
20  pymorphy2        1,098
21  mistletoe        746
22  unp              416
23  vcspull          202
