Specific Formats Processing

Open-source projects categorized as Specific Formats Processing

Top 23 Specific Formats Processing Open-Source Projects

  • PyPDF2

    A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

  • Project mention: Yara scanning PDF files | /r/computerforensics | 2023-06-01
  • WeasyPrint

    The awesome document factory

  • Project mention: Launch HN: Onedoc (YC W24) – A better way to create PDFs | news.ycombinator.com | 2024-03-11

    Is there a reason you didn't consider something like Weasyprint?

    https://weasyprint.org

    I've gone through a number of systems to convert CVs, business cards, and other docs and it hasn't let me down yet.

  • csvkit

    A suite of utilities for converting to and working with CSV, the king of tabular file formats.

  • PDFMiner

    Python PDF Parser (Not actively maintained). Check out pdfminer.six.

  • tablib

    Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.

  • python-docx

    Create and modify Word documents with Python

  • Project mention: What Would Go in Your Dream Documentation Solution? | /r/technicalwriting | 2023-12-09

    So, what I'd like to do is write a documentation package in Python to recreate what I've lost. I plan to build upon the fantastic python-docx and docxtpl packages, and I'll probably rely on pandas for much of the tabular stuff. Here are the features I intend to include:

  • PyMuPDF

    PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

  • Project mention: FLaNK Stack for 04 December 2023 | dev.to | 2023-12-04
  • Python-Markdown

    A Python implementation of John Gruber’s Markdown with Extension support.

  • XlsxWriter

    A Python module for creating Excel XLSX files.

  • borb

    borb is a library for reading, creating and manipulating PDF files in python.

  • xlwings

    xlwings is a Python library that makes it easy to call Python from Excel and vice versa. It works with Excel on Windows and macOS as well as with Google Sheets and Excel on the web.

  • Project mention: Python in Excel: Combining the Power of Python and the Flexibility of Excel | news.ycombinator.com | 2023-08-23

    Reading the headline, I initially thought that Microsoft bought the company behind XLWings [1], which also enables you to use Python directly within Excel, even locally. Not affiliated in any way with that company, just used it in the past.

    [1] https://www.xlwings.org/

  • Camelot

    A Python library to extract tabular data from PDFs

  • Project mention: Show HN: How do you OCR on a Mac using the CLI or just Python for free | news.ycombinator.com | 2024-01-02

    I had good repeated success extracting tables from PDFs using Camelot (Python, https://github.com/camelot-dev/camelot)

  • markdown2

    markdown2: A fast and complete implementation of Markdown in Python

  • unoconv

    Universal Office Converter - Convert between any document format supported by LibreOffice/OpenOffice.

  • Mistune

    A fast yet powerful Python Markdown parser with renderers and plugins.

  • python-pptx

    Create Open XML PowerPoint documents in Python

  • pdftabextract

    A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

  • docxtpl

    Use a docx as a jinja2 template

  • pyexcel

    Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files

  • Project mention: Advice on ETL and Data Sharing work process | /r/ETL | 2023-11-07

    You could try and write some simple python using the pyexcel and pandas libraries. I created a tool as a consultant with these packages that parsed spreadsheets with data from factories from all around the world. They did not lock down the Excel files used to submit data and it made it so much harder. If you go this route, I would recommend starting by putting your data into a SQLite database. Once you have your data in a database, you unlock the power of SQL for pulling reports. Also, you can port the data into a proper database if you ever need to. ChatGPT can probably get you a good chunk of the way there.

  • pymorphy2

    Morphological analyzer / inflection engine for Russian and Ukrainian languages.

  • Project mention: Determine russian sentence parts. | /r/russian | 2023-05-11
  • mistletoe

    A fast, extensible and spec-compliant Markdown parser in pure Python.

  • unp

    Unpacks things.

  • vcspull

    🔄 Synchronize projects via yaml/json manifest. Built using `libvcs`.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
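
The pyexcel mention above recommends putting parsed spreadsheet data into a SQLite database and then using SQL to pull reports. A minimal sketch of that idea follows, using only Python's standard library (`csv` and `sqlite3`) rather than pyexcel or pandas. The sample data, the `output` table, and the helper names `load_rows` and `units_per_factory` are all hypothetical and chosen purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for rows parsed from a spreadsheet.
SAMPLE_CSV = """factory,month,units
Lyon,2023-01,120
Lyon,2023-02,95
Osaka,2023-01,210
"""


def load_rows(conn, csv_text):
    """Create the (hypothetical) output table and insert every CSV row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS output "
        "(factory TEXT, month TEXT, units INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    # Convert the units column to int explicitly instead of relying on
    # SQLite's column type affinity.
    rows = [(r["factory"], r["month"], int(r["units"])) for r in reader]
    conn.executemany(
        "INSERT INTO output (factory, month, units) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def units_per_factory(conn):
    """Return total units per factory, sorted by factory name."""
    cursor = conn.execute(
        "SELECT factory, SUM(units) FROM output "
        "GROUP BY factory ORDER BY factory"
    )
    return cursor.fetchall()


# An in-memory database keeps the sketch self-contained; a file path
# would persist the data between runs.
conn = sqlite3.connect(":memory:")
load_rows(conn, SAMPLE_CSV)
report = units_per_factory(conn)
print(report)
```

Once the rows are in SQLite, further reports are just additional queries, and the same table can later be moved to another database if needed, as the post suggests.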

Specific Formats Processing related posts

  • Show HN: A new open-source library to design PDF using React

    2 projects | news.ycombinator.com | 17 Feb 2024
  • 1.5M PDFs in 25 Minutes

    2 projects | news.ycombinator.com | 15 Feb 2024
  • Htmldocs: Typeset and Generate PDFs with HTML/CSS

    11 projects | news.ycombinator.com | 17 Jan 2024
  • Show HN: How do you OCR on a Mac using the CLI or just Python for free

    6 projects | news.ycombinator.com | 2 Jan 2024
  • How to Simply Generate a PDF From HTML in Symfony With WeasyPrint

    6 projects | dev.to | 12 Dec 2023
  • What Would Go in Your Dream Documentation Solution?

    2 projects | /r/technicalwriting | 9 Dec 2023
  • FLaNK Stack for 04 December 2023

    24 projects | dev.to | 4 Dec 2023

Index

What are some of the best open-source Specific Formats Processing projects? This list will help you:

Rank Project Stars
1 PyPDF2 7,466
2 WeasyPrint 6,658
3 csvkit 5,829
4 PDFMiner 5,179
5 tablib 4,533
6 python-docx 4,222
7 PyMuPDF 4,103
8 Python-Markdown 3,593
9 XlsxWriter 3,499
10 borb 3,296
11 xlwings 2,848
12 Camelot 2,669
13 markdown2 2,588
14 unoconv 2,514
15 Mistune 2,456
16 python-pptx 2,179
17 pdftabextract 2,152
18 docxtpl 1,871
19 pyexcel 1,173
20 pymorphy2 1,101
21 mistletoe 754
22 unp 416
23 vcspull 203
