Top 23 Python Specific Formats Processing Projects
- PyPDF2: A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
- csvkit: A suite of utilities for converting to and working with CSV, the king of tabular file formats.
- PyMuPDF: PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
- xlwings: xlwings is a Python library that makes it easy to call Python from Excel and vice versa. It works with Excel on Windows and macOS as well as with Google Sheets and Excel on the web.
- unoconv: Universal Office Converter - Convert between any document format supported by LibreOffice/OpenOffice.
- pdftabextract: A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Project mention: Launch HN: Onedoc (YC W24) – A better way to create PDFs | news.ycombinator.com | 2024-03-11
Is there a reason you didn't consider something like Weasyprint?
https://weasyprint.org
I've gone through a number of systems to convert CVs, business cards, and other docs and it hasn't let me down yet.
Project mention: What Would Go in Your Dream Documentation Solution? | /r/technicalwriting | 2023-12-09
So, what I'd like to do is write a documentation package in Python to recreate what I've lost. I plan to build upon the fantastic python-docx and docxtpl packages, and I'll probably rely on pandas for much of the tabular stuff. Here are the features I intend to include:
Project mention: Introducing AutoPyTabs: Automatically generate code examples for different Python versions in MkDocs or Sphinx based documentations | /r/Python | 2023-04-30
AutoPyTabs allows you to write code examples in your documentation targeting a single version of Python and then generates examples targeting higher Python versions on the fly, presenting them in tabs, using popular tabs extensions. This all comes packaged as a markdown extension, a MkDocs plugin, and a Sphinx extension, so it can easily be integrated with your documentation workflow.
Project mention: Python in Excel: Combining the Power of Python and the Flexibility of Excel | news.ycombinator.com | 2023-08-23
Reading the headline, I initially thought that Microsoft bought the company behind XLWings [1], which also enables you to use Python directly within Excel, even locally. I'm not affiliated with that company in any way; I just used it in the past.
[1] https://www.xlwings.org/
Project mention: Show HN: How do you OCR on a Mac using the CLI or just Python for free | news.ycombinator.com | 2024-01-02
I had good repeated success extracting tables from PDFs using Camelot (Python, https://github.com/camelot-dev/camelot)
You could try and write some simple python using the pyexcel and pandas libraries. I created a tool as a consultant with these packages that parsed spreadsheets with data from factories from all around the world. They did not lock down the Excel files used to submit data and it made it so much harder. If you go this route, I would recommend starting by putting your data into a SQLite database. Once you have your data in a database, you unlock the power of SQL for pulling reports. Also, you can port the data into a proper database if you ever need to. ChatGPT can probably get you a good chunk of the way there.
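The SQLite suggestion in the comment above could be sketched roughly as follows. This is a minimal illustration using only the standard-library `sqlite3` module; the `production` table, its columns, and the sample rows are hypothetical stand-ins for data that would already have been read from spreadsheets (for example with pyexcel or pandas), not anything the commenter described.

```python
import sqlite3

# Hypothetical rows standing in for data already read from spreadsheets
# (e.g. with pyexcel or pandas); the column names are assumptions.
rows = [
    {"factory": "Plant A", "month": "2024-01", "units": 1200},
    {"factory": "Plant A", "month": "2024-02", "units": 1350},
    {"factory": "Plant B", "month": "2024-01", "units": 800},
]


def load_rows(conn, rows):
    """Create the table if needed and insert the spreadsheet rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS production ("
        "factory TEXT NOT NULL, month TEXT NOT NULL, units INTEGER NOT NULL)"
    )
    # Named placeholders map directly onto the dictionary keys.
    conn.executemany(
        "INSERT INTO production (factory, month, units) "
        "VALUES (:factory, :month, :units)",
        rows,
    )
    conn.commit()


def units_per_factory(conn):
    """Report total units per factory, largest total first."""
    cur = conn.execute(
        "SELECT factory, SUM(units) FROM production "
        "GROUP BY factory ORDER BY SUM(units) DESC"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # pass a file path to persist the data
load_rows(conn, rows)
report = units_per_factory(conn)
print(report)
```

Once the data sits in a table like this, further reports are a matter of writing another `SELECT`, and the same schema could later be moved to a larger database if needed.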
Python Specific Formats Processing related posts
- Show HN: A new open-source library to design PDF using React
- 1.5M PDFs in 25 Minutes
- Htmldocs: Typeset and Generate PDFs with HTML/CSS
- Show HN: How do you OCR on a Mac using the CLI or just Python for free
- What Would Go in Your Dream Documentation Solution?
- Advice on ETL and Data Sharing work process
- Converting markdown to pdf in Python
Index
What are some of the best open-source Specific Formats Processing projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | PyPDF2 | 7,359 |
2 | WeasyPrint | 6,635 |
3 | csvkit | 5,808 |
4 | PDFMiner | 5,179 |
5 | tablib | 4,524 |
6 | python-docx | 4,179 |
7 | PyMuPDF | 4,002 |
8 | Python-Markdown | 3,578 |
9 | XlsxWriter | 3,487 |
10 | borb | 3,283 |
11 | xlwings | 2,834 |
12 | Camelot | 2,631 |
13 | markdown2 | 2,583 |
14 | unoconv | 2,514 |
15 | Mistune | 2,442 |
16 | pdftabextract | 2,152 |
17 | python-pptx | 2,150 |
18 | docxtpl | 1,848 |
19 | pyexcel | 1,173 |
20 | pymorphy2 | 1,098 |
21 | mistletoe | 746 |
22 | unp | 416 |
23 | vcspull | 202 |