Metric | pdf2docx | stapler |
---|---|---|
Mentions | 6 | 3 |
Stars | 2,155 | 281 |
Growth | 3.9% | - |
Activity | 7.9 | 0.0 |
Last commit | 21 days ago | 10 months ago |
Language | Python | Python |
License | GNU Affero General Public License v3.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pdf2docx
-
Tensorflow PDF Extraction
Try pdf2docx. Here is the source: https://github.com/dothinking/pdf2docx.
-
Show HN: Doc Converter – Convert PDF docs to Word documents on your computer
Does it include its source/dependency licensing post extraction? Some of these dependencies are under GPL/AGPL https://github.com/dothinking/pdf2docx/blob/master/LICENSE
-
What should exist, but doesn’t?
If you'd rather convert a PDF to .docx so you can easily edit it, there's a free Python tool that works, but it has no GUI: https://github.com/dothinking/pdf2docx
-
How to deploy containerized Python and Django application on Heroku
pdf2docx: This module helps to convert from pdf to docx
-
Help with pictures in python-docx
I found this post on github https://github.com/dothinking/pdf2docx/issues/54#issuecomment-715925252
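Since several of the posts above note that pdf2docx has no GUI, a short script may help show what using it from Python looks like. This is a minimal sketch, assuming the package is installed with `pip install pdf2docx` and that its `Converter` class accepts `convert(docx_file, start=..., end=...)` followed by `close()`. The helper names `docx_path_for` and `convert_pdf` are hypothetical and not part of the library.

```python
from pathlib import Path


def docx_path_for(pdf_path):
    """Return the output path: same location and name, with a .docx suffix."""
    return Path(pdf_path).with_suffix(".docx")


def convert_pdf(pdf_path, start=0, end=None):
    """Convert a PDF to .docx next to the source file (hypothetical helper).

    start and end are zero-based page indexes; end=None means "to the last page".
    """
    # Imported lazily so the path helper works without the third-party package.
    from pdf2docx import Converter

    docx_path = docx_path_for(pdf_path)
    converter = Converter(str(pdf_path))
    try:
        converter.convert(str(docx_path), start=start, end=end)
    finally:
        # Release the underlying PDF file handle even if conversion fails.
        converter.close()
    return docx_path
```

Calling `convert_pdf("report.pdf")` would then be expected to write `report.docx` alongside the input. Layout fidelity depends on the PDF, so the result is worth checking by hand.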
stapler
-
Google reverses 5M file limit in Google Drive
... That happens to be a Python alternative to PDFtk[1]. It’s been kind of abandoned in recent years, though.
Also, in the early times of OLE 2.0 MS Office included a utility called Binder that could put several Office documents in a single file and edit them using a common interface. Nothing came of it.
[1] https://github.com/hellerbarde/stapler
-
Papermerge (almost) 2.0 is out!
In the UI you can cut pages from one document and paste them into another. Afterwards you can sort/reorder pages. Up until version 2.0, Papermerge used pdftk for "cut" and "paste" operations. Because of pdftk's licensing (plus its dependency on Java), it was replaced by stapler, which is a pure-Python equivalent of pdftk. Stapler is BSD licensed.
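The "cut" operation described above can be approximated from a script by calling the stapler command line. This is a rough sketch, assuming stapler is installed and on `PATH`, and that its `sel` subcommand takes an input file, one or more pages or page ranges, and the output file last. That argument order reflects my reading of the stapler README, so confirm it with `stapler --help` before relying on it. The function names are hypothetical.

```python
import shutil
import subprocess


def stapler_sel_command(source, page_ranges, output):
    """Build the argument list for `stapler sel` (assumed order: input, ranges, output)."""
    return ["stapler", "sel", source, *page_ranges, output]


def select_pages(source, page_ranges, output):
    """Copy the given pages of source into a new PDF at output (hypothetical helper)."""
    if shutil.which("stapler") is None:
        raise RuntimeError("stapler was not found on PATH")
    # check=True raises CalledProcessError if stapler exits with an error.
    subprocess.run(stapler_sel_command(source, page_ranges, output), check=True)
```

For instance, `select_pages("a.pdf", ["1", "4-8"], "picked.pdf")` would be expected to write pages 1 and 4 through 8 of `a.pdf` into `picked.pdf`.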
What are some alternatives?
django-convert-doc-to-pdf
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
borb - borb is a library for reading, creating and manipulating PDF files in python.
OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
pdfsam - PDFsam, a desktop application to split, merge, mix, rotate PDF files and extract pages
Papermerge - Open Source Document Management System for Digital Archives (Scanned Documents)
pdf2docxConverter-PayalSasmal - This project is for converting pdf to docx and vice versa
pagelabels-py - Python library to manipulate PDF page labels
Django - The Web framework for perfectionists with deadlines.
Paperless - Scan, index, and archive all of your paper documents
gunicorn - gunicorn 'Green Unicorn' is a WSGI HTTP Server for UNIX, fast clients and sleepy applications.
pdftools - A collection of PDF command line tools and wrappers for Linux