Densify
OCRmyPDF
| | Densify | OCRmyPDF |
|---|---|---|
| Mentions | 6 | 77 |
| Stars | 82 | 12,134 |
| Growth | - | 3.2% |
| Activity | 0.0 | 9.5 |
| Latest commit | 16 days ago | 5 days ago |
| Language | Python | Python |
| License | MIT License | Mozilla Public License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Densify
-
Document Scanner 42: Post processing scripts to compress PDFs?
The tutorial mentions Densify. It is a GUI front-end for Ghostscript (GS), and an interesting tool. The AppImage works well, but it is much slower than running GS from the command line. https://github.com/hkdb/Densify
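The command-line route the comment compares against can be scripted. The sketch below builds a Ghostscript `pdfwrite` invocation from Python. The flag set is a common compression recipe and an assumption, not necessarily what Densify passes internally; the helper names `build_gs_command` and `compress_pdf` are hypothetical, and it assumes the Linux `gs` binary is on `PATH`.

```python
from __future__ import annotations

import shutil
import subprocess

# Ghostscript's -dPDFSETTINGS presets. /screen downsamples images to 72 dpi
# (smallest files), /ebook to 150 dpi, /printer and /prepress to 300 dpi.
PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def build_gs_command(src: str, dst: str, preset: str = "ebook") -> list[str]:
    """Return an argv list that rewrites `src` as a (usually smaller) PDF at `dst`.

    Assumed flag set: a common pdfwrite recipe, not necessarily Densify's.
    """
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {PRESETS}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",         # re-write the document through the PDF output device
        "-dCompatibilityLevel=1.4",  # PDF version of the output file
        f"-dPDFSETTINGS=/{preset}",  # image downsampling/quality preset
        "-dNOPAUSE",                 # do not pause between pages
        "-dQUIET",                   # suppress routine messages
        "-dBATCH",                   # exit after processing instead of opening a prompt
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src: str, dst: str, preset: str = "ebook") -> None:
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("Ghostscript ('gs') is not on PATH")
    subprocess.run(build_gs_command(src, dst, preset), check=True)
```

Usage would look like `compress_pdf("scan.pdf", "scan-small.pdf", preset="ebook")`. Keeping the argv builder separate from the subprocess call makes it easy to inspect or log the exact command; it is worth comparing file sizes afterwards, since re-writing an already optimized PDF can occasionally produce a larger file.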
-
PDF file SPEED optimization.
I thought there might be room for optimization in your PDF file, so I downloaded a copy from the Internet (377 MB, 1,493 pages) and ran it through Ghostscript with the parameters that Densify uses:
-
How to make sh script stop and print error or continue/skip one step if a directory is already there.

    sudo pamac -S rclone fd firejail keepassxc thunderbird terminator vim simple-scan sane-airscan xsane sane wireguard-tools gpg-crypter htop bmon ghostscript digikam borg libreoffice-fresh ncdu bleachbit rmlint displaycal avahi nss-mdns gnome-keyring kleopatra luminancehdr grep vivaldi vivaldi-ffmpeg-codecs discord rsync rkhunter jack2 audacity clementine p7zip ghostscript gimp gzip handbrake kcm-wacomtablet kdiff3 kid3 krusader kwallet-pam kwalletmanager luminancehdr mpv obs-studio vlc rawtherapee signal-desktop simple-scan skanlite speedtest-cli syncthing tree ttf-bitstream-vera ttf-dejavu ttf-liberation ttf-opensans unrar unzip veracrypt yakuake qemu-emulators-full qemu-full qbittorrent psensor pv python python-capng python-defusedxml python-llfuse python-packaging python-pip python-pyqt5 pass blender inkscape curl jq fuse2 fuse3 fuse-common libwacom zsh evolution hunspell hunspell-en_us aspell aspell-en kmail kontact kaddressbook korganizer kdepim-addons
    echo "pamac installations done"
    ################################################################################
    ### Install AUR packages with yay
    yay ffmpeg-amd-full-git
    yay corectrl
    yay wireguard-dkms
    yay timeshift
    yay timeshift-autosnap
    yay appimagelauncher
    yay balena-etcher
    yay brave-bin
    yay profile-sync-daemon-brave
    yay google-earth-pro
    yay stellarium
    yay pyfuse3
    yay libwacom
    yay kcm-wacomtablet
    yay wacom-utility-git
    yay brother-cups-wrapper-common
    yay brother-dcpl2550dw
    yay brother-lpr-drivers-common
    yay input-wacom-dkms
    yay mailspring
    yay obs-v4l2sink-git
    yay python-pyfuse3
    yay rdfind
    yay reload-wacom-after-suspend
    echo "yay - AUR installations done"
    ################################################################################
    ### Download AppImages to /home/$USER/Downloads/AppImages-dwnld
    # Librewolf
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://gitlab.com/api/v4/projects/24386000/packages/generic/librewolf/101.0.1-1/LibreWolf.x86_64.AppImage
    echo "Librewolf dwnld done"
    # Tutanota
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://mail.tutanota.com/desktop/tutanota-desktop-linux.AppImage
    echo "Tutanota dwnld done"
    # Joplin app
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://github.com/laurent22/joplin/releases/download/v2.8.8/Joplin-2.8.8.AppImage
    echo "Joplin-app dwnld done"
    # Densify pdf compressor
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://github.com/hkdb/Densify/releases/download/v0.3.1/Densify-v0.3.1-x86_64.AppImage
    echo "Densify dwnld done"
    # fre:ac - music converter
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://github.com/enzo1982/freac/releases/download/v1.1.6/freac-1.1.6-linux-x86_64.AppImage
    echo "fre:ac dwnld done"
    # Notion-enhanced - markdown notetaking app
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://github.com/notion-enhancer/notion-repackaged/releases/download/v2.0.18-1/Notion-Enhanced-2.0.18-1.AppImage
    echo "Notion-enhanced dwnld done"
    # Obsidian - markdown txt editor
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://github.com/obsidianmd/obsidian-releases/releases/download/v0.14.15/Obsidian-0.14.15.AppImage
    echo "Obsidian dwnld done"
    # Simple note
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://github.com/Automattic/simplenote-electron/releases/download/v2.21.0/Simplenote-linux-2.21.0-x86_64.AppImage
    echo "Simple-note dwnld done"
    # Siril - astro photo editor
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://free-astro.org/download/Siril-1.0.3-x86_64.AppImage
    echo "Siril dwnld done"
    # Exodus - crypto wallet
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://downloads.exodus.com/releases/exodus-linux-x64-22.6.17.zip
    echo "Exodus dwnld done"
    # Teamviewer
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://download.teamviewer.com/download/linux/teamviewer_amd64.tar.xz
    echo "Teamviewer dwnld done"
    # Tor-Browser - torproject.org
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://www.torproject.org/dist/torbrowser/11.0.14/tor-browser-linux64-11.0.14_en-US.tar.xz
    echo "Tor-Browser dwnld done"
    # XDM - xtreme download manager
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://github.com/subhra74/xdm/releases/download/7.2.11/xdm-setup-7.2.11.tar.xz
    echo "XDM dwnld done"
    # startools - astro photo editing program (paid)
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://download.startools.org/StarTools_1_8_527_MR2.zip
    echo "Startools dwnld done"
    # PDF-studio - Linux pdf editor that I have the pro version of
    wget -nc -P /home/$USER/Downloads/AppImages-dwnld https://download.qoppa.com/pdfstudio/PDFStudio_linux64.sh
    echo "PDF-Studio dwnld done"
    echo "All AppImage Downloads Completed"
    ################################################################################
-
Alternative to Adobe Acrobat for PDF
Densify - GUI for compressing PDF file sizes. I recommend the epub setting.
-
PDF Compression software
Check out Densify: its GitHub page and an article about it.
OCRmyPDF
-
TextSnatcher: Copy text from images, for the Linux Desktop
Try https://github.com/ocrmypdf/OCRmyPDF - it uses Tesseract behind the scenes and is absolutely brilliant.
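Because OCRmyPDF drives Tesseract, the language choice is passed straight through as Tesseract language codes joined with `+` (for example `eng+deu`), and the matching Tesseract language data must be installed. A minimal sketch, assuming `ocrmypdf` is on `PATH`; the helper names `build_ocr_command` and `ocr_pdf` are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def build_ocr_command(src: str, dst: str, languages: tuple[str, ...] = ("eng",)) -> list[str]:
    """Build an ocrmypdf argv list that adds a searchable text layer to `src`.

    OCRmyPDF hands the language list to Tesseract, which expects codes such
    as "eng" or "deu" joined with "+".
    """
    if not languages:
        raise ValueError("at least one Tesseract language code is required")
    return ["ocrmypdf", "-l", "+".join(languages), src, dst]


def ocr_pdf(src: str, dst: str, languages: tuple[str, ...] = ("eng",)) -> None:
    """Run ocrmypdf, raising if it is missing or exits with an error."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf is not on PATH")
    subprocess.run(build_ocr_command(src, dst, languages), check=True)
```

For a mixed English/German scan this would be `ocr_pdf("scan.pdf", "scan-ocr.pdf", ("eng", "deu"))`; the output keeps the original page images and adds an invisible text layer on top.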
- FLaNK Stack Weekly 19 Feb 2024
-
Calibre – New in Calibre 7.0
I recommend running any such PDFs through OCRmyPDF.
https://github.com/ocrmypdf/OCRmyPDF
-
A better document viewer
If by "like a photocopy" you mean the file contains images of text rather than text, the macOS viewer presumably performs OCR on the images. I don't know whether any Linux document viewer has that capability built in, but a quick search turned up the standalone tool OCRmyPDF.
- Is there a (CLI) tool that dynamically adjusts the contrast and brightness of scanned text documents?
-
OCR for a full pdf on Neoreader
For anyone interested: I solved the problem by first OCRing the files with the free and open-source tool ocrmypdf, available here
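When there are many files to prepare, the same step can be run over a whole folder. The sketch below is a hypothetical batch helper; it assumes one input folder of PDFs and a separate output folder, and uses OCRmyPDF's `--skip-text` option so that pages which already contain text are left alone instead of making the run fail.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def plan_batch(in_dir: str, out_dir: str) -> list[list[str]]:
    """Return one ocrmypdf command per PDF in `in_dir`, writing into `out_dir`.

    --skip-text tells OCRmyPDF to pass pages that already have text through
    unchanged rather than stopping with an error.
    """
    out = Path(out_dir)
    commands = []
    # sorted() keeps the order deterministic; the glob is case-sensitive on Linux.
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        commands.append(["ocrmypdf", "--skip-text", str(pdf), str(out / pdf.name)])
    return commands


def run_batch(in_dir: str, out_dir: str) -> None:
    """Create the output folder and OCR each file, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in plan_batch(in_dir, out_dir):
        subprocess.run(cmd, check=True)
```

Building the command list first makes a dry run trivial (print `plan_batch(...)` before calling `run_batch`), and writing to a separate folder leaves the originals untouched for the e-reader copy.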
-
ELI5: why is PDF such a widespread text format, instead of a format that's actually easier to edit?
ocrmypdf is nice for stuff like that.
- Donut: OCR-Free Document Understanding Transformer
-
massive crop and OCR newspaper
Use ImageMagick to convert them to PDF, then ocrmypdf to straighten and OCR them. See this explanation.
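The two-step approach in that answer can be sketched as a pair of commands: ImageMagick merges the page images into one PDF, then OCRmyPDF straightens and OCRs it. `--deskew` corrects small skew angles and `--rotate-pages` fixes pages that are turned by 90 or 180 degrees. The helper names and file names are hypothetical; `magick` is the ImageMagick 7 entry point (ImageMagick 6 uses `convert`), and some distributions' ImageMagick security policy blocks writing PDFs until it is relaxed.

```python
from __future__ import annotations

import subprocess


def build_scan_pipeline(images: list[str], merged_pdf: str, final_pdf: str,
                        magick: str = "magick") -> list[list[str]]:
    """Return two commands: page images -> one PDF, then deskew + rotate + OCR."""
    if not images:
        raise ValueError("no input images given")
    return [
        [magick, *images, merged_pdf],  # one PDF page per input image
        ["ocrmypdf", "--deskew", "--rotate-pages", merged_pdf, final_pdf],
    ]


def run_pipeline(commands: list[list[str]]) -> None:
    """Run the steps in order; check=True stops at the first failing step."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Cropping the newspaper pages is a separate ImageMagick step and is not covered by this sketch.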
-
OCR pdf and just keep the OCR text
Fair enough, maybe this will work for you: it should separate the text from the image anyway, and if you have Adobe Acrobat, its edit function should be able to delete the background too. It may already be able to do that if you haven't tried it.
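If only the recognized text is wanted, OCRmyPDF's `--sidecar` option writes it to a separate plain-text file alongside the output PDF, which can then be discarded. A minimal sketch, assuming `ocrmypdf` is on `PATH`; `build_sidecar_command` and `ocr_text_only` are hypothetical helpers, and the working-directory layout is an assumption.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_sidecar_command(src: str, pdf_out: str, text_out: str) -> list[str]:
    """ocrmypdf argv that also saves the recognized text to `text_out`."""
    return ["ocrmypdf", "--sidecar", text_out, src, pdf_out]


def ocr_text_only(src: str, workdir: str) -> str:
    """Run OCR and return just the text; the intermediate PDF stays in `workdir`."""
    work = Path(workdir)
    pdf_out = work / "ocr.pdf"
    text_out = work / "ocr.txt"
    subprocess.run(build_sidecar_command(src, str(pdf_out), str(text_out)), check=True)
    return text_out.read_text(encoding="utf-8")
```

Passing a temporary directory as `workdir` keeps the throwaway PDF out of the way, so only the returned text is kept.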
What are some alternatives?
cpdf - A script to simplify compressing PDF file size with Ghostscript
PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra-lightweight OCR system; supports recognition of 80+ languages; provides data annotation and synthesis tools; supports training and deployment on server, mobile, embedded and IoT devices)
notion-repackaged - notion executables with the notion-enhancer embedded & a vanilla port of the official app to linux
pdfplumber - Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
simplenote-electron - Simplenote for Web, Windows, and Linux
tesserocr - A Python wrapper for the tesseract-ocr API
xdm - Powerful download accelerator and video downloader
Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents
freac - The fre:ac audio converter project
invoice2data - Extract structured data from PDF invoices
pdfminer.six - Community maintained fork of pdfminer - we fathom PDF