gImageReader vs percollate

gImageReader

A Gtk/Qt front-end to tesseract-ocr. (by manisandro)

Topics: Qt, OCR, pdf-document, C++, tesseract-ocr, Gtk, hocr-documents, Hocr, Scanner

percollate

A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs. (by danburzo)

Puppeteer PDF Readability CLI Epub HTML Markdown

danburzo.ro

Metric          gImageReader                          percollate
Mentions        15                                    14
Stars           1,519                                 4,108
Growth          -                                     -
Activity        7.8                                   5.9
Latest commit   28 days ago                           3 months ago
Language        C++                                   JavaScript
License         GNU General Public License v3.0 only  MIT License

The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed; recent commits are weighted more heavily than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.

gImageReader

Posts with mentions or reviews of gImageReader. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-12.

Making an archive out of my grandfather's writings. What OCR scanning and doc mgt system to use?
4 projects | /r/selfhosted | 12 Jul 2023

Based on Tesseract, here is software that turns a scan into a text-searchable PDF. It takes a bit of time and can be a bit tedious, but it does the job! https://github.com/manisandro/gImageReader/releases It does not work well on cursive handwriting, of course. It's a somewhat lighter, less code-heavy solution. Good luck!
Is there free software for Windows that can read scanned handwriting and turn it into text?
1 project | /r/freesoftware | 4 May 2023
Where can I download the Sakhr program? I've looked for it a lot and can't find it. If it's no longer available, does anyone know a good alternative that does Arabic OCR?
1 project | /r/arabs | 4 Apr 2023
Writer - Tips to remove breaks and hyphenations from PDF to DOC conversion?
1 project | /r/libreoffice | 25 Mar 2023

I'm working with old newspaper PDFs to convert them into DOC format. I'm having a great time with gImageReader by highlighting columns and converting them to plain text. Then I take that plain text into LibreOffice Writer (7.0.4.2) to clean up and save. If this were a book rather than a newspaper with ads and columns, it would have been a lot easier to convert and format.
Best OCR software for extracting pdf to txt - Paid or Free version.
6 projects | /r/software | 30 Oct 2022

It would help to know a bit more about your use case. If you're looking to just extract the text (i.e., take all the textual content of your PDF and drop it into a separate text document), there are solutions like ABBYY FineReader and gImageReader. If you're looking to make PDFs searchable (keeping the scanned pages, but adding a text layer underneath so you can search and copy from them), there's NAPS2 (which has an additional command-line tool for automation) and OCRmyPDF.
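The distinction drawn in that comment, plain-text extraction versus a searchable PDF, corresponds to two output modes of the tesseract command-line tool that gImageReader wraps. Below is a minimal Python sketch, assuming tesseract and its English language data are installed and on PATH; the file names are hypothetical, and the helper only builds the argument list rather than running anything.

```python
import shlex
from pathlib import Path


def tesseract_command(image: Path, out_base: str, searchable_pdf: bool,
                      lang: str = "eng") -> list[str]:
    """Build a tesseract CLI call for one scanned page.

    By default tesseract writes the recognized text to <out_base>.txt.
    Appending the "pdf" config name makes it write <out_base>.pdf instead:
    the page image plus an invisible, searchable text layer.
    """
    cmd = ["tesseract", str(image), out_base, "-l", lang]
    if searchable_pdf:
        cmd.append("pdf")
    return cmd


if __name__ == "__main__":
    page = Path("scan_page1.png")  # hypothetical scanned page
    # Text extraction only: would produce page1.txt
    print(shlex.join(tesseract_command(page, "page1", searchable_pdf=False)))
    # Searchable PDF: would produce page1.pdf
    print(shlex.join(tesseract_command(page, "page1", searchable_pdf=True)))
```

To actually run a command, pass the list to `subprocess.run(cmd, check=True)`. For existing multi-page PDFs, OCRmyPDF handles rasterizing the pages and adding the text layer for you.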
Help plz! Tool to enhance pdf text quality?
2 projects | /r/opensource | 15 Oct 2022

Open-source OCR... for desktop users I like "gImageReader" URL: https://github.com/manisandro/gImageReader (Technically, it's a GUI for Tesseract)
Good Open Source OCR software
1 project | /r/librarians | 15 Feb 2022

gImageReader is the Linux standard that I'm aware of. It's a GUI for Tesseract, but IIRC you can use other models if you have them.
What Are The Best Linux Apps?
25 projects | /r/linuxquestions | 5 Jan 2022

gImageReader as a simple OCR application
OCR Arabic screenshot clipboard captures for Mac
2 projects | /r/learn_arabic | 27 Oct 2021

https://github.com/manisandro/gImageReader ^^ seems like it has installers for different operating systems
Is there a good/accurate OCR/Text to Image program available?
2 projects | /r/software | 21 Oct 2021

percollate

Posts with mentions or reviews of percollate. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-10-19.

The Case Against AI Everything, Everywhere, All at Once
2 projects | news.ycombinator.com | 19 Oct 2023

You can still choose automation. The easier route for me is to use wallabag to save the article. Then on my reMarkable tablet I can grab a very readable document with https://github.com/koreader/koreader.
The other option is to use https://github.com/danburzo/percollate to convert a webpage to a nice document directly. I use both tools depending on my needs.
Share my down(load) function!
1 project | /r/commandline | 22 May 2023

This function is just a simple combination of yt-dlp and percollate.
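The poster's function itself isn't reproduced here. The following is a rough Python sketch of the same idea under a hypothetical rule, "video hosts go to yt-dlp, everything else goes to percollate"; the host list, the output file name, and the helper names are all illustrative, and only the basic `yt-dlp URL` and `percollate pdf --output FILE URL` invocations are used.

```python
import subprocess
from urllib.parse import urlparse

# Hypothetical routing table: yt-dlp supports far more sites than these.
VIDEO_HOSTS = {"youtube.com", "youtu.be", "vimeo.com"}


def is_video_url(url: str) -> bool:
    """True if the URL's host is a listed video host or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in VIDEO_HOSTS)


def down_command(url: str, out_pdf: str = "article.pdf") -> list[str]:
    """Pick the tool: yt-dlp for video pages, percollate for articles."""
    if is_video_url(url):
        return ["yt-dlp", url]
    return ["percollate", "pdf", "--output", out_pdf, url]


def down(url: str) -> None:
    # Raises CalledProcessError if the chosen tool exits with an error.
    subprocess.run(down_command(url), check=True)
```

Matching on the parsed hostname rather than searching the whole URL string avoids false positives such as an article whose query string happens to contain "youtube".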
Selfhosted service to screenshot websites - but I'm not finding the options I need
5 projects | /r/selfhosted | 2 Mar 2023
Reverse Engineering or Recreating the Chrome Extension?
1 project | /r/RemarkableTablet | 21 Jan 2023

If someone hasn't already done this and I can't figure out how they are converting HTML, I have also considered using percollate to convert, then sending the result to the reMarkable via rmapi.
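The pipeline described in that post amounts to two sequential commands. Here is a minimal Python sketch, assuming both percollate and rmapi are installed and rmapi has already been paired with a reMarkable cloud account; the default output file name and the `run` switch are hypothetical conveniences.

```python
import subprocess


def web_to_remarkable(url: str, pdf_path: str = "article.pdf",
                      run: bool = False) -> list[list[str]]:
    """Convert a web page to PDF with percollate, then upload it with rmapi.

    Returns the commands; executes them in order only when run=True.
    """
    steps = [
        ["percollate", "pdf", "--output", pdf_path, url],  # web page -> readable PDF
        ["rmapi", "put", pdf_path],                        # upload to the reMarkable cloud
    ]
    if run:
        for cmd in steps:
            # check=True stops the pipeline if conversion fails,
            # so rmapi never tries to upload a missing file.
            subprocess.run(cmd, check=True)
    return steps
```

Since the reMarkable also reads EPUB, swapping `pdf` for percollate's `epub` subcommand (and the file extension accordingly) is a reasonable variation.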
ArchiveBox Alternative
3 projects | /r/selfhosted | 7 Sep 2022

The CLI tool percollate offers a different approach, but is also very good: https://github.com/danburzo/percollate
Reading web articles on the reMarkable
1 project | /r/RemarkableTablet | 30 Aug 2022
Is there a command line program to convert web pages into readable markdown/htm/pdf format? preferably markdown
3 projects | /r/commandline | 27 Aug 2022

For PDF there is the well-known wkhtmltopdf, but let me say that I love the not-so-well-known percollate.
CLI to turn web pages into beautiful, readable PDF, ePub, or HTML docs
1 project | news.ycombinator.com | 14 Mar 2022
Show HN: Lurnby, a tool for better learning, is now open source
9 projects | news.ycombinator.com | 11 Feb 2022

Since I'm working on a similar project, this is how I am planning to pull content from the web: using percollate[1] to get the HTML content. I haven't written any implementation for this in Python yet.
If you don't mind me asking, how were you going to implement spaced repetition? Since the Incremental Reading algorithm has never been published as far as I know.
[1]: https://github.com/danburzo/percollate
What Are The Best Linux Apps?
25 projects | /r/linuxquestions | 5 Jan 2022

What are some alternatives?

When comparing gImageReader and percollate you can also consider the following projects:

PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

rdrview - Firefox Reader View as a command line tool

tesseract - Tesseract Open Source OCR Engine (main repository)

koodo-reader - A modern ebook manager and reader with sync and backup capacities for Windows, macOS, Linux and Web

tesseract-ocr - Tesseract Open Source OCR Engine (main repository)

SingleFile - Web Extension for saving a faithful copy of a complete web page in a single HTML file

docker-teedy - Multi-architecture Dockerfile for Teedy (formerly Sismics Docs)

zimit - Make a ZIM file from any Web site and surf offline!

webapp-manager

monolith-of-web - A chrome extension to make a single static HTML file of the web page using a WebAssembly port of monolith CLI

warpinator - Share files across the LAN

BasicCrawler - Basic web crawler that automates website exploration and producing web resource trees.
