dom-distiller vs soup-strainer
| | dom-distiller | soup-strainer |
|---|---|---|
| Mentions | 3 | 1 |
| Stars | 594 | 33 |
| Growth | - | - |
| Activity | 0.0 | 10.0 |
| Latest commit | over 2 years ago | about 10 years ago |
| Language | Java | Python |
| License | GNU General Public License v3.0 or later | MIT License |
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
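The exact activity formula is not given here. As a rough illustration of how a recency-weighted, relative score could work, the sketch below makes two loudly labeled assumptions: each commit's weight halves every 90 days (a hypothetical half-life), and the 0 to 10 score is the project's percentile among all tracked projects, multiplied by 10. The function names are hypothetical; this is not the site's actual method.

```python
from datetime import date

# ASSUMPTION: hypothetical half-life; a commit's weight halves every 90 days.
HALF_LIFE_DAYS = 90.0


def weighted_activity(commit_dates, today):
    """Sum of commit weights; each weight decays exponentially with commit age."""
    total = 0.0
    for d in commit_dates:
        age_days = (today - d).days
        total += 0.5 ** (age_days / HALF_LIFE_DAYS)
    return total


def activity_score(raw, all_raw):
    """ASSUMPTION: map a raw value to 0.0-10.0 by its percentile among all projects."""
    if not all_raw:
        return 0.0
    below = sum(1 for r in all_raw if r < raw)
    return round(10.0 * below / len(all_raw), 1)


# A raw value higher than 9 of 10 tracked projects maps to 9.0 (top 10%).
print(activity_score(9, list(range(10))))
```

Under this mapping, a score of 9.0 means the project's recency-weighted commit total exceeds that of 90% of tracked projects, which matches the "top 10%" reading in the legend.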
dom-distiller
- How does Firefox's Reader View work?
- The most underused browser feature
- An app like Pocket to read articles and highlight?
The one ask you have that Literal doesn't yet support is read mode for sources (though it will automatically archive and back up sources). It looks like Chrome's read mode (i.e. the "Show simplified view" toolbar) is open source, so I think I could add support relatively quickly if you're interested.
soup-strainer
- How does Firefox's Reader View work?
I implemented a variation of the Readability algorithm some 9 years ago, in case anyone needs a server-side Python version and is interested in dragging it (kicking and screaming) into the 2020s:
https://github.com/rcarmo/soup-strainer
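For readers wondering what a "variation of the Readability algorithm" involves, here is a simplified sketch of Readability-style content scoring using only Python's standard `html.parser`. This is not soup-strainer's code. It is loosely modeled on the heuristics commonly attributed to the original Arc90 Readability script: paragraphs of at least 25 characters earn 1 point, plus 1 per comma, plus 1 per 100 characters (capped at 3); the score goes to the parent and half of it to the grandparent; and each candidate's total is discounted by its link density. Treat those constants as assumptions. `Node`, `TreeBuilder`, and `extract_main_text` are hypothetical names, and refinements such as class/id weighting and implied end tags are omitted.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed onto the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Text inside these elements is ignored entirely.
SKIP = {"script", "style", "noscript"}
# Elements treated as "paragraphs" whose text gets scored.
SCORED = {"p", "pre", "td"}


class Node:
    """Minimal DOM node: text and child nodes kept in document order."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []  # each item is a str or a Node
        self.score = 0.0

    def text(self):
        return "".join(i if isinstance(i, str) else i.text() for i in self.items)

    def link_text(self):
        if self.tag == "a":
            return self.text()
        return "".join(i.link_text() for i in self.items if isinstance(i, Node))

    def walk(self):
        yield self
        for i in self.items:
            if isinstance(i, Node):
                yield from i.walk()


class TreeBuilder(HTMLParser):
    """Builds a Node tree; ignores stray end tags but not implied ones (e.g. unclosed <p>)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        node = Node(tag, self.stack[-1])
        self.stack[-1].items.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag; ignore unmatched ones.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        if not any(n.tag in SKIP for n in self.stack):
            self.stack[-1].items.append(data)


def extract_main_text(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()

    candidates = {}  # dict used as an insertion-ordered set
    for node in builder.root.walk():
        if node.tag not in SCORED:
            continue
        text = node.text().strip()
        if len(text) < 25:
            continue  # too short to count as body text
        # 1 base point, +1 per comma, +1 per 100 characters (capped at 3).
        score = 1 + text.count(",") + min(len(text) // 100, 3)
        parent = node.parent
        parent.score += score
        candidates[parent] = True
        if parent.parent is not None:
            parent.parent.score += score / 2  # grandparent gets half
            candidates[parent.parent] = True

    if not candidates:
        return ""

    def final_score(n):
        # Discount containers whose text is mostly link text (menus, footers).
        total = len(n.text()) or 1
        return n.score * (1 - len(n.link_text()) / total)

    best = max(candidates, key=final_score)
    return " ".join(best.text().split())
```

Scoring parents and grandparents rather than individual paragraphs lets the algorithm pick the container that holds most of the article, while the link-density discount keeps navigation blocks from winning.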
What are some alternatives?
readability - Readability is a library written in Go (golang) to parse, analyze and convert HTML pages into readable content. Originally an Arc90 experiment, it is now incorporated into Safari’s Reader View.
unclutter - A modern reader mode and article library for your browser.
ftr-site-config - Site-specific article extraction rules to aid content extractors, feed readers, and 'read later' applications.
trafilatura - Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
parser - 📜 Extract meaningful content from the chaos of a web page
Readability4J - A Kotlin port of Mozilla’s Readability. It extracts a website’s relevant content and removes all clutter from it.
go-trafilatura - go-trafilatura is a Go port of the trafilatura Python library.
einkbro - A small, fast web browser based on Android WebView. It's tailored for E-Ink devices but also works great on normal Android devices.
go-domdistiller - Go-DomDistiller is a Go port of the DOM Distiller library which implements Reader mode in Chrome for Android and Desktop. It has no dependencies on Chromium and is meant to run as a command line program or on a server.
readability - A standalone version of the readability lib
go-htmldate - CLI and Go package for extracting the publication date of web pages.