ftr-site-config
Site-specific article extraction rules to aid content extractors, feed readers, and 'read later' applications.
-
Wallabag
wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely.
-
einkbro
A small, fast web browser based on Android WebView. It's tailored for E-Ink devices but also works great on normal android devices.
-
readability
Readability is a library written in Go (golang) to parse, analyze and convert HTML pages into readable content. Originally an Arc90 Experiment, it is now incorporated into Safari’s Reader View. (by cixtor)
-
rssfeed
Web application written in Go to curate articles from multiple RSS feeds like HackerNews, Reddit, etc. It will significantly improve your reading experience while using your favorite RSS client.
Any developers who'd like to contribute to improving how article content is extracted from web pages should check out Mozilla's Readability repository: https://github.com/mozilla/readability
I'm currently trying to bring the PHP port up to speed here: https://github.com/fivefilters/readability.php
We currently use an older version as part of our article extraction for Push to Kindle: https://www.fivefilters.org/push-to-kindle/
How much do you care? There's an open pull request you could give a little time to that would fix it by integrating a reader mode into Tridactyl here: https://github.com/tridactyl/tridactyl/pull/3306
: )
Thanks for mentioning Instant View, I hadn't come across that. We actually maintain something similar here: https://github.com/fivefilters/ftr-site-config
We use these in our own tools and also get contributions from others, including Wallabag users: https://github.com/wallabag/wallabag
Before it was sold, Instapaper used to have something similar: a public database of its site-specific extraction templates. We used that as the starting point for our repository.
I'm a huge fan of Readability Mode and use it often. It's proof that Web design isn't the solution, Web design is the problem.
For those who are using e-ink devices, or even just standard tablets, EInkBro is another immensely useful tool. Yes, it's a standalone browser, not a mode on Firefox, Safari, Vivaldi, etc.
https://github.com/plateaukao/browser
(Available through Google Play, F-Droid and other sources. Android-only, sorry iOS fans.)
What it offers over standard browsers is that it's optimised for e-ink displays. That is, it favours pagination over scrolling, runs to full-screen, can easily adjust font size up or down (no more itsy-bitsy-teenie-weenie-yellow-polka-dot HN fonts), bold text, and has its own reader mode as well.
Even on a standard tablet, some of these features are a huge step above and beyond the mainstream browsers.
The feature-set is limited, some of the UI is a bit rough, and a few things are just plain broken (if you need to edit entries in the JS or Cookie enabled/disabled sites ... you have to delete all data and start over again).
That said, my usage is evolving from sending individual pages to EInkBro when I want to do long-form reading, to using it at least part-time as a primary browser. (Mozilla Fennec Fox is my first choice, still.) The browser is stable and very much usable despite this. The developer is responsive to requests and bug reports.
What's most refreshing is that the design principle is readability of Web content, as determined by the user, and not by the page author or publisher.
Another robust solution is Tranquility reader which exists as an extension and has better accuracy than Readability at the expense of speed.
https://github.com/ushnisha/tranquility-reader-webextensions
I ported Mozilla’s Readability library to Go a couple of years ago [1] and use it every day to power a custom RSS feed of Hacker News via Reeder [2]. This is not a novelty; many people have ported Readability to different programming languages over the years.
[1] https://github.com/cixtor/readability
[2] https://github.com/cixtor/rssfeed
https://hn.algolia.com/?dateRange=all&page=2&prefix=true&que...
Six years of nopaste exhortations
Depending on how far you want to go, there are VNC clients[1], toltec has opkg-installable stuff including at least one browser[2] known to work, and there are full OS replacements that let you run a full linux GUI[3] which can almost certainly run a normal-ish desktop browser.
So while this one won't work, there are options.
[1] https://github.com/reHackable/awesome-reMarkable
Another cool crowdsourced thing I discovered recently is SponsorBlock [1], which is an extension to automatically skip sponsored content in YouTube videos. Users contribute timings to the database that everyone else uses. It works remarkably well: any recent video with more than about 50,000 views is pretty much guaranteed to have timings submitted.
[1] https://sponsor.ajay.app/
Prune instructs the parser to remove any elements within the extracted article block that look superfluous. This can result in false positives, so we tend to disable it when we've gone to the trouble of creating site-specific extraction rules.
Tidy determines if the source HTML should be cleaned up first with HTML Tidy - https://github.com/htacg/tidy-html5. If you're parsing the source HTML with an HTML 5 parser, as we are now, it shouldn't be necessary any more (I think we actually ignore it now). We used it more before when we relied on libxml parsing, which often trips up on modern HTML.
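To make the two flags concrete, here is a rough sketch of how a site config in this style could be read. It is not our actual parser: the sample file, the `SiteConfig` class, the `parse_site_config` function, and the default values for `prune` and `tidy` are all made up for illustration. Only the `key: value` line format and the directive names come from the site config files.

```python
from dataclasses import dataclass, field

# Hypothetical site config for a made-up site, written in the
# "key: value" style used by site-specific extraction rule files.
SAMPLE_CONFIG = """\
# example.com (illustration only)
title: //h1[@class='headline']
body: //div[@id='article-body']
strip: //div[@class='related']
prune: no
tidy: no
test_url: https://example.com/news/some-article
"""


@dataclass
class SiteConfig:
    title: list[str] = field(default_factory=list)
    body: list[str] = field(default_factory=list)
    strip: list[str] = field(default_factory=list)
    test_url: list[str] = field(default_factory=list)
    prune: bool = True  # assumed default for this sketch
    tidy: bool = True   # assumed default for this sketch


def parse_site_config(text: str) -> SiteConfig:
    """Parse 'key: value' lines, skipping blanks and '#' comments."""
    config = SiteConfig()
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so XPath axes ("::") and
        # URLs ("https://") in the value are left untouched.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key in ("prune", "tidy"):
            setattr(config, key, value.lower() == "yes")
        elif key in ("title", "body", "strip", "test_url"):
            getattr(config, key).append(value)
    return config


if __name__ == "__main__":
    cfg = parse_site_config(SAMPLE_CONFIG)
    print("prune:", cfg.prune, "tidy:", cfg.tidy)
    print("body rules:", cfg.body)
```

With hand-written `body` and `strip` rules like these, setting `prune: no` tells the extractor to trust the rules rather than guess at what looks superfluous.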
> (The end user should apply their own CSS if wanted;
I still remember the time when user styles were a first-class feature built into all browsers. Hell, that 'C' in CSS, "Cascading", was always there to allow styles to enhance/override prior styles, including - at the top level - to allow the user to override a website's styling. Back before web designers ruined everything by making every web page into its own special snowflake, people thought users would have one or two default CSS sheets to choose from and apply to any webpage, the way we today think about "dark mode".
These days, we have to resort to using browser extensions. A well-known one is Stylus [0][1]. In a way, it's much better than old built-in user styles. But then, it's not built in.
--
[0] - https://github.com/openstyles/stylus
[1] - Mind the name, it's "Stylus", not "Stylish" - the latter used to be popular, but then it sold out and became another piece of surveillance capitalism detritus. Stylus is a GPLv3 fork of Stylish with data collection removed.
It uses a pretty simple text selection algorithm I've developed through trial and error: https://github.com/ZachSaucier/Just-Read/blob/6dcb4f05b93287...
I don't know how it compares to Readability.js.