Top 9 JavaScript Readability Projects
percollate
Project mention: Show HN: Epublifier – scrape pages (books, manuals) for offline reading | news.ycombinator.com | 2024-10-21
For those interested in a simple-to-use command-line tool that accomplishes the same, I've had success with percollate: https://github.com/danburzo/percollate
article-extractor
Project mention: ScrapeGraphAI: Web scraping using LLM and direct graph logic | news.ycombinator.com | 2024-05-07
Agreed!
Apify's Website Content Crawler[0] does a decent job of this for most websites in my experience. It allows you to "extract" content via different built-in methods (e.g. Extractus [1]).
We currently use this at Magic Loops[2] and it works _most_ of the time.
The long tail is difficult, though, and it's not uncommon for users to back out to raw HTML and then have our tool write some custom logic to parse the content they want from the scraped results (fun fact: before GPT-4 Turbo, the HTML page was often too large for the context window... and sometimes it still is!).
Would love a dedicated tool for this. I know the folks at Reworkd[3] are working on something similar, but not sure how much is public yet.
[0] https://apify.com/apify/website-content-crawler
[1] https://github.com/extractus/article-extractor
[2] https://magicloops.dev/
[3] https://reworkd.ai/
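The comment above mentions falling back to raw HTML and pages too large for a model's context window. Below is a minimal, dependency-free sketch of one naive mitigation, assuming a character budget is an acceptable stand-in for a token limit: drop script and style blocks, strip tags, decode a few common entities, collapse whitespace, then truncate. The helper name `htmlToBudgetedText` and its `maxChars` parameter are hypothetical. Regex stripping is a rough heuristic and not how Extractus or Apify's crawler work; real extractors parse the DOM.

```javascript
// Hypothetical helper: shrink raw HTML to plain text that fits a character budget.
// Regex-based stripping is a rough heuristic; a real extractor should parse the DOM.
function htmlToBudgetedText(html, maxChars = 20000) {
  const text = html
    // Drop blocks whose contents are never readable article text.
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Replace every remaining tag with a space so adjacent words don't merge.
    .replace(/<[^>]+>/g, " ")
    // Decode a handful of common entities (&amp; last to avoid double-decoding).
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&")
    // Collapse runs of whitespace into single spaces.
    .replace(/\s+/g, " ")
    .trim();

  // Truncate to the budget and mark that content was cut.
  return text.length <= maxChars ? text : text.slice(0, maxChars) + " [truncated]";
}

// Usage: a small page with a style block and a tracking script.
const page =
  "<html><head><style>p{color:red}</style></head>" +
  "<body><h1>Title</h1><p>Fish &amp; chips</p><script>track()</script></body></html>";
console.log(htmlToBudgetedText(page)); // prints "Title Fish & chips"
```

Counting characters instead of tokens keeps the sketch self-contained; a production version would measure the budget with the target model's tokenizer.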
readability-extractor
JavaScript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a one-shot CLI command to extract each page's article text.
JavaScript Readability related posts
- How do Instapaper and Pocket apps extract the content of the articles?
- Share my down(load) function!
- Reverse Engineering or Recreating the Chrome Extension?
- How do I enable right-click menu and developer console on a site that disabled it?
- Software or browser extension to reformat text?
- Reading web articles on the reMarkable
- Pa. commission proposes adding and increasing fees, axing gas tax to fund transportation needs.
A note from our sponsor - SaaSHub
www.saashub.com | 18 Jan 2025
Index
What are some of the best open-source Readability projects in JavaScript? This list will help you:
# | Project | Stars |
---|---|---|
1 | percollate | 4,350 |
2 | article-extractor | 1,632 |
3 | Just-Read | 1,216 |
4 | apca-w3 | 159 |
5 | stutter | 140 |
6 | retext-readability | 94 |
7 | readability-extractor | 38 |
8 | line-length | 5 |
9 | validate-access | 3 |