Top 23 Readability Open-Source Projects
-
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
-
textstat
Python package to calculate readability statistics of a text object - paragraphs, sentences, articles.
-
code-review-checklist
This code review checklist helps you be a more effective and efficient code reviewer.
-
SAPC-APCA
APCA (Accessible Perceptual Contrast Algorithm) is a new method for predicting contrast for use in emerging web standards (WCAG 3) for determining readability contrast. APCA is derived from SAPC (S-LUV Advanced Predictive Color), which is an accessibility-oriented color appearance model designed for self-illuminated displays.
-
Vyxal
A code-golfing language experience that has aspects of traditional programming languages - terse, elegant, readable.
-
crux
Crux offers a flexible plugin-based API & implementation to extract interesting information from Web pages. (by chimbori)
-
reader
reader is for your command line what the “readability” view is for modern browsers: A lightweight tool offering better readability of web pages on the CLI. (by mrusme)
-
DistiLlama
Chrome Extension to Summarize or Chat with Web Pages/Local Documents Using locally running LLMs. Keep all of your data and conversations private. 🔐
-
ReadabiliPy
A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.
-
kindleServer
This project serves HTML files (and a few more) saved on your computer with a UI suitable for the Kindle web browser. On top of that, it includes a Read Mode (thanks to ReadabiliPy) to display the text at a comfortable size without having to use the 'Article Mode' in the Kindle web browser.
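Several of the projects above, textstat among them, report readability statistics. As a rough illustration of what such a statistic looks like, here is a minimal sketch of the Flesch Reading Ease score in plain Python. The formula is the standard one; the vowel-run syllable counter is a crude assumption made for this sketch and is not how textstat counts syllables, and the function names are hypothetical.

```python
import re


def count_syllables(word):
    """Crude estimate (assumption): one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Short sentences made of short words score high; long sentences full of polysyllabic words score low. For real use, a maintained package is likely the better choice, since syllable counting is the hard part.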
Project mention: The Case Against AI Everything, Everywhere, All at Once | news.ycombinator.com | 2023-10-19
You can still choose automation. The easier route for me is to use wallabag to save the article. Then on my remarkable tablet I can grab a very readable document with https://github.com/koreader/koreader.
The other option is to use https://github.com/danburzo/percollate to convert a webpage to a nice document directly. I use both tools depending on my needs.
Project mention: Trafilatura: Python tool to gather text on the Web | news.ycombinator.com | 2023-08-14
The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features
Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.
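To give a feel for just one item on that list, here is a minimal sketch of a main-content heuristic using only Python's standard library. It is not trafilatura's algorithm: the skip list of tags, the `min_words` threshold, and all class and function names are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Tags whose text is rarely article content (an assumption of this sketch).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class ParagraphCollector(HTMLParser):
    """Collect the text of <p> elements that sit outside boilerplate tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0        # > 0 while inside a SKIP_TAGS element
        self.in_paragraph = False
        self.buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_paragraph = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.in_paragraph:
            # Collapse runs of whitespace before storing the paragraph.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.buffer.append(data)


def extract_main_text(html, min_words=5):
    """Return paragraphs long enough to look like article prose."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if len(p.split()) >= min_words]
```

Even this toy version needs state tracking for nested tags and a tunable threshold, and it does nothing about crawling, feeds, de-duplication, or metadata, which is the commenter's point about the amount of extra code involved.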
Project mention: How do Instapaper and Pocket apps extract the content of the articles? | /r/opensource | 2023-12-04
Edit: I found this Node.js library useful for article extraction. Anyone looking for something like this can take a look: https://github.com/extractus/article-extractor
You can also do this with the Stylebot extension - https://github.com/ankit/stylebot .. I was already doing this :) (just copy paste the CSS)
Not a filter for uBO (I do not think it is possible), but I really like this extension: https://github.com/lindylearn/unclutter
It's a bit of a hit and miss as of today. CLI, back-end and natively compiled libraries (think dll/so/dylib or even .lib/.a - you can statically link NAOT binaries into other "unmanaged" code) work best, GUI - requires more work.
Avalonia[0] and MAUI[1] have known working templates with it, but YMMV.
[0] https://github.com/lixinyang123/AvaloniaAOT / https://github.com/AvaloniaUI/Avalonia/ / honorable mention https://github.com/VincentH-Net/CSharpForMarkup
[1] https://github.com/dotnet/maui (try out with just true in csproj - it is known to work e.g. on iOS)
Project mention: Are there any CLIs or ways to save articles as markdown documents and search them that you would recommend? | /r/commandline | 2023-05-18
This tool just reads a URL, trims all the fat (reader view like in Firefox) and then can output either to your screen or to MD using the -o switch
Project mention: DistiLlama: Chrome Extension to Summarize Web Pages Using locally running LLMs | /r/LocalLLaMA | 2023-10-28
https://github.com/shreyaskarnik/DistiLlama feedback/suggestions and PRs are welcome.
I have used and love readability.js. I used it in an application that lets you run various NLP analyses over a web page (surprisals, reading time, word counts, etc.). For that, I needed only the main page content. readability.js retrieves main page content well, consistently.
The Alan Turing Institute maintains a Python wrapper around readability.js, too: https://github.com/alan-turing-institute/ReadabiliPy.
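Once the main content has been extracted, the simpler analyses mentioned in that comment (word counts, reading time) take only a few lines. The sketch below is not the commenter's application; the 200 words-per-minute reading speed and the function names are assumptions for illustration.

```python
import math
import re

WORDS_PER_MINUTE = 200  # assumed average reading speed; tune for your audience


def word_count(text):
    """Count word-like tokens (letters, digits, apostrophes)."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


def reading_time_minutes(text, wpm=WORDS_PER_MINUTE):
    """Estimate reading time in whole minutes, rounding up."""
    words = word_count(text)
    return math.ceil(words / wpm) if words else 0
```

Running these over a whole HTML page would count navigation and footer text too, which is why the extraction step comes first.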
Readability related posts
- Txtdot: HTTP proxy that only parses text, links and pictures from pages
- Vyxal: A code-golfing language experience
- I know nothing, but I gotta learn
- How do Instapaper and Pocket apps extract the content of the articles?
- Share my down(load) function!
- Powerful and free scraper with a headless browser under the hood and Readability for parsing
- Code Reviews ebook
Index
What are some of the best open-source Readability projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | web-clipper | 5,753 |
2 | percollate | 4,108 |
3 | trafilatura | 2,740 |
4 | article-extractor | 1,375 |
5 | stylebot | 1,344 |
6 | unclutter | 1,199 |
7 | Just-Read | 1,170 |
8 | textstat | 1,074 |
9 | code-review-checklist | 831 |
10 | CSharpForMarkup | 707 |
11 | go-readability | 647 |
12 | Midnight-Lizard | 615 |
13 | SAPC-APCA | 398 |
14 | scrape | 326 |
15 | Vyxal | 261 |
16 | crux | 236 |
17 | readability | 228 |
18 | reader | 217 |
19 | Cadmium | 201 |
20 | DistiLlama | 201 |
21 | ReadabiliPy | 180 |
22 | mercury_fulltext | 155 |
23 | kindleServer | 154 |