Readability

Top 23 Readability Open-Source Projects

  • web-clipper

    For Notion, OneNote, Bear, Yuque, and Joplin. Clip anything to anywhere.

  • percollate

    A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.

  • Project mention: The Case Against AI Everything, Everywhere, All at Once | news.ycombinator.com | 2023-10-19

    You can still choose automation. The easier route for me is to use wallabag to save the article. Then on my reMarkable tablet I can grab a very readable document with https://github.com/koreader/koreader.

    The other option is to use https://github.com/danburzo/percollate to convert a webpage to a nice document directly. I use both tools depending on my needs.

  • trafilatura

    Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

  • Project mention: Trafilatura: Python tool to gather text on the Web | news.ycombinator.com | 2023-08-14

    The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features

    Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.

  • article-extractor

    Extract the main article from a given URL with Node.js.

  • Project mention: How do Instapaper and Pocket apps extract the content of the articles? | /r/opensource | 2023-12-04

    Edit: I found this Node.js library useful for article extraction. Anyone looking for something like this can take a look: https://github.com/extractus/article-extractor

  • stylebot

    Change the appearance of the web instantly

  • Project mention: Dracula Theme for Hacker News | news.ycombinator.com | 2024-01-29

    You can also do this with the Stylebot extension - https://github.com/ankit/stylebot .. I was already doing this :) (just copy paste the CSS)

  • unclutter

    A modern reader mode and article library for your browser.

  • Project mention: Reader View / Links2 like web view filter | /r/uBlockOrigin | 2023-06-14

    Not a filter for uBO (I do not think that is possible), but I really like this extension: https://github.com/lindylearn/unclutter

  • Just-Read

    A customizable read mode web extension.

  • textstat

    Python package to calculate readability statistics of a text object: paragraphs, sentences, articles.

  • code-review-checklist

    This code review checklist helps you be a more effective and efficient code reviewer.

  • CSharpForMarkup

    Concise, declarative C# UI markup for .NET browser / native UI frameworks

  • Project mention: .NET 8 – .NET Blog | news.ycombinator.com | 2023-11-14

    It's a bit of a hit and miss as of today. CLI, back-end and natively compiled libraries (think dll/so/dylib or even .lib/.a - you can statically link NAOT binaries into other "unmanaged" code) work best, GUI - requires more work.

    Avalonia[0] and MAUI[1] have known working templates with it, but YMMV.

    [0] https://github.com/lixinyang123/AvaloniaAOT / https://github.com/AvaloniaUI/Avalonia/ / honorable mention https://github.com/VincentH-Net/CSharpForMarkup

    [1] https://github.com/dotnet/maui (try it out with just <PublishAot>true</PublishAot> in the csproj - it is known to work, e.g. on iOS)

  • go-readability

    Go package that cleans an HTML page for better readability.

  • Midnight-Lizard

    Custom color schemes for all websites

  • SAPC-APCA

    APCA (Accessible Perceptual Contrast Algorithm) is a new method for predicting contrast for use in emerging web standards (WCAG 3) for determining readability contrast. APCA is derived from SAPC (S-LUV Advanced Predictive Color), an accessibility-oriented color appearance model designed for self-illuminated displays.

  • scrape

    Scrape any website, article or RSS/Atom Feed with ease!

  • Vyxal

    A code-golfing language experience that has aspects of traditional programming languages - terse, elegant, readable.

  • Project mention: Vyxal: A code-golfing language experience | news.ycombinator.com | 2024-02-28

  • crux

    Crux offers a flexible plugin-based API & implementation to extract interesting information from Web pages. (by chimbori)

  • readability

    Readability is an Elixir library for extracting and curating articles. (by keepcosmos)

  • reader

    reader is for your command line what the “readability” view is for modern browsers: A lightweight tool offering better readability of web pages on the CLI. (by mrusme)

  • Project mention: Are there any CLIs or ways to save articles as markdown documents and search them that you would recommend? | /r/commandline | 2023-05-18

    this tool just reads a URL, trims all the fat (reader view like in Firefox) and then can output either to your screen or to MD using the -o switch

  • Cadmium

    Natural Language Processing (NLP) library for Crystal

  • DistiLlama

    Chrome Extension to Summarize or Chat with Web Pages/Local Documents Using locally running LLMs. Keep all of your data and conversations private. 🔐

  • Project mention: DistiLlama: Chrome Extension to Summarize Web Pages Using locally running LLMs | /r/LocalLLaMA | 2023-10-28

    https://github.com/shreyaskarnik/DistiLlama feedback/suggestions and PRs are welcome.

  • ReadabiliPy

    A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.

  • Project mention: Mozilla: Readability.js | news.ycombinator.com | 2024-02-25

    I have used and love readability.js. I used it in an application that lets you run various NLP analyses over a web page (surprisals, reading time, word counts, etc.). For that, I needed only the main page content. readability.js retrieves main page content well, consistently.

    The Alan Turing Institute maintains a Python wrapper around readability.js, too: https://github.com/alan-turing-institute/ReadabiliPy.

  • mercury_fulltext

    📖 Enjoy full text for tt-rss.

  • kindleServer

    This project serves HTML files (and a few other formats) saved on your computer with a UI suited to the Kindle web browser. On top of that, it includes a Read Mode (thanks to ReadabiliPy) that displays the text at a comfortable size without having to use the 'Article Mode' in the Kindle web browser.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
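
Several projects above (Readability.js via ReadabiliPy, go-readability, article-extractor) answer the question from the /r/opensource thread: how do reader apps pull the main article out of a page? The sketch below illustrates one simple heuristic of that general kind, not the algorithm of any listed project: it groups <p> text by its nearest enclosing container, ignores navigation-like elements, and keeps the container with the most paragraph text. It assumes reasonably well-formed HTML with explicit </p> tags; the names MainTextExtractor, CONTAINERS, SKIPPED and extract_main_text are hypothetical.

```python
from html.parser import HTMLParser

# Assumed tag sets for this sketch; real extractors use many more signals.
CONTAINERS = {"article", "main", "section", "div", "body"}
SKIPPED = {"script", "style", "nav", "header", "footer", "aside"}


class MainTextExtractor(HTMLParser):
    """Collects <p> text per nearest enclosing container (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.containers = []   # stack of indices into self.blocks
        self.blocks = []       # one list of paragraph strings per container
        self.skip_depth = 0    # > 0 while inside a skipped element
        self.paragraph = None  # text fragments of the <p> currently open

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif tag in CONTAINERS:
            self.blocks.append([])
            self.containers.append(len(self.blocks) - 1)
        elif tag == "p" and self.skip_depth == 0:
            self.paragraph = []

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(self.skip_depth - 1, 0)
        elif tag in CONTAINERS and self.containers:
            self.containers.pop()
        elif tag == "p" and self.paragraph is not None:
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self.paragraph).split())
            if text and self.containers:
                self.blocks[self.containers[-1]].append(text)
            self.paragraph = None

    def handle_data(self, data):
        if self.paragraph is not None and self.skip_depth == 0:
            self.paragraph.append(data)


def extract_main_text(html: str) -> str:
    """Return the paragraphs of the container holding the most <p> text."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return ""
    best = max(parser.blocks, key=lambda paras: sum(len(p) for p in paras))
    return "\n\n".join(best)
```

Crediting text only to the nearest container is the main simplification: an article split across several nested <div>s would be fragmented, which is why fuller extractors also score ancestor elements and weigh link density.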

What are some of the best open-source Readability projects? This list will help you:

#   Project                 Stars
1   web-clipper             5,753
2   percollate              4,108
3   trafilatura             2,740
4   article-extractor       1,375
5   stylebot                1,344
6   unclutter               1,199
7   Just-Read               1,170
8   textstat                1,074
9   code-review-checklist   831
10  CSharpForMarkup         707
11  go-readability          647
12  Midnight-Lizard         615
13  SAPC-APCA               398
14  scrape                  326
15  Vyxal                   261
16  crux                    236
17  readability             228
18  reader                  217
19  Cadmium                 201
20  DistiLlama              201
21  ReadabiliPy             180
22  mercury_fulltext        155
23  kindleServer            154
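
The textstat entry computes readability statistics for text. One classic statistic is the Flesch Reading Ease score, 206.835 - 1.015 x (words / sentences) - 84.6 x (syllables / words), where higher scores mean easier text. The sketch below is a minimal illustration under stated assumptions: the sentence splitter and the vowel-group syllable counter are crude heuristics chosen for this example, not textstat's implementation, so its scores will differ from the library's.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" -> 1, but keep the syllable in endings like "-ble".
    if word.endswith("e") and count > 1 and not word.endswith("le"):
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease with naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

For example, "The cat sat on the mat." has 6 words, 1 sentence and 6 syllables, giving 206.835 - 6.09 - 84.6 = 116.145.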

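
The trafilatura discussion lists URL de-duplication among the features you would otherwise have to write yourself on top of BeautifulSoup. Below is a minimal standard-library sketch of the idea, not trafilatura's implementation: it assumes a hypothetical set of tracking parameters and a few normalization rules (lowercase scheme and host, drop default ports, fragments and trailing slashes, sort query parameters), then keeps the first URL seen for each normalized form. normalize_url, deduplicate and TRACKING_PARAMS are illustrative names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to ignore (an assumption for this sketch).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "fbclid", "gclid"}


def normalize_url(url: str) -> str:
    """Map equivalent spellings of a URL onto one canonical string."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Drop default ports, which do not change the resource.
    if (scheme == "http" and netloc.endswith(":80")) or (
            scheme == "https" and netloc.endswith(":443")):
        netloc = netloc.rsplit(":", 1)[0]
    # Treat "/post/" and "/post" as the same page; keep "/" for the root.
    path = parts.path.rstrip("/") or "/"
    # Remove tracking parameters and sort the rest so their order does not matter.
    query = urlencode(sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS))
    # The fragment (#...) is dropped: it is never sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))


def deduplicate(urls):
    """Keep the first URL seen for each normalized form, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Rules like dropping fragments or trailing slashes can merge URLs that a particular site treats as distinct, so a real crawler would likely make them configurable rather than hard-coded.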