Top 4 HTML Readability Projects
-
ReadabiliPy
A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.
I have used and love readability.js. I used it in an application that lets you run various NLP analyses over a web page (surprisals, reading time, word counts, etc.). For that, I needed only the main page content. readability.js retrieves main page content well, consistently.
The Alan Turing Institute maintains a Python wrapper around readability.js, too: https://github.com/alan-turing-institute/ReadabiliPy.
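The kind of analysis mentioned above (word counts, reading time) can be sketched once the main content has been extracted. This is a minimal illustration, not the commenter's application: it assumes the extractor has already produced plain text, the function name `text_stats` is hypothetical, and the 200 words-per-minute reading speed is an assumed default rather than a figure from the source.

```python
import math
import re

# Assumed average reading speed; adjust for your audience.
WORDS_PER_MINUTE = 200


def text_stats(plain_text: str, wpm: int = WORDS_PER_MINUTE) -> dict:
    """Return word count and estimated reading time for extracted article text."""
    # A word is a run of word characters, optionally joined by
    # apostrophes or hyphens (so "don't" and "real-time" count once).
    words = re.findall(r"\w+(?:['’-]\w+)*", plain_text)
    word_count = len(words)
    # Round up so any non-empty text takes at least one minute.
    minutes = math.ceil(word_count / wpm) if word_count else 0
    return {"word_count": word_count, "reading_minutes": minutes}


# Stand-in for text returned by a readability-style extractor.
sample = "Readability extracts the main content of a page. " * 50
stats = text_stats(sample)
print(stats)
```

Running the statistics on extracted text rather than raw HTML keeps navigation, ads, and footers out of the counts, which is the reason the comment gives for needing only the main page content.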
-
Readability4J
A Kotlin port of Mozilla's Readability. It extracts a website's relevant content and removes all clutter from it.
Project mention: Creating an advanced search engine with PostgreSQL | news.ycombinator.com | 2023-07-12
Depending on the type of content, one might want to look into using Readability (the browser's reader view) to parse the webpage. It will give you all the useful info without the junk. Then you can put it in the DB as needed.
https://github.com/mozilla/readability
Btw, Readability is also available in a few other languages, like Kotlin:
-
HTML Readability related posts
- Readable: A service for reading long-form content on any device
- A Reader Mode Proxy for the Slow Web, Deployed on shuttle.rs
- Adding tags to torrents
- Show HN: Forlater.email – an email-based bookmarking service
- Show HN: Instantly Listen to Any URL
- How to get the main topic of a Web article?
Index
What are some of the best open-source Readability projects in HTML? This list will help you:
# | Project | Stars |
---|---|---|
1 | go-readability | 643 |
2 | ReadabiliPy | 179 |
3 | Readability4J | 128 |
4 | readable | 78 |