Top 6 Python news-aggregator Projects
-
newspaper
newspaper3k is a news, full-text, and article metadata extraction library for Python 3.
-
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Project mention: Trafilatura: Python tool to gather text on the Web | news.ycombinator.com | 2023-08-14
The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features
Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.
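To give a sense of the extra code the comment above refers to, here is a minimal standard-library sketch of just one item on that list, URL de-duplication. It is not how trafilatura implements it; the names `normalize_url`, `deduplicate`, and `TRACKING_PARAMS`, and the choice of which query parameters to ignore, are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of tracking parameters to ignore when comparing URLs.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def normalize_url(url: str) -> str:
    """Return a canonical form of `url` suitable for equality checks."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # user info, if any, is dropped

    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    is_default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not is_default:
        host = f"{host}:{port}"

    # Treat "/news" and "/news/" as the same page.
    path = parts.path.rstrip("/") or "/"

    # Drop tracking parameters and sort the rest so their order does not matter.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(pairs))

    # The fragment is discarded: it never changes the document that is fetched.
    return urlunsplit((scheme, host, path, query, ""))


def deduplicate(urls):
    """Keep the first occurrence of each URL, compared by normalized form."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Even this small piece has to decide on trailing slashes, default ports, parameter order, and fragments, and it still ignores edge cases such as IPv6 hosts or sites where a trailing slash is significant. That is the kind of detail a dedicated tool handles for you.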
Project mention: What's the fun in writing on the internet anymore? | news.ycombinator.com | 2024-02-17
At https://hackernews.betacat.io/ they use ChatGPT to summarize HN front-page stories, and it says "Article discusses automated plagiarism and the diminishing value of authorship online. It compares today's internet to ancient texts, where authorship was less defined."
Project mention: Release 11.0 of Newspipe with a new dark theme – a news reader | news.ycombinator.com | 2024-02-27
Index
What are some of the best open-source news-aggregator projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | newspaper | 13,720 |
2 | trafilatura | 2,778 |
3 | hacker-news-digest | 645 |
4 | newspipe | 406 |
5 | JARR | 117 |
6 | cabbage_news | 2 |