Top 14 Python rss-feed Projects
-
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Project mention: Trafilatura: Python tool to gather text on the Web | news.ycombinator.com | 2023-08-14
The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features
Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.
-
GNews
A Happy and lightweight Python Package that Provides an API to search for articles on Google News and returns a JSON response.
Project mention: Need some help with my personal project (interactive world map with real-time data) | /r/datascience | 2023-05-15
The web crawling part wasn't much of an issue - I am using an existing API (https://pypi.org/project/gnews/) which does what I needed. The issue lies in, well, pretty much the rest of the task described above. I need to create an interactive world map with real-time data (news articles) - more specifically, maintaining the data server, figuring out the data mapping part, etc. Since I pretty much have no experience in this, I would like to ask you guys for some directions. What tool would I need to use and how would I store/load the data? Is it possible to do so without writing some JavaScript code myself?
-
-
Project mention: A set of crappy RSS scripts to handle RSS in an Unix way | news.ycombinator.com | 2024-02-11
-
Project mention: Ask HN: What's the best resource to keep up-to-date on AI developments? | news.ycombinator.com | 2023-04-07
-
Not at the moment, but I'm hesitating. I've already started to open-source the suggested sources in the app: https://github.com/Martinviv/rss-sources . You can also ask me for design details if you want.
-
usocial
Read. Listen. Pay back. The podcast client and feed reader for your personal server. With Lightning Network support.
-
I have dabbled a little in this subject myself. Some of my notes:
- Some RSS feeds are protected by Cloudflare, though dealing with that is not necessary for the majority of blogs. If you would like to do more, Selenium would be a way to handle Cloudflare-protected links.
- Sometimes even headless Selenium is not enough, and a full-blown browser driven by Selenium is necessary to fool its protection.
- Sometimes even that is not enough.
- Then I started to wonder why some RSS feeds are so well protected by Cloudflare, but who am I to judge?
- Sometimes it is beneficial to disguise the user agent. I feel bad about setting my user agent to Chrome, but again, why are RSS feeds so well protected?
- You cannot parse and read the entire Internet, so you always need to think about compromises. For example, in one of my projects I have narrowed my searches to domains only. Now I can find most of the common domains, and I sort them by their "importance".
- RSS links do change. There need to be automated means of disabling some feeds, so that inactive domains are not checked over and over.
- I do not see any configurable timeout for reading a page, but I am not familiar with aiohttp. Some pages might waste your time.
- I hate that some RSS feeds are not configured properly. Some sites do not provide a valid "link" tag with "application/rss+xml". Some RSS feeds have naive titles like "Home", or no title at all. Such a wasted opportunity.
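The last note refers to feed autodiscovery, where a page advertises its feed through a "link" tag in its HTML. A minimal sketch of how such tags could be detected, using only the Python standard library; the names `FeedLinkFinder` and `find_feeds` are hypothetical and not taken from any project listed here:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds in <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects (absolute_url, title) pairs for advertised feeds."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower().strip()
        href = attributes.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            url = urljoin(self.base_url, href)
            self.feeds.append((url, attributes.get("title") or ""))


def find_feeds(html, base_url):
    """Return the feeds advertised in an HTML document."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds
```

An empty result means the site does not advertise a feed this way, and an empty or generic title (such as "Home") can be flagged by the caller.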
My RSS feed parser, link archiver, and web crawler: https://github.com/rumca-js/Django-link-archive. The file rsshistory/webtools.py could be especially interesting. It is not advanced programming craft, but it gets the job done.
Additionally, in another project I have collected around 2378 personal sites. I collect domains in https://github.com/rumca-js/Internet-Places-Database/tree/ma... . These files are JSON. All personal sites have the tag "personal".
Most of the things are collected from:
I also wanted to process domains from https://downloads.marginalia.nu/, but I haven't had time to read the structure of the files.
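The notes above about user agents, read timeouts, and automatically disabling dead feeds can be sketched together. This is only an illustration under stated assumptions: it uses the standard library's `urllib` rather than aiohttp, and the names `Feed`, `fetch`, `check_feed`, the threshold, the timeout, and the user agent string are all hypothetical.

```python
import urllib.request
from dataclasses import dataclass

MAX_FAILURES = 3                         # assumed threshold before disabling
TIMEOUT_SECONDS = 10                     # assumed per-request timeout
USER_AGENT = "example-feed-checker/0.1"  # assumed identifying user agent


@dataclass
class Feed:
    url: str
    failures: int = 0
    enabled: bool = True


def fetch(url):
    """Download a URL with an explicit user agent and a timeout."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        return response.read()


def check_feed(feed, fetcher=fetch):
    """Fetch one feed; disable it after MAX_FAILURES consecutive errors."""
    if not feed.enabled:
        return None
    try:
        body = fetcher(feed.url)
    except OSError:
        # URLError, HTTPError, and timeouts are all OSError subclasses.
        feed.failures += 1
        if feed.failures >= MAX_FAILURES:
            feed.enabled = False
        return None
    feed.failures = 0  # any success resets the counter
    return body
```

The `fetcher` parameter exists so the disabling logic can be exercised without network access. For aiohttp itself, the library provides a `ClientTimeout` object that can be passed to a session or request.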
Python rss-feed related posts
- Is YouTube starting to protect channel RSS feeds?
- A Positive-Only Hacker News RSS Feed
- Streaming Money: Need advice
- Adding Recent Blog Posts to Your GitHub Readme
- reader 2.5 released – a Python feed reader library
- lemon24/reader Reader is a Python feed reader library.
- Show HN: RSS feeds for arbitrary websites using CSS selectors
Index
What are some of the best open-source rss-feed projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | trafilatura | 2,607 |
2 | FeedHQ | 568 |
3 | GNews | 512 |
4 | reader | 408 |
5 | rss-tools | 39 |
6 | pwc-feeds | 34 |
7 | rss-sources | 29 |
8 | positive_hackernews | 26 |
9 | usocial | 16 |
10 | hoyolab-rss-feeds | 16 |
11 | Mangalerts | 9 |
12 | Django-link-archive | 8 |
13 | rssify | 6 |
14 | bookmarks-cli | 0 |