-
Playwright
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
-
Puts Debuggerer
Ruby library for improved puts debugging, automatically displaying bonus useful information such as source line number and source code.
-
subscriptions-digest
Simple project to automate the generation of digest emails for personal subscriptions.
-
feedgen
Generates RSS/ATOM/JSON feeds. Can be reasonably extended or create a feed using the CSS generator.
-
track-changes
Discontinued. A JSON HTTP server that tracks other webpages to see whether a given query selector has changed.
Related:
https://github.com/DIYgod/RSSHub
This perhaps has more flexibility and can deal with almost any website.
Ah, you mean the ones inside the contents? That's a good one. I'm not sure if that's easily fixable, but I'll give it some thought. For those interested, I'll track it here: https://gitlab.com/vincenttunru/feed-me-up-scotty/-/issues/1
I'd recommend using the tools I used for this directly if you're looking to do this. Playwright in particular: https://playwright.dev
In case anyone wants to detect the selectors automatically, here's a small python library I wrote that does it for you: https://github.com/lorey/mlscraper
Ah, those instructions are unclear. As far as I know, you first have to go to https://github.com//feeds/actions to enable Workflows for your repository. Then, your feeds should be published to https://.github.io/feeds/.xml.
Does that work?
Since everyone is pitching their own, I built https://github.com/fran-penedo/rssify, which started as a fork of https://github.com/h43z/rssify. The basic functionality is similar to Vinnl's: give it a URL and some selectors and it builds the RSS feed. From this, I added a few things: templates (if you want to subscribe to individual projects within a webpage, like fanfics in ao3), transforms (when the data is not quite the text of the DOM element), a flask server you can use to add new URLs you have a template for and update the feeds, and a userscript to add the current URL using the server.
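For anyone wondering what "give it a URL and some selectors and it builds the RSS feed" boils down to, here is a rough standard-library sketch. It is not rssify's code: the names `LinkCollector` and `build_rss` are made up for illustration, a plain class-name match stands in for a real CSS selector, and the HTML is passed in as a string rather than fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) for every <a> whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.items = []      # finished (href, title) pairs
        self._href = None    # href of the matching <a> we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_rss(page_url, html, css_class, feed_title):
    """Turn the matching links of one page into a minimal RSS 2.0 document."""
    parser = LinkCollector(css_class)
    parser.feed(html)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = f"Scraped from {page_url}"
    for href, title in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        # Resolve relative links against the page they were scraped from.
        ET.SubElement(item, "link").text = urljoin(page_url, href)
    return ET.tostring(rss, encoding="unicode")
```

A real tool would use a proper selector engine and handle pages that render with JavaScript, which is where something like Playwright comes in.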
This could nicely supplement my GitHub automation that emails feed digests https://github.com/mhitza/subscriptions-digest
Similarly to my repository, I would suggest adding an option to fetch the configuration file from an external resource defined via an action secret. For my automation I'm using a Gist (not sure whether GitLab has the same thing, i.e. snippets that are private but still publicly accessible).
At least that way you can keep your own feed configuration while allowing those that fork the repository to not have to manually fix conflicts within the feeds.toml config.
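A minimal sketch of that suggestion, assuming the workflow exposes the secret to the script as an environment variable. The variable name `FEEDS_CONFIG_URL` and the function names are hypothetical and not part of either project; the `fetch` parameter only exists so the download can be swapped out.

```python
import os
import urllib.request
from pathlib import Path


def _http_get(url):
    """Download a text resource; used when a remote config URL is configured."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def load_feed_config_text(local_path="feeds.toml",
                          env_var="FEEDS_CONFIG_URL",  # hypothetical secret name
                          fetch=_http_get):
    """Return the TOML text from the URL in env_var, else from the local file."""
    url = os.environ.get(env_var)
    if url:
        # The secret is set: the fork keeps its own config outside the repository.
        return fetch(url)
    # No secret: fall back to the feeds.toml checked into the repository.
    return Path(local_path).read_text(encoding="utf-8")
```

Parsing the returned text is left to whatever TOML reader the project already uses. The point is only that forks never need to touch the checked-in file.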
Kinda on a related note I found myself needing to make a bunch of these sorts of scraped feeds. The problem for me was the lack of date parsing support which I sorely needed.
I ended up writing my own CLI tool that similarly supports CSS selectors for feed generation: https://github.com/dayzerosec/feedgen
I did write it specifically for my use-case so there are some "warts" on it like custom generators for HackerOne and Google's Monorail bug tracker. But perhaps someone else might benefit from its ability to create slightly more complicated RSS, Atom, or JSON feeds.
Example config with date parsing: https://github.com/dayzerosec/feedgen/blob/main/configs/bish...
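To illustrate why date parsing matters here: a scraped date is just a string, while an RSS item wants a `pubDate` in RFC 822 style. The sketch below is not feedgen's implementation or config format; `to_pub_date` is a made-up helper, and treating zone-less dates as UTC is an assumption.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def to_pub_date(raw, fmt, tz=timezone.utc):
    """Parse a scraped date string and format it as an RSS 2.0 pubDate."""
    parsed = datetime.strptime(raw.strip(), fmt)
    if parsed.tzinfo is None:
        # Assumption: the page gives no time zone, so treat the date as UTC.
        parsed = parsed.replace(tzinfo=tz)
    return format_datetime(parsed)


print(to_pub_date(" 2021-03-05 ", "%Y-%m-%d"))
# Fri, 05 Mar 2021 00:00:00 +0000
```

The format string would come from the per-site configuration, next to the selector that finds the date element.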
My effort in this space is "furss", though it starts from an rss feed then aims to scrape the full article instead of an extract. https://github.com/jepler/furss
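The general idea of starting from an existing feed and swapping the extract for the full article can be sketched roughly like this. It is not how furss is implemented; `expand_feed` and the `fetch_article` callback are hypothetical, and the actual downloading and article extraction are left to the caller.

```python
import xml.etree.ElementTree as ET


def expand_feed(rss_xml, fetch_article):
    """Replace each item's description with the text fetch_article(link) returns."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # nothing to scrape for this item
        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        description.text = fetch_article(link.strip())
    return ET.tostring(root, encoding="unicode")
```

`fetch_article` takes the item URL and returns the full article body as a string, so the scraping strategy can differ per site.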
Always good to see RSS projects pop up on hackernews. I'm still maintaining the Feediron plugin for TT-RSS - https://github.com/feediron/ttrss_plugin-feediron
Unlike this project, Feediron only modifies existing RSS feeds to extract the desired information. It typically uses XPath expressions to select content.
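For readers unfamiliar with selecting content by XPath, here is a small Python illustration. It has nothing to do with Feediron's own code (a TT-RSS plugin); `select_content` is a made-up helper, and the standard library's ElementTree only understands a limited XPath subset and needs well-formed markup, so real pages usually call for a fuller parser.

```python
import xml.etree.ElementTree as ET


def select_content(xhtml, xpath):
    """Return the text of every element in a well-formed document matching xpath."""
    root = ET.fromstring(xhtml)
    return ["".join(node.itertext()).strip() for node in root.findall(xpath)]


page = (
    "<html><body>"
    "<div class='nav'>Menu</div>"
    "<div class='content'><p>Full <b>article</b> text</p></div>"
    "</body></html>"
)
print(select_content(page, ".//div[@class='content']"))
# ['Full article text']
```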
It seems that RSS feed generators are a bit like static site generators: it's often thought to be easier to make your own than to learn to use someone else's.
Anyway, here's another self-hosted open source RSS feed generator for arbitrary websites: https://github.com/hueyy/HungryHippo