mlscraper
🤖 Scrape data from HTML websites automatically by just providing examples (by lorey)
rssify
script that generates an rss feed out of websites that don't have one (by h43z)
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mlscraper
Posts with mentions or reviews of mlscraper.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-04-12.
- What are the best tools for web scraping and analysis of natural language to populate a dataset?
See if something like autoscraper or mlscraper suits your needs.
- Experimental library for scraping websites using OpenAI's GPT API
Why GPT-based then? There are libraries that do this: You give examples, they generate the rules for you and give you a scraper object that takes any html and returns the scraped data.
Mine: https://github.com/lorey/mlscraper
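The "give examples, get a scraper" idea from the post above can be illustrated with a minimal sketch. This is not mlscraper's algorithm or API. It is a toy using only the Python standard library, and the names `train`, `scrape`, and `TextByElement` are hypothetical. It assumes the wanted value is the full text of one element that can be identified by tag name and class attribute alone.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextByElement(HTMLParser):
    """Records (tag, class, text) for every element that directly contains text."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open (tag, class) pairs
        self.records = []  # collected (tag, class, text) triples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Unwind to the matching open tag; ignore stray closing tags.
        if any(open_tag == tag for open_tag, _ in self.stack):
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            tag, css_class = self.stack[-1]
            self.records.append((tag, css_class, text))


def elements(html):
    parser = TextByElement()
    parser.feed(html)
    parser.close()
    return parser.records


def train(samples):
    """Derive a (tag, class) rule from (html, expected_text) example pairs."""
    candidates = None
    for html, expected in samples:
        matching = {(tag, cls) for tag, cls, text in elements(html) if text == expected}
        candidates = matching if candidates is None else candidates & matching
    if not candidates:
        raise ValueError("no single (tag, class) rule fits every example")
    return sorted(candidates)[0]


def scrape(rule, html):
    """Apply a learned rule to unseen HTML and return the matching texts."""
    return [text for tag, cls, text in elements(html) if (tag, cls) == rule]
```

Training on two pages where the name sits in `<h1 class="name">` yields the rule `("h1", "name")`, which can then be applied to a third page with the same layout. Real libraries have to cope with far messier markup than this sketch does.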
- Could someone recommend a library for C# like one of these two (they are for Python): mlscraper and autoscraper
GitHub - lorey/mlscraper: 🤖 Scrape data from HTML websites automatically by just providing examples
- Smart Scraper
Check it out here: https://github.com/lorey/mlscraper Example: https://github.com/lorey/mlscraper/blob/master/examples/quotes_to_scrape.py
- Pre-trained Webscraping Models
- 🤖 Scrape data from HTML websites automatically by just providing examples
- mlscraper: Scrape data from HTML pages automatically with Machine Learning
- Show HN: RSS feeds for arbitrary websites using CSS selectors
In case anyone wants to detect the selectors automatically, here's a small python library I wrote that does it for you: https://github.com/lorey/mlscraper
rssify
Posts with mentions or reviews of rssify.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-07-05.
- Show HN: RSS feeds for arbitrary websites using CSS selectors
Since everyone is pitching their own, I built https://github.com/fran-penedo/rssify, which started as a fork of https://github.com/h43z/rssify. The basic functionality is similar to Vinnl's: give it a URL and some selectors and it builds the RSS feed. From this, I added a few things: templates (if you want to subscribe to individual projects within a webpage, like fanfics in ao3), transforms (when the data is not quite the text of the DOM element), a flask server you can use to add new URLs you have a template for and update the feeds, and a userscript to add the current URL using the server.
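The basic "give it a URL and some selectors and it builds the RSS feed" workflow might look roughly like the sketch below. It is not code from either rssify repository. To stay within the Python standard library it takes an already-downloaded HTML string instead of a URL, and a single class name on `<a>` elements instead of a full CSS selector. `LinkCollector` and `build_rss` are hypothetical names.

```python
from html.parser import HTMLParser
from xml.etree import ElementTree as ET


class LinkCollector(HTMLParser):
    """Collects (title, href) for every <a> element carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.items = []     # finished (title, href) pairs
        self._href = None   # href of the matching <a> currently being read
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None


def build_rss(html, css_class, feed_title, feed_link):
    """Return an RSS 2.0 document (as a string) with one item per matching link."""
    collector = LinkCollector(css_class)
    collector.feed(html)
    collector.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Generated feed for {feed_link}"
    for title, href in collector.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")
```

A real tool would also fetch the page, resolve relative links against the page URL, and add dates or GUIDs so that feed readers can detect new entries.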
What are some alternatives?
When comparing mlscraper and rssify you can also consider the following projects:
scrapingant-client-python - ScrapingAnt API client for Python.
RSSHub - 🧡 Everything is RSSible
ttrss_plugin-feediron - Evolution of ttrss_plugin-af_feedmod
furss - Fix Up RSS (and atom): Make full-text versions of rss/atom feeds
feed-me-up-scotty
feedgen - Generates RSS/ATOM/JSON feeds. Can be reasonably extended or create a feed using the CSS generator.
rssify - Tool that generates an rss feed out of websites that don't have one
HungryHippo - 🦛 scrapes websites and generates rss feeds