mlscraper
🤖 Scrape data from HTML websites automatically by just providing examples (by lorey)
rssify
Tool that generates an rss feed out of websites that don't have one (by fran-penedo)
Metric | mlscraper | rssify
---|---|---
Mentions | 10 | 1
Stars | 1,229 | 8
Growth | - | -
Activity | 0.6 | 0.0
Last commit | about 2 months ago | over 2 years ago
Language | Python | Python
License | - | GNU General Public License v3.0 only
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mlscraper
Posts with mentions or reviews of mlscraper.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-04-12.
- What are the best tools for web scraping and analysis of natural language to populate a dataset?
See if something like autoscraper or mlscraper suits your needs.
- Experimental library for scraping websites using OpenAI's GPT API
Why GPT-based then? There are libraries that do this: You give examples, they generate the rules for you and give you a scraper object that takes any html and returns the scraped data.
Mine: https://github.com/lorey/mlscraper
- Could someone recommend me a library for C# like one of these two (they are for Python): mlscraper and autoscraper
GitHub - lorey/mlscraper: 🤖 Scrape data from HTML websites automatically by just providing examples
- Smart Scraper
Check it out here: https://github.com/lorey/mlscraper Example: https://github.com/lorey/mlscraper/blob/master/examples/quotes_to_scrape.py
- Pre-trained Webscraping Models
- 🤖 Scrape data from HTML websites automatically by just providing examples
- mlscraper: Scrape data from HTML pages automatically with Machine Learning
- Show HN: RSS feeds for arbitrary websites using CSS selectors
In case anyone wants to detect the selectors automatically, here's a small python library I wrote that does it for you: https://github.com/lorey/mlscraper
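Several of the posts above describe the same idea: you give examples, the library derives the rules, and the resulting scraper can be applied to other pages. The toy sketch below illustrates that idea only. It is not mlscraper's API or algorithm; the helper names `learn_path` and `apply_path`, the sample pages, and the use of the standard library's ElementTree paths (on well-formed markup) in place of learned CSS selectors are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample pages; ElementTree needs well-formed markup.
TRAIN = (
    "<html><body><div class='quote'>"
    "<span class='text'>To be or not to be</span>"
    "<span class='author'>Shakespeare</span>"
    "</div></body></html>"
)
NEW = (
    "<html><body>"
    "<div class='quote'><span class='text'>I think</span>"
    "<span class='author'>Descartes</span></div>"
    "<div class='quote'><span class='text'>Know thyself</span>"
    "<span class='author'>Socrates</span></div>"
    "</body></html>"
)


def learn_path(page, example_text):
    """Derive a tag/class path to the first element whose text equals the example."""
    root = ET.fromstring(page)

    def walk(node, trail):
        step = node.tag
        if node.get("class"):
            step += f"[@class='{node.get('class')}']"
        trail = trail + [step]
        if (node.text or "").strip() == example_text:
            return trail
        for child in node:
            found = walk(child, trail)
            if found:
                return found
        return None

    trail = walk(root, [])
    if trail is None:
        raise ValueError(f"example {example_text!r} not found in page")
    if len(trail) == 1:
        return "."  # the example text sits on the root element itself
    # Drop the root step: ElementTree evaluates paths relative to the root.
    return "./" + "/".join(trail[1:])


def apply_path(page, path):
    """Apply a previously derived path to another page and return the matched texts."""
    root = ET.fromstring(page)
    return [(el.text or "").strip() for el in root.findall(path)]


if __name__ == "__main__":
    rule = learn_path(TRAIN, "Shakespeare")
    print(rule)
    print(apply_path(NEW, rule))
```

A real tool has to cope with malformed HTML, several examples per field, and rules that generalize beyond one exact class path; this sketch skips all of that.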
rssify
Posts with mentions or reviews of rssify.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-07-05.
- Show HN: RSS feeds for arbitrary websites using CSS selectors
Since everyone is pitching their own, I built https://github.com/fran-penedo/rssify, which started as a fork of https://github.com/h43z/rssify. The basic functionality is similar to Vinnl's: give it a URL and some selectors and it builds the RSS feed. From this, I added a few things: templates (if you want to subscribe to individual projects within a webpage, like fanfics in ao3), transforms (when the data is not quite the text of the DOM element), a flask server you can use to add new URLs you have a template for and update the feeds, and a userscript to add the current URL using the server.
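The basic functionality described in the post above (a page plus some selectors in, an RSS feed out) can be sketched roughly as follows. This is not rssify's actual interface: the function `build_feed`, its parameters, and the sample page are hypothetical, the page is passed in as a string rather than fetched from a URL, and ElementTree's limited path syntax stands in for CSS selectors so the example needs only the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; ElementTree needs well-formed markup.
PAGE = (
    "<html><body><ul>"
    "<li class='post'><a href='https://example.com/a'>First post</a></li>"
    "<li class='post'><a href='https://example.com/b'>Second post</a></li>"
    "</ul></body></html>"
)


def build_feed(page, feed_title, url, item_path, title_path, link_path):
    """Build an RSS 2.0 document from the elements matched by the given paths."""
    root = ET.fromstring(page)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = url
    ET.SubElement(channel, "description").text = f"Generated feed for {url}"

    for node in root.findall(item_path):
        title_el = node.find(title_path)
        link_el = node.find(link_path)
        if title_el is None or link_el is None:
            continue  # skip entries that lack a title or a link
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = (title_el.text or "").strip()
        ET.SubElement(item, "link").text = link_el.get("href", "")

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_feed(PAGE, "Example", "https://example.com",
                     ".//li[@class='post']", "a", "a"))
```

The templates, transforms, and server mentioned in the post would sit on top of a core like this, for example by storing the three paths per site and re-running the function on a schedule.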
What are some alternatives?
When comparing mlscraper and rssify you can also consider the following projects:
scrapingant-client-python - ScrapingAnt API client for Python.
RSSHub - 🧡 Everything is RSSible
ttrss_plugin-feediron - Evolution of ttrss_plugin-af_feedmod
feed-me-up-scotty
furss - Fix Up RSS (and atom): Make full-text versions of rss/atom feeds
feedgen - Generates RSS/ATOM/JSON feeds. Can be reasonably extended or create a feed using the CSS generator.
rssify - script that generates an rss feed out of websites that don't have one
subscriptions-digest - Simple project to automate the generation of digest emails for personal subscriptions.