mlscraper
subscriptions-digest
Metric | mlscraper | subscriptions-digest
---|---|---
Mentions | 10 | 2
Stars | 1,229 | 3
Growth | - | -
Activity | 0.6 | 1.7
Last commit | about 2 months ago | about 1 year ago
Language | Python | PHP
License | - | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mlscraper
-
What are the best tools for web scraping and analysis of natural language to populate a dataset?
See if something like autoscraper or mlscraper suits your needs.
-
Experimental library for scraping websites using OpenAI's GPT API
Why GPT-based then? There are libraries that do this: You give examples, they generate the rules for you and give you a scraper object that takes any html and returns the scraped data.
Mine: https://github.com/lorey/mlscraper
-
Could someone recommend me a library for c# like one of these two (they are for python) : mlscraper and autoscraper
GitHub - lorey/mlscraper: 🤖 Scrape data from HTML websites automatically by just providing examples
-
Smart Scraper
Check it out here: https://github.com/lorey/mlscraper
Example: https://github.com/lorey/mlscraper/blob/master/examples/quotes_to_scrape.py
- Pre-trained Webscraping Models
- 🤖 Scrape data from HTML websites automatically by just providing examples
- mlscraper: Scrape data from HTML pages automatically with Machine Learning
-
Show HN: RSS feeds for arbitrary websites using CSS selectors
In case anyone wants to detect the selectors automatically, here's a small python library I wrote that does it for you: https://github.com/lorey/mlscraper
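Several of the mentions above describe the same idea: you supply an example value, the tool derives a selector, and the selector is then reused on other pages. The sketch below illustrates that idea with the Python standard library only. It is not mlscraper's API or algorithm; `infer_selector` and `scrape` are hypothetical helpers, and the selector format is assumed to be a plain `tag.class` string.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TextIndex(HTMLParser):
    """Records a (selector, text) pair for every element that directly holds text."""

    def __init__(self):
        super().__init__()
        self.stack = []   # selectors of the currently open elements
        self.found = []   # (selector, stripped text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(tag + "".join("." + c for c in classes))

    def handle_endtag(self, tag):
        if self.stack and tag not in VOID_TAGS:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.found.append((self.stack[-1], text))


def index_text(html):
    """Parse the HTML and return all (selector, text) pairs."""
    parser = _TextIndex()
    parser.feed(html)
    parser.close()
    return parser.found


def infer_selector(html, example):
    """Hypothetical helper: first tag.class selector whose text equals the example."""
    for selector, text in index_text(html):
        if text == example:
            return selector
    return None


def scrape(html, selector):
    """Hypothetical helper: text of every element matching the inferred selector."""
    return [text for sel, text in index_text(html) if sel == selector]


sample_page = '<div><span class="text">Hello</span><small class="author">Ann</small></div>'
other_page = '<div><small class="author">Bob</small><small class="author">Cy</small></div>'

author_selector = infer_selector(sample_page, "Ann")
print(author_selector)                      # small.author
print(scrape(other_page, author_selector))  # ['Bob', 'Cy']
```

A real tool would need to handle ambiguous matches, nested structures and malformed HTML; this sketch only matches exact text against a tag and its classes.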
subscriptions-digest
-
Show HN: RSS feeds for arbitrary websites using CSS selectors
This could nicely supplement my GitHub automation that emails feed digests https://github.com/mhitza/subscriptions-digest
As in my repository, I would suggest adding an option to fetch the configuration file from an external resource defined via an action secret. For my automation I'm using a Gist (not sure if GitLab has the same thing; snippets that are private but still publicly accessible).
At least that way you can keep your own feed configuration while allowing those that fork the repository to not have to manually fix conflicts within the feeds.toml config.
-
Show HN: Repo automation that generates a daily digest email of your feeds
Being displeased with the feed reader experience, I wrote up a quick script and GitHub automation that generates a daily digest email for my subscriptions.
I have been using this setup for a week at this point, and I found it pleasant enough that I thought others might find a use for it as well.
https://github.com/mhitza/subscriptions-digest
What are some alternatives?
scrapingant-client-python - ScrapingAnt API client for Python.
RSSHub - 🧡 Everything is RSSible
ttrss_plugin-feediron - Evolution of ttrss_plugin-af_feedmod
feed-me-up-scotty
furss - Fix Up RSS (and atom): Make full-text versions of rss/atom feeds
rssify - script that generates an rss feed out of websites that don't have one
rssify - Tool that generates an rss feed out of websites that don't have one
codsletter - Codsletter: turn your website into a periodical newsletter, all automatically!