scrape-san-mateo-fire-dispatch | github-actions
---|---
1 | 1
2 | 6
- | -
0.0 | 10.0
8 months ago | about 3 years ago
Python | Markdown
- | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
scrape-san-mateo-fire-dispatch
-
Git scraping: track changes over time by scraping to a Git repository
Git is a key technology in this approach, because the value you get out of this form of scraping is the commit history - it's a way of turning a static source of information into a record of how that information changed over time.
I think it's fine to use the term "scraping" to refer to downloading a JSON file.
These days an increasing number of websites work by serving up JSON which is then turned into HTML by a client-side JavaScript app. The JSON often isn't a formally documented API, but you can grab it directly to avoid the extra step of processing the HTML.
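One practical detail when committing downloaded JSON to Git is keeping the file byte-for-byte stable when the data has not changed, so the commit history only records real changes. The following is a minimal sketch of that idea, not the method used by any of the repositories mentioned here; the function names `normalize_json` and `save_if_changed` are hypothetical.

```python
import json
from pathlib import Path


def normalize_json(raw: str) -> str:
    """Re-serialize JSON with sorted keys and fixed indentation.

    A stable layout means key reordering by the server does not
    show up as a spurious diff in the Git history.
    """
    return json.dumps(json.loads(raw), indent=2, sort_keys=True) + "\n"


def save_if_changed(raw: str, path: Path) -> bool:
    """Write normalized JSON to `path`; return True only if the content changed."""
    new_text = normalize_json(raw)
    if path.exists() and path.read_text(encoding="utf-8") == new_text:
        return False  # nothing new, so there is nothing to commit
    path.write_text(new_text, encoding="utf-8")
    return True
```

A scheduled job could call `save_if_changed` on each download and only run `git commit` when it returns `True`.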
I do run Git scrapers that process HTML as well. A couple of examples:
scrape-san-mateo-fire-dispatch https://github.com/simonw/scrape-san-mateo-fire-dispatch scrapes the HTML from http://www.firedispatch.com/iPhoneActiveIncident.asp?Agency=... and records both the original HTML and converted JSON in the repository.
scrape-hacker-news-by-domain https://github.com/simonw/scrape-hacker-news-by-domain uses my https://shot-scraper.datasette.io/ browser automation tool to convert an HTML page on Hacker News into JSON and save that to the repo. I wrote more about how that works here: https://simonwillison.net/2022/Dec/2/datasette-write-api/
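Converting a scraped HTML page into JSON, as both examples above do, can be sketched with the Python standard library alone. This is an illustrative sketch under assumptions: it treats the page as a simple table whose first row is a header, which may not match the real markup of either site, and the names `TableRows` and `html_table_to_json` are hypothetical.

```python
import json
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so formatting changes do not alter the data.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_json(html: str) -> str:
    """Turn the table rows into a JSON list of records keyed by the header row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return "[]\n"
    header, *body = parser.rows
    records = [dict(zip(header, row)) for row in body]
    return json.dumps(records, indent=2) + "\n"
```

Saving the raw HTML alongside the converted JSON keeps the original source available in case the conversion logic needs to be corrected later.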
github-actions
-
Git scraping: track changes over time by scraping to a Git repository
They have the right icon, clickable username and it is as simple as just using this email and name. You or someone else might like to do this, too, so here's me sharing this neat trick I found.
https://github.com/TomasHubelbauer/github-actions#write-work...
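The comment above does not spell out the email and name. As a hedged sketch, the values below are the identity commonly used for the GitHub Actions bot account; treat them as an assumption and check the linked repository for the exact values. The helper name `bot_commit_env` is hypothetical. It builds an environment in which Git attributes commits to that identity via Git's standard `GIT_AUTHOR_*` and `GIT_COMMITTER_*` variables.

```python
import os

# Assumption: identity commonly used for the GitHub Actions bot account.
BOT_NAME = "github-actions[bot]"
BOT_EMAIL = "41898282+github-actions[bot]@users.noreply.github.com"


def bot_commit_env(base_env=None):
    """Return a copy of the environment with Git author/committer set to the bot."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        {
            "GIT_AUTHOR_NAME": BOT_NAME,
            "GIT_AUTHOR_EMAIL": BOT_EMAIL,
            "GIT_COMMITTER_NAME": BOT_NAME,
            "GIT_COMMITTER_EMAIL": BOT_EMAIL,
        }
    )
    return env
```

The result can be passed as the `env` argument of `subprocess.run` when invoking `git commit` from a workflow step.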
What are some alternatives?
shot-scraper - A command-line utility for taking automated screenshots of websites
mastodon-scraping - Repository for scraping public information from Mastodon
Geo-IP-Database - Automatically updated tree-formatted database from MaxMind database
gesetze-im-internet - Archive of German legal acts (weekly archive of gesetze-im-internet.de)
queensland-traffic-conditions - A scraper that tracks changes to the published Queensland traffic incidents data
hun_law_py - Tools for parsing Hungarian legal documents