github-actions
github-actions | scrape-san-mateo-fire-dispatch
---|---
1 | 1
6 | 2
- | -
10.0 | 0.0
almost 3 years ago | 4 months ago
Markdown | Python
- | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
github-actions
Git scraping: track changes over time by scraping to a Git repository
They have the right icon and a clickable username, and it is as simple as using this email and name. You or someone else might like to do this too, so I'm sharing this neat trick I found.
https://github.com/TomasHubelbauer/github-actions#write-work...
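The comment above appears to describe committing from a workflow under the github-actions bot identity, although it does not spell out the name and email. A minimal sketch, assuming the values widely used for that bot account; `bot_identity_commands` and `configure_bot_identity` are hypothetical helper names, not part of the linked repository:

```python
import subprocess

# Assumption: these are the identity values commonly used for the
# github-actions bot; the quoted comment does not state them.
BOT_NAME = "github-actions[bot]"
BOT_EMAIL = "41898282+github-actions[bot]@users.noreply.github.com"


def bot_identity_commands(name=BOT_NAME, email=BOT_EMAIL):
    """Return the git commands that set the commit author for one repo."""
    return [
        ["git", "config", "user.name", name],
        ["git", "config", "user.email", email],
    ]


def configure_bot_identity(repo_dir="."):
    """Run the commands inside repo_dir (needs git and an existing repo)."""
    for command in bot_identity_commands():
        subprocess.run(command, cwd=repo_dir, check=True)
```

Calling `configure_bot_identity()` before `git commit` in a workflow step should attribute later commits in that checkout to the bot account.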
scrape-san-mateo-fire-dispatch
Git scraping: track changes over time by scraping to a Git repository
Git is a key technology in this approach, because the value you get out of this form of scraping is the commit history - it's a way of turning a static source of information into a record of how that information changed over time.
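Since the value lies in the commit history, one way to keep that history meaningful is to commit only when the scraped content actually differs from the stored copy. A minimal sketch under that assumption; `write_if_changed` and `commit_snapshot` are hypothetical names, and the commit message is a placeholder:

```python
import subprocess
from pathlib import Path


def write_if_changed(path, new_text):
    """Write new_text to path; return True only if the content changed."""
    target = Path(path)
    if target.exists() and target.read_text(encoding="utf-8") == new_text:
        return False
    target.write_text(new_text, encoding="utf-8")
    return True


def commit_snapshot(repo_dir, filename, new_text, message="Latest data"):
    """Commit a new snapshot only when it differs from the stored one."""
    if not write_if_changed(Path(repo_dir) / filename, new_text):
        return False  # unchanged, so no commit is created
    # Requires git on PATH and repo_dir to be an existing repository.
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    return True
```

Run on a schedule, each commit then marks a moment when the source changed, and `git log -p` on the file shows what changed.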
I think it's fine to use the term "scraping" to refer to downloading a JSON file.
These days an increasing number of websites work by serving up JSON which is then turned into HTML by a client-side JavaScript app. The JSON often isn't a formally documented API, but you can grab it directly to avoid the extra step of processing the HTML.
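Grabbing the JSON directly can be sketched with the standard library alone. The example below also re-serializes the payload with sorted keys and fixed indentation, which is an assumption added here so that successive snapshots diff cleanly; `normalize_json` and `fetch_normalized` are hypothetical names, and the URL is supplied by the caller:

```python
import json
from urllib.request import urlopen


def normalize_json(raw_bytes):
    """Re-serialize JSON with sorted keys and indentation for stable diffs."""
    data = json.loads(raw_bytes.decode("utf-8"))
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def fetch_normalized(url, timeout=30):
    """Download a JSON document and return its normalized text form."""
    with urlopen(url, timeout=timeout) as response:
        return normalize_json(response.read())
```

Two responses that differ only in key order or whitespace normalize to the same text, so they would not register as a change.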
I do run Git scrapers that process HTML as well. A couple of examples:
scrape-san-mateo-fire-dispatch (https://github.com/simonw/scrape-san-mateo-fire-dispatch) scrapes the HTML from http://www.firedispatch.com/iPhoneActiveIncident.asp?Agency=... and records both the original HTML and converted JSON in the repository.
scrape-hacker-news-by-domain (https://github.com/simonw/scrape-hacker-news-by-domain) uses my shot-scraper browser automation tool (https://shot-scraper.datasette.io/) to convert an HTML page on Hacker News into JSON and save that to the repo. I wrote more about how that works here: https://simonwillison.net/2022/Dec/2/datasette-write-api/
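The HTML-to-JSON step in scrapers like these can be illustrated with Python's built-in `html.parser`. This is only a sketch: it assumes the page holds a simple table whose first row is a header, which is not confirmed for either site above, and `TableRows` and `html_table_to_json` are hypothetical names rather than code from those repositories:

```python
import json
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()


def html_table_to_json(html):
    """Treat the first row as headers and emit the rest as JSON objects."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return "[]\n"
    header, *body = parser.rows
    records = [dict(zip(header, row)) for row in body]
    return json.dumps(records, indent=2) + "\n"
```

Saving both the raw HTML and this JSON output keeps the original available in case the conversion logic later turns out to have missed something.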
What are some alternatives?
torvenyek - Git repo of Hungarian laws
shot-scraper - A command-line utility for taking automated screenshots of websites
hun_law_py - Tools for parsing Hungarian legal documents
Geo-IP-Database - Automatically updated tree-formatted database from MaxMind database
mastodon-scraping - Repository for scraping public information from Mastodon
carbon-intensity-forecast-tracking - The reliability of the National Grid's Carbon Intensity forecast
gesetze-im-internet - Archive of German legal acts (weekly archive of gesetze-im-internet.de)
scrape-hacker-news-by-domain - Scrape HN to track links from specific domains
bchydro-outages - Track BCHydro Outages via Git history
queensland-traffic-conditions - A scraper that tracks changes to the published Queensland traffic incidents data
metrobus-timetrack-history - Tracking Metrobus location data