mastodon-scraping
Geo-IP-Database
Metric | mastodon-scraping | Geo-IP-Database
---|---|---
Mentions | 1 | 1
Stars | 3 | 8
Growth | - | -
Activity | 0.0 | 8.2
Last commit | 4 days ago | 4 days ago
License | MIT License | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mastodon-scraping
- Git scraping: track changes over time by scraping to a Git repository
Thanks for linking to the topic, that was interesting
As a heads-up to anyone trying this stunt, please be mindful that git-diff is ultimately a line-oriented operation (yeah, yeah, "git stores snapshots")
For example https://github.com/pmc-ss/mastodon-scraping/commit/2a15ce1b2... is all :fu: because git sees basically the "first line" changed
However, had the author normalized the instances.json with something like "jq -S", one would end up with a more reasonable 1736 textual changes, which GitHub would almost certainly have rendered
diff -u \
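
The normalization point in the comment above can be illustrated with a minimal Python sketch. It uses `json.dumps(..., sort_keys=True, indent=2)` as a rough stand-in for `jq -S` (the output is not guaranteed to match jq byte for byte), and the two snapshots are made-up examples, not data from the mastodon-scraping repository. The helper names `normalize_json` and `changed_lines` are hypothetical.

```python
import difflib
import json


def normalize_json(raw: str) -> str:
    """Re-serialize JSON with sorted keys and one value per line."""
    data = json.loads(raw)
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def changed_lines(old: str, new: str) -> list[str]:
    """Return the added and removed lines of a line-based diff."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Hypothetical single-line snapshots: one value changed, key order shuffled.
old_raw = '[{"name": "a.example", "users": 10}, {"name": "b.example", "users": 5}]'
new_raw = '[{"users": 11, "name": "a.example"}, {"users": 5, "name": "b.example"}]'

# Without normalization the whole document is one changed line.
raw_diff = changed_lines(old_raw, new_raw)

# After normalization only the field that really changed shows up.
normalized_diff = changed_lines(normalize_json(old_raw), normalize_json(new_raw))

for line in normalized_diff:
    print(line)
```

Running the normalization step before each commit keeps diffs limited to the values that actually changed, and it also hides key reordering by the upstream API.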
Geo-IP-Database
- Git scraping: track changes over time by scraping to a Git repository
I have a couple of similar scrapers as well. One is a private repo that collects visa information from Wikipedia (for Visalogy.com), and another collects GeoIP information from the MaxMind database (used with their permission).
https://github.com/Ayesh/Geo-IP-Database/
It downloads the repo, splits the data by the first 8 bytes of the IP address, and saves each part to an individual JSON file. On every scraper run, it creates a new tag and pushes it as a package, so dependents can simply update it with their dependency manager.
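
The split-into-JSON-files step described above could look roughly like the following Python sketch. It is not the Geo-IP-Database implementation. It assumes the split key is the leading octet of each network address (the comment's wording leaves the exact key open), and the record shape, the function names `shard_by_first_octet` and `write_shards`, and the `<octet>.json` file naming are all hypothetical.

```python
import ipaddress
import json
from collections import defaultdict
from pathlib import Path


def shard_by_first_octet(records: dict[str, str]) -> dict[int, dict[str, str]]:
    """Group network -> country records by the first byte of the network address."""
    shards: dict[int, dict[str, str]] = defaultdict(dict)
    for network, country in records.items():
        # Assumption: the leading octet is the shard key.
        first_octet = ipaddress.ip_network(network).network_address.packed[0]
        shards[first_octet][network] = country
    return dict(shards)


def write_shards(shards: dict[int, dict[str, str]], out_dir: str) -> None:
    """Write one JSON file per shard, with sorted keys for stable diffs."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for octet, entries in shards.items():
        text = json.dumps(entries, sort_keys=True, indent=2) + "\n"
        (out / f"{octet}.json").write_text(text, encoding="utf-8")


# Made-up sample records, for illustration only.
sample = {"1.0.0.0/24": "AU", "1.0.1.0/24": "CN", "8.8.8.0/24": "US"}
shards = shard_by_first_octet(sample)
```

Small per-prefix files mean a scraper run only touches the files whose ranges changed, which keeps each commit and each tagged release easy to diff.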
What are some alternatives?
gesetze-im-internet - Archive of German legal acts (weekly archive of gesetze-im-internet.de)
scrape-san-mateo-fire-dispatch
github-actions - Information and tips regarding GitHub Actions
mcbroken-archive - :inbox_tray: Archive for data from mcbroken.com.
bbcrss - Scrapes the headlines from BBC News indexes every five minutes
metrobus-timetrack-history - Tracking Metrobus location data
hun_law_py - Tools for parsing Hungarian legal documents
carbon-intensity-forecast-tracking - The reliability of the National Grid's Carbon Intensity forecast
bchydro-outages - Track BCHydro Outages via Git history