masscan_as_a_service vs scrape-san-mateo-fire-dispatch

| | masscan_as_a_service | scrape-san-mateo-fire-dispatch |
|---|---|---|
| Mentions | 3 | 1 |
| Stars | 22 | 1 |
| Growth | - | - |
| Activity | 0.0 | 0.0 |
| Last commit | over 1 year ago | 3 days ago |
| Language | Python | HTML |
| License | GNU General Public License v3.0 or later | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
masscan_as_a_service
Git scraping: track changes over time by scraping to a Git repository
I use this approach for monitoring open ports in our infrastructure: run masscan, commit the results to a Git repository, and open a merge request for review whenever they change. During the review, one investigates the affected server to find out why its set of open ports changed.
https://github.com/bobek/masscan_as_a_service
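That loop can be sketched as a small shell script. This is a hedged sketch, not the repo's actual implementation: the file name, commit identity, and the fake scan payload standing in for real masscan output are all illustrative.

```shell
#!/bin/sh
# Sketch of the port-monitoring loop: scan, commit results, review diffs.
# In production the JSON would come from something like:
#   masscan -p0-65535 10.0.0.0/24 -oJ scan.json
set -eu

repo=$(mktemp -d)
git -C "$repo" init -q

# Stand-in for real masscan output so the sketch is self-contained.
printf '{"ip":"10.0.0.5","ports":[{"port":22,"proto":"tcp"}]}\n' \
  > "$repo/scan.json"

# Commit only when the results are new or changed; a CI job would then
# open a merge request so a human reviews why the open ports changed.
if [ -n "$(git -C "$repo" status --porcelain -- scan.json)" ]; then
  git -C "$repo" add scan.json
  git -C "$repo" -c user.email=ci@example.com -c user.name=ci \
    commit -q -m "masscan results $(date -u +%F)"
  echo "changes committed"
fi
```

Run on a schedule (cron or a CI pipeline), the interesting output is not any single scan but the diff between commits.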
Self-Host Vulnerability Scanner
We typically use a variant of https://github.com/bobek/masscan_as_a_service
Masscan: Scan the entire Internet in under 5 minutes
Masscan is awesome. One of its use cases is periodically scanning your own servers to check that you have not accidentally opened new ports in a firewall.
https://github.com/bobek/masscan_as_a_service
scrape-san-mateo-fire-dispatch
Git scraping: track changes over time by scraping to a Git repository
Git is a key technology in this approach, because the value you get out of this form of scraping is the commit history - it's a way of turning a static source of information into a record of how that information changed over time.
I think it's fine to use the term "scraping" to refer to downloading a JSON file.
These days an increasing number of websites work by serving up JSON which is then turned into HTML by a client-side JavaScript app. The JSON often isn't a formally documented API, but you can grab it directly to avoid the extra step of processing the HTML.
I do run Git scrapers that process HTML as well. A couple of examples:
scrape-san-mateo-fire-dispatch https://github.com/simonw/scrape-san-mateo-fire-dispatch scrapes the HTML from http://www.firedispatch.com/iPhoneActiveIncident.asp?Agency=... and records both the original HTML and converted JSON in the repository.
scrape-hacker-news-by-domain https://github.com/simonw/scrape-hacker-news-by-domain uses my https://shot-scraper.datasette.io/ browser automation tool to convert an HTML page on Hacker News into JSON and save that to the repo. I wrote more about how that works here: https://simonwillison.net/2022/Dec/2/datasette-write-api/
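The pattern those examples share is small: fetch a document on a schedule, commit it, and let Git's history record how it changed. A minimal sketch of one such step follows; the URL in the comment and the payload are illustrative stand-ins, not the real fire-dispatch feed.

```shell
#!/bin/sh
# Minimal git-scraping step. In a scheduled GitHub Actions job the data
# would be fetched with something like:
#   curl -fsSL "https://example.com/incidents.json" -o incidents.json
set -eu

repo=$(mktemp -d)
git -C "$repo" init -q

# Stand-in payload so the sketch runs without network access.
printf '[{"incident":"Structure Fire","units":3}]\n' > "$repo/incidents.json"

git -C "$repo" add incidents.json
git -C "$repo" -c user.email=bot@example.com -c user.name=bot \
  commit -q -m "Latest data: $(date -u +%FT%TZ)"

# The commit history is the product: `git log -p incidents.json`
# replays every version the source ever served.
echo "snapshot committed"
```

Because each run commits the latest snapshot, `git log -p` on the file becomes the record of how that information changed over time.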
What are some alternatives?
zmap - ZMap is a fast single packet network scanner designed for Internet-wide network surveys.
shot-scraper - A command-line utility for taking automated screenshots of websites
zdns - Fast CLI DNS Lookup Tool
scrape-hacker-news-by-domain - Scrape HN to track links from specific domains
github-actions - Information and tips regarding GitHub Actions
carbon-intensity-forecast-tracking - The reliability of the National Grid's Carbon Intensity forecast
bbcrss - Scrapes the headlines from BBC News indexes every five minutes
xssmap - Intelligent XSS detection tool that uses human techniques for looking for reflected cross-site scripting (XSS) vulnerabilities
metrobus-timetrack-history - Tracking Metrobus location data
masscan - TCP port scanner, spews SYN packets asynchronously, scanning entire Internet in under 5 minutes.
Geo-IP-Database - Automatically updated tree-formatted database from MaxMind database