- grab-site: The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
The documentation source is available (e.g. https://github.com/MicrosoftDocs/win32), but I can't figure out how to build it. There are hints that docfx is used, but I can't get it to build: it fails with errors because the markdown is too deeply nested.
Crawling is, of course, the other option. I've seen https://github.com/ArchiveTeam/grab-site in the wiki, but I'm unsure how to host the resulting .warc archives.
Related posts
- struggling to download websites
- Internet Archive Down, will be up and running soon (i hope).
- best tool for downloading forum posts in real-time?
- grab-site: The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
- Data hoarders, start backing up government websites and news articles as well