Warctools Alternatives
Similar projects and alternatives to warctools
-
ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
-
grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
forum-dl
Scrape posts, threads from forums, news aggregators, mail archives, export to JSONL, mailbox, WARC
-
detectorist-scraper
A scrapy spider to extract post, thread, and user information from a vBulletin forum to a MongoDB database.
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
warctools reviews and mentions
-
Ask HN: How can I back up an old vBulletin forum without admin access?
You can try https://replayweb.page/ as a test for viewing a WARC file. I do think you'll run into problems though with wanting to browse interconnected links in a forum format, but try this as a first step.
One potential option but definitely a bit more work would be, once you have all the warc files downloaded, you can open them all in python using the warctools module and maybe beautifulsoup and potentially parse/extract all of the data embedded in the WARC archives into your own "fresh" HTML webserver.
https://github.com/internetarchive/warctools
Stats
internetarchive/warctools is an open source project licensed under MIT License which is an OSI approved license.
The primary programming language of warctools is Python.
Sponsored