The format you want is WARC. Even the Library of Congress uses it. There are many WARC scrapers; I'd look at what the Internet Archive recommends. A quick search turned up this from Archive Team and Jason Scott: https://github.com/ArchiveTeam/grab-site (https://wiki.archiveteam.org/index.php/Who_We_Are), but I found that in less than 15 seconds of searching, so do your own diligence.
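For what it's worth, starting a crawl is basically a one-liner. Here's a minimal sketch of driving it from Python via subprocess; per grab-site's README the basic usage is `grab-site <URL>` (the dashboard is a separate `gs-server` process), and the forum URL here is just a placeholder.

```python
import subprocess

# Minimal sketch: per grab-site's README, basic usage is `grab-site <URL>`,
# which writes the crawl out as WARC into a timestamped directory.
# (The dashboard is a separate `gs-server` process.)
# The forum URL below is just a placeholder.
subprocess.run(
    ["grab-site", "https://forum.example.com/"],
    check=True,  # raise CalledProcessError if the crawler fails
)
```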
> I'm just not sure what the intermediate steps would be to get something usable like a vBulletin…
Once you have a crawl, you'll likely want to convert that unstructured data to structured data. For example, if I look at https://www.vbulletin.org/forum/portal.php, the thread title and hierarchy live in one set of HTML elements, the posts in another, and so on. I see an old project (https://github.com/IanLondon/detectorist-scraper) that did this and may be a useful place to start, and I imagine there are others. Once you have the structured data, you can decide whether to use it to build a static site, to import it into another forum, etc.
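As a rough illustration of that extraction step, here's a minimal BeautifulSoup sketch. The CSS selectors are made up for illustration; real vBulletin markup varies by version and theme, so you'd inspect the page source and adjust them.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

html = open("thread_page.html", encoding="utf-8").read()
soup = BeautifulSoup(html, "html.parser")

# Hypothetical selectors -- inspect your forum's actual markup and adjust.
thread_title = soup.select_one("h1.threadtitle")
posts = []
for post in soup.select("li.postcontainer"):
    author = post.select_one("a.username")
    body = post.select_one("div.postcontent")
    posts.append({
        "author": author.get_text(strip=True) if author else None,
        "body": body.get_text(" ", strip=True) if body else None,
    })

print(thread_title.get_text(strip=True) if thread_title else "no title found")
print(f"extracted {len(posts)} posts")
```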
You can try https://replayweb.page/ as a test for viewing a WARC file. I do think you'll run into problems, though, with browsing interconnected links in a forum format, but try it as a first step.
One potential option, though definitely a bit more work: once you have all the WARC files downloaded, you can open them in Python using the warctools module (and maybe BeautifulSoup) and parse/extract all of the data embedded in the WARC archives into your own "fresh" HTML web server.
https://github.com/internetarchive/warctools
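If you go that route, something like the sketch below could pull the HTML responses back out of a WARC. It assumes warctools' hanzo API (`WarcRecord.open_archive` / `read_records` yielding `(offset, record, errors)` tuples), which is what the repo's bundled warcdump tool uses; double-check against the current source, and note the crude header split is just for illustration.

```python
from hanzo.warctools import WarcRecord  # pip install warctools

# Sketch under the assumption that warctools' hanzo API works as in its
# bundled warcdump tool: read_records() yields (offset, record, errors).
fh = WarcRecord.open_archive("crawl.warc.gz", gzip="auto")
for offset, record, errors in fh.read_records(limit=None):
    if record is None or record.type != WarcRecord.RESPONSE:
        continue
    content_type, body = record.content
    # Response records hold the raw HTTP exchange; crudely split off the
    # HTTP headers to get at the HTML payload.
    _, _, payload = body.partition(b"\r\n\r\n")
    if b"html" in (content_type or b""):
        print(record.url, len(payload))
fh.close()
```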
> This is a gem to help extract data from vBulletin Forums, specifically those which you have no control over.
https://github.com/lloydpick/vbulletin
This is a very old tool, so it's hard to say whether it still works; then again, it seems very relevant, so worst case it could provide inspiration.
You can try forum-dl, a forum scraping tool I've been writing for this purpose: https://github.com/mikwielgus/forum-dl
It's single-threaded, alpha-quality software, and still isn't compatible with many forums and themes. But it can export WARCs and may just happen to work for you.
I guess your best chance is to use something like https://archivebox.io/.