Make Your Own Internet Archive with Archive Box

ArchiveBox

248 mentions, 19,737 stars, activity 9.7, Python

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Yep, that's exactly why we expose the USER_AGENT options in the ArchiveBox config ;)
https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overv...
I don't want to officially endorse using the Googlebot user agent, but you're welcome to try it yourself and see if it improves the experience.
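On the wire, a user-agent option only changes the `User-Agent` header the archiver sends, and some sites serve different content depending on it. The sketch below is a minimal illustration under stated assumptions. It builds (without sending) a standard-library `urllib` request carrying a custom header, and it assembles the argv for persisting the setting with `archivebox config --set`. The `USER_AGENT` key name comes from the comment above. Per-extractor key names can differ between ArchiveBox versions, so check your version's config reference. `EXAMPLE_UA` and both function names are hypothetical.

```python
import shlex
from urllib.request import Request

# Hypothetical placeholder UA string; substitute whatever you want to test.
EXAMPLE_UA = "Mozilla/5.0 (compatible; ExampleArchiver/1.0)"


def build_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) an HTTP request with a custom User-Agent.

    Changing the user agent only changes this one request header;
    the server decides whether to respond differently because of it.
    """
    return Request(url, headers={"User-Agent": user_agent})


def archivebox_set_user_agent_cmd(user_agent: str) -> list[str]:
    """Return (hypothetical helper) the argv for `archivebox config --set`.

    Only the USER_AGENT key mentioned in the comment is used; this
    function builds the command and does not execute it.
    """
    return ["archivebox", "config", "--set", f"USER_AGENT={user_agent}"]


if __name__ == "__main__":
    req = build_request("https://example.com/", EXAMPLE_UA)
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
    # shlex.join quotes the value so the command is safe to paste into a shell.
    print(shlex.join(archivebox_set_user_agent_cmd(EXAMPLE_UA)))
```

Building the argv as a list (rather than one interpolated string) avoids shell-quoting bugs when the UA string contains spaces, parentheses, or semicolons.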

22120

13 mentions, 2,638 stars, activity 9.7, JavaScript

Discontinued 💾 Diskernet - Your preferred backup solution. It's like you're still online! Full text search archive from your browsing and bookmarks. Welcome! to the Diskernet: an internet on yer disk. Disconnect with Diskernet, an internet for the post-online apocalypse. Or the airplane WiFi. Or the site goes down. Or ... You get the picture. Get Diskernet. 80s logo. Formerly 22120 (project codename) ;P ;) xx;p [Moved to: https://github.com/i5ik/Diskernet]

I'm working on that in my "self host the internet offline from your browsing history" project
https://github.com/c9fe/22120
It makes a web archive from everything you browse, and lately I've been working on full-text search.
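Full-text search over an archive is usually built on an inverted index, which maps each word to the pages that contain it. The project above is written in JavaScript and its search implementation isn't described here, so the following is only a generic Python sketch of the technique, not 22120/Diskernet's code. `FullTextIndex` and `tokenize` are hypothetical names. Queries use AND semantics, meaning every query word must appear on the page.

```python
import re
from collections import defaultdict

# Simplistic tokenizer: lowercase ASCII letters and digits only.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class FullTextIndex:
    """A toy inverted index: token -> set of archived page URLs containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add_page(self, url: str, text: str) -> None:
        """Record every token of the page's extracted text."""
        for token in tokenize(text):
            self.postings[token].add(url)

    def search(self, query: str) -> set[str]:
        """Return URLs containing every query token (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        # .get avoids inserting empty entries into the defaultdict;
        # sorting by size intersects starting from the rarest token.
        sets = sorted((self.postings.get(t, set()) for t in tokens), key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return result


if __name__ == "__main__":
    index = FullTextIndex()
    index.add_page("https://example.com/a", "Offline web archive of your browsing")
    index.add_page("https://example.com/b", "Full text search for the archive")
    print(sorted(index.search("archive")))         # both pages
    print(sorted(index.search("search archive")))  # only .../b
```

A real archive would also store word positions (for phrase queries) and rank results, but the lookup-and-intersect core stays the same.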

ArchiveBox

2 mentions, 8,085 stars, activity 9.7, Python

Discontinued 🗃 The open source self-hosted web archive. Takes browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more... [Moved to: https://github.com/ArchiveBox/ArchiveBox] (by pirate)

It doesn't show in the screenshot in the article, but in Aug 2020 ArchiveBox implemented the "readability article text extractor"; see the description in the release notes: https://github.com/pirate/ArchiveBox/releases/tag/v0.4.14 and the module that does the work: https://github.com/pirate/readability-extractor
By extracting only the text and article images, you could go deep into an archive, and much deeper still if you skip the images.
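To make the text-only idea concrete, here is a deliberately naive Python sketch using the standard-library `html.parser`. It is NOT how readability-extractor or Mozilla's Readability works; those score candidate nodes to find the main article body. This version just drops obvious boilerplate tags and collects the remaining text plus `<img src>` URLs. The `keep_images` flag mirrors the "skip images" option from the comment. `SKIP_TAGS`, `TextAndImageExtractor`, and `extract` are hypothetical names, and the skip list is an assumption.

```python
from html.parser import HTMLParser

# Assumed list of tags whose contents are never treated as article text.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}


class TextAndImageExtractor(HTMLParser):
    """Collect visible text and <img src> URLs, skipping boilerplate tags."""

    def __init__(self, keep_images: bool = True) -> None:
        super().__init__()
        self.keep_images = keep_images
        self.skip_depth = 0  # >0 while inside any SKIP_TAGS element
        self.text_parts: list[str] = []
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "img" and self.keep_images and self.skip_depth == 0:
            src = dict(attrs).get("src")
            if src:  # ignore <img> with a missing or empty src
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-whitespace text outside skipped elements.
        if self.skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html: str, keep_images: bool = True) -> tuple[str, list[str]]:
    """Return (plain text, image URLs) for one HTML page."""
    parser = TextAndImageExtractor(keep_images=keep_images)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.images


if __name__ == "__main__":
    page = ('<script>track()</script><nav>Menu</nav>'
            '<h1>Title</h1><p>Hello <b>world</b></p><img src="a.png">')
    print(extract(page))                     # text plus ["a.png"]
    print(extract(page, keep_images=False))  # text only
```

Storing only this output instead of full HTML, JS, and media is what makes a much deeper archive affordable. The trade-off is losing layout and anything the heuristic misclassifies.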

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
