Make Your Own Internet Archive with Archive Box

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • Yeup, just the reason why we expose the USER_AGENT options in ArchiveBox config ;)

    https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overv...

    I don't want to officially endorse using the Google bot user agent, but you're welcome to try it on your own and see if it improves the experience.

  • 22120

    Discontinued 💾 Diskernet - Your preferred backup solution. It's like you're still online! Full text search archive from your browsing and bookmarks. Weclome! to the Diskernet: an internet on yer disk. Disconnect with Diskernet, an internet for the post-online apocalypse. Or the airplane WiFi. Or the site goes down. Or ... You get the picture. Get Diskernet. 80s logo. Formerly 22120 (project codename) ;P ;) xx;p [Moved to: https://github.com/i5ik/Diskernet]

  • I'm working in that in my "self host the internet offline from your browsing history" project

    https://github.com/c9fe/22120

    It makes a web archive from everything your browse, and lately I've been working on the full text search

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • ArchiveBox

    Discontinued 🗃 The open source self-hosted web archive. Takes browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more... [Moved to: https://github.com/ArchiveBox/ArchiveBox] (by pirate)

  • it doesn't show in the Screenshot in the article, but ArchiveBox in Aug 2020 implemented the "readability article text extractor", see description in the release notes: https://github.com/pirate/ArchiveBox/releases/tag/v0.4.14 and the module that does the work https://github.com/pirate/readability-extractor

    By only extracting text and article images you could go deep into an archive. If you skip images, much more so

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts