-
ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
-
DownloadNet
💾 DownloadNet - All content you browse online available offline. Search through the full-text of all pages in your browser history.
-
markdownload
A Firefox and Google Chrome extension to clip websites and download them into a readable markdown file.
I love all these kinds of projects, as I tend to be paranoid about losing good online content.
It’s also unclear to me how Wayback works. It seems more like an API than a self-hosted service.
I’m currently using ArchiveBox [0], which provides a complete API + UI.
- [0] https://archivebox.io/
For archiving, look into https://github.com/dosyago/DiskerNet
It's real next gen thinking on this topic.
As for the featured tool wayback... If HN readers can't figure out what it does after reading the docs, it's likely the thinking behind it is equally unclear.
Looking at the link you gave does not help much in seeing what DiskerNet does or looks like, either.
Keeping it simple, I download pages as Markdown and add some metadata (a few tags). When I want images or more, I use the SingleFile extension. Add Recoll to the mix and that's all I need.
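For anyone curious what "Markdown plus some tags" could look like on disk, here is a rough sketch of one way to do it. It is not the output of markdownload or any other tool mentioned here; the `build_note` helper and the field names (`title`, `source`, `saved`, `tags`) are made up for illustration. It just puts a small front-matter block on top of already-converted Markdown, so a desktop indexer has something to search on.

```python
import json
from datetime import date


def build_note(title, url, tags, body_markdown, saved_on=None):
    """Return Markdown text with a small front-matter block on top.

    Hypothetical helper: the field names are an assumption, not a
    format required by any of the tools discussed in this thread.
    """
    saved_on = saved_on or date.today()
    header = [
        "---",
        # json.dumps quotes the values so colons or quotes in a title
        # do not break the front matter.
        f"title: {json.dumps(title)}",
        f"source: {url}",
        f"saved: {saved_on.isoformat()}",
        f"tags: {json.dumps(list(tags))}",
        "---",
    ]
    return "\n".join(header) + "\n\n" + body_markdown.strip() + "\n"


if __name__ == "__main__":
    note = build_note(
        "Example page",
        "https://example.com/",
        ["archive", "reading"],
        "# Example page\n\nSome clipped text.",
    )
    print(note)
```

Writing the returned string to a `.md` file in a folder that your search tool indexes is all that is left; the clipping itself still happens in the browser extension.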
Are you using all extractors when saving a page?
I tried ArchiveBox and Shiori, but neither stuck for some reason. The latter is a bit more lightweight and can save the entire page as well as a Readability-based conversion: https://github.com/go-shiori/shiori/