Do you download single webpages? If so, how? And how do you organize them?

This page summarizes the projects mentioned and recommended in the original post on /r/datacurator

  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • SingleFile

    Web Extension for saving a faithful copy of a complete web page in a single HTML file

  • You're already using more or less the best solution. SingleFile and SingleFileZ are excellent for personal collecting.

  • Memacs

    What did I do on February 14th 2007? Visualize your (digital) life in Org-mode

  • I use SingleFileZ with an adapted ISO timestamp as a prefix for the filename. Each snapshot then gets moved to an archive folder per month and indexed by the filename module of Memacs. This way, all web pages are archived including their content, and each capture lands in my calendar for temporal retrieval (see the second sketch after this list).

  • monolith

    ⬛️ CLI tool for saving complete web pages as a single HTML file

  • I use https://github.com/Y2Z/monolith. Keep in mind that the resulting HTML can be quite large, so you might want to post-process it a little if you care about size. But the great thing about monolith is that it bundles the whole page, including HTML, CSS, JS, and images, into a single file, perfect for offline archival even after the website is gone (a sketch follows below).
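
A minimal Python sketch of the monolith workflow from the comment above. It shells out to the `monolith` binary, which is assumed to be installed and on PATH; the `-o` (output file), `-j` (exclude JavaScript), and `-i` (remove images) flags come from monolith's documented options, while the timestamped-filename convention and `archive` folder are illustrative assumptions borrowed from the SingleFileZ comment earlier.

```python
#!/usr/bin/env python3
"""Sketch: archive a page with monolith under a timestamped filename."""
import subprocess
from datetime import datetime
from pathlib import Path

def archive(url: str, dest_dir: str = "archive", slim: bool = False) -> Path:
    # Filesystem-safe adapted ISO timestamp prefix, e.g. "2024-03-01T14.05.33"
    stamp = datetime.now().strftime("%Y-%m-%dT%H.%M.%S")
    name = url.split("//")[-1].replace("/", "_")
    out = Path(dest_dir) / f"{stamp} {name}.html"
    out.parent.mkdir(parents=True, exist_ok=True)
    cmd = ["monolith", url, "-o", str(out)]
    if slim:
        # Strip JavaScript and images to cut file size, at the cost of fidelity
        cmd += ["-j", "-i"]
    subprocess.run(cmd, check=True)
    return out

if __name__ == "__main__":
    print(archive("https://example.com"))
```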
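
And a minimal sketch, under stated assumptions, of the SingleFileZ + Memacs workflow described earlier: snapshots saved with an adapted (filesystem-safe) ISO timestamp prefix, e.g. "2024-03-01T14.05.33 Some Page.html", are moved into one archive folder per month, where Memacs can index them by filename timestamp. The inbox/archive folder layout and the exact prefix pattern are illustrative choices, not requirements of Memacs.

```python
#!/usr/bin/env python3
"""Sketch: sort timestamp-prefixed page snapshots into monthly folders."""
import re
import shutil
from pathlib import Path

# Matches a leading adapted ISO timestamp such as "2024-03-01T14.05.33"
STAMP = re.compile(r"^(\d{4})-(\d{2})-\d{2}T")

def sort_into_monthly_folders(inbox: Path, archive: Path) -> None:
    for page in inbox.glob("*.html"):
        m = STAMP.match(page.name)
        if not m:
            continue  # skip files without a timestamp prefix
        year, month = m.groups()
        dest = archive / f"{year}-{month}"  # one folder per month
        dest.mkdir(parents=True, exist_ok=True)
        shutil.move(str(page), str(dest / page.name))

if __name__ == "__main__":
    sort_into_monthly_folders(Path("~/Downloads").expanduser(),
                              Path("~/archive/web").expanduser())
```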

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
