SingleFileZ
Web Extension to save a faithful copy of an entire web page in a self-extracting ZIP file
I'm mostly on Reddit and I use reddit-save. It works well! My biggest issue is that I'd like to be able to archive a thread to an arbitrary length.
For Tumblr, I've found, but not tried, TumblThree. It looks like it's built for Windows.
Besides wget, for single pages I use monolith: https://github.com/Y2Z/monolith
Information management based on text files is quite common. There are cloud-based solutions which I loathe for reasons. Desktop wikis are another potential candidate, and I used some myself for a couple of years until I found the solution that has the most features, the greatest flexibility and a very large community: GNU Emacs with its Org-mode. The chosen file format is then Orgdown.
Back to the original question. In order to get as much content as possible into a common format to be displayed in a common temporal view, I've created a framework that consists of some general functionality and a set of modules that deal with different input sources and formats. This project is called Memacs. You can also read a whitepaper about it.
If you have files whose names begin with an ISO 8601 compliant date- or timestamp, the filenametimestamps module will do the trick. This way, I index all photographs, web downloads, emails, Usenet postings, ... just by choosing a specific file name prefix format. The same holds true for web pages, which are automatically saved using SingleFileZ to files matching that filename prefix format. There you go, this is how I solve your original question.
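To make the filename-prefix idea concrete, here is a minimal sketch in Python. It is not Memacs code: the function names, the accepted prefix variants (date only, or date plus `T` and a time written with `.` or `:` separators) and the Org heading layout are all assumptions chosen for illustration.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed prefix shapes (illustrative, not taken from Memacs):
#   2023-05-17 notes.txt
#   2023-05-17T14.30 page.html
#   2023-05-17T14.30.59 holiday.jpg
PREFIX = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})"                   # date part
    r"(?:T(\d{2})[.:](\d{2})(?:[.:](\d{2}))?)?"   # optional time part
)


def timestamp_from_name(name: str) -> Optional[datetime]:
    """Return the timestamp encoded at the start of a file name, if any."""
    match = PREFIX.match(name)
    if match is None:
        return None
    year, month, day, hour, minute, second = match.groups()
    try:
        return datetime(
            int(year), int(month), int(day),
            int(hour or 0), int(minute or 0), int(second or 0),
        )
    except ValueError:
        # Digits matched the pattern but are not a real date, e.g. month 13.
        return None


def org_heading(name: str) -> Optional[str]:
    """Build a hypothetical Org heading linking to the file."""
    stamp = timestamp_from_name(name)
    if stamp is None:
        return None
    has_time = PREFIX.match(name).group(4) is not None
    fmt = "%Y-%m-%d %H:%M" if has_time else "%Y-%m-%d"
    return f"** <{stamp.strftime(fmt)}> [[file:{name}][{name}]]"
```

Running `org_heading` over a directory listing would yield one heading per dated file, which an agenda view can then sort by time; files without a recognizable prefix are simply skipped.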
I already use orgmode a bit, mostly through org-roam! I have a pretty inefficient setup where I save a webpage via SingleFileZ, then run readability-cli on it, then convert the readable output to an orgmode file. It's definitely not efficient because I need to complete each step manually, but I haven't bothered to automate it yet.
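The last two manual steps could probably be chained with a small script. The following is only a sketch under several assumptions: that readability-cli is installed as a `readable` command which prints the cleaned HTML to standard output when given a file path, that pandoc does the HTML-to-Org conversion (the comment above does not say which converter is used), and that the file saved by SingleFileZ can be read by `readable` as-is. The function names are made up for the example.

```python
import subprocess
from pathlib import Path
from typing import List


def pipeline_commands(saved_page: Path) -> List[List[str]]:
    """Commands for the two steps; both invocations are assumptions."""
    return [
        # Step 1: extract the readable article (assumed readability-cli usage).
        ["readable", str(saved_page)],
        # Step 2: convert HTML read from stdin to Org (pandoc is an assumption).
        ["pandoc", "--from", "html", "--to", "org"],
    ]


def convert_to_org(saved_page: Path, run=subprocess.run) -> Path:
    """Run both steps and write the result next to the saved page."""
    extract_cmd, convert_cmd = pipeline_commands(saved_page)
    readable_html = run(
        extract_cmd, check=True, capture_output=True, text=True
    ).stdout
    org_text = run(
        convert_cmd, input=readable_html,
        check=True, capture_output=True, text=True,
    ).stdout
    target = saved_page.with_suffix(".org")
    target.write_text(org_text, encoding="utf-8")
    return target
```

Calling `convert_to_org(Path("saved-page.html"))` would write `saved-page.org` alongside the original. The `run` parameter exists so the external tools can be swapped or stubbed out while testing.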