Is there a good list of up-to-date data archiving tools for different websites?

reddit-save

6 122 4.4 Python

A Python tool for backing up your saved and upvoted posts on reddit to your computer.

I'm mostly on reddit and I use reddit-save. It works well! Biggest issue is I'd like to be able to archive a thread to an arbitrary length.

TumblThree

4 564 8.0 C#

A Tumblr and Twitter Blog Backup Application

For Tumblr, I've found, but not tried, TumblThree. It looks like it's built for Windows.

monolith

23 9,870 6.9 Rust

⬛️ CLI tool for saving complete web pages as a single HTML file

Besides wget, for single pages I use monolith: https://github.com/Y2Z/monolith

orgdown

60 - -

Information management based on text files is quite common. There are cloud-based solutions which I loathe for reasons. Desktop wikis are another potential candidate and I was using some for myself for a couple of years until I found the solution that has the most features, the greatest flexibility and a very large community: GNU Emacs with its Org-mode. The chosen file format will then be Orgdown.

Memacs

20 963 2.7 Python

What did I do on February 14th 2007? Visualize your (digital) life in Org-mode

Back to the original question. In order to get as much content as possible into a common format to be displayed in a common temporal view, I've created a framework that consists of some general functionality and a set of modules that deal with different input sources and formats. This project is called Memacs. You can also read a whitepaper about it.

SingleFileZ

28 1,759 9.6 JavaScript

Web Extension to save a faithful copy of an entire web page in a self-extracting ZIP file

If you do have files whose names begin with an ISO 8601 compliant date- or timestamp, the filenametimestamps module will do the trick. This way, I index all photographs, all web downloads, emails, usenet postings, ... just by choosing a specific file name prefix format. The same holds true for web pages which are automatically saved using SingleFileZ to files matching that filename prefix format. There you go, this is how I solve your original question.
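To illustrate the filename-prefix idea, here is a minimal Python sketch that reads a date or timestamp from the start of a file name. It is not the Memacs filenametimestamps module itself. The function name `timestamp_from_filename` is hypothetical, and the accepted prefix shapes (`YYYY-MM-DD`, optionally followed by `THH.MM` or `THH.MM.SS`) are an assumption, since the post does not spell out the exact format.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed prefix shapes (not taken from Memacs): "2007-02-14",
# "2007-02-14T13.45" or "2007-02-14T13.45.10" at the start of the name.
# Both "." and ":" are accepted as time separators.
_PREFIX = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})"
    r"(?:T(\d{2})[.:](\d{2})(?:[.:](\d{2}))?)?"
)


def timestamp_from_filename(name: str) -> Optional[datetime]:
    """Return the datetime encoded in the filename prefix, or None."""
    match = _PREFIX.match(name)
    if match is None:
        return None
    year, month, day, hour, minute, second = match.groups()
    try:
        return datetime(
            int(year), int(month), int(day),
            int(hour or 0), int(minute or 0), int(second or 0),
        )
    except ValueError:
        # Digits matched the pattern but do not form a real date or time.
        return None
```

Files without a recognizable prefix simply yield `None`, so an indexer built on this could skip them instead of failing.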

readability-cli

11 - -

I already use orgmode a bit - mostly through org-roam! I have a pretty inefficient set-up where I save a webpage via SingleFileZ, then use readability-cli on it, then convert the readable output to an orgmode file. Definitely not efficient because I need to manually complete each step, but I haven't bothered to try to automate it yet.
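The manual steps above could be chained roughly like this. This is only a sketch under assumptions: the post does not say which converter is used, so pandoc is my substitution for the HTML-to-Org step, and the `readable` command name and its `-o` flag should be checked against readability-cli's own help before relying on them. The function names `build_pipeline` and `run_pipeline` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_pipeline(saved_page: Path) -> List[List[str]]:
    """Build the two commands that turn a saved page into an Org file."""
    readable_html = saved_page.with_name(saved_page.stem + ".readable.html")
    org_file = saved_page.with_suffix(".org")
    return [
        # Assumption: readability-cli installs a `readable` command that
        # accepts an input file and writes to the path given with -o.
        ["readable", str(saved_page), "-o", str(readable_html)],
        # Assumption: pandoc does the HTML-to-Org conversion.
        ["pandoc", "-f", "html", "-t", "org", "-o", str(org_file),
         str(readable_html)],
    ]


def run_pipeline(saved_page: Path) -> None:
    """Run each step in order and stop at the first failure."""
    for command in build_pipeline(saved_page):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it easy to print the commands for a dry run before letting the script touch any files.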

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
