Starting my own Data Hoarding

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder.

  • reddit-save

    A Python tool for backing up your saved and upvoted posts on reddit to your computer.

  • Reddit: yes, we are on reddit, and there is a considerable amount of science and art there. I am already archiving all saved posts using reddit-save, and some low-traffic subreddits in their entirety (tool still unknown). A rough sketch of the saved-posts backup appears after this list.

  • youtube-dl

    Command-line program to download videos from YouTube.com and other video sites

  • YouTube: a bunch of science there (Cody et al.). The idea is to first download all my personal playlists using youtube-dl in archive mode, and then perhaps start downloading select channels in their entirety at low resolution. Many videos have already been deleted from my playlists. YouTube comments should also be saved, but I am not sure how yet. A hedged youtube-dl sketch appears after this list.

  • floccus

    Sync your bookmarks privately across browsers and devices

  • Bookmarks: these are at less risk, because archive.org has been doing a good job of keeping sites alive, but still. Bookmarks will be synchronized to a self-hosted WebDAV server using the floccus add-on and then archived with ArchiveBox; a rough pipeline sketch appears after this list. There are a few notes on this:

  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • awesome-web-archiving

    An Awesome List for getting started with web archiving

  • Website crawls: some personal websites are so packed with info that it is worth saving them in their entirety. But on more complex sites, a naive crawl would produce an ungodly amount of duplicate data due to CGI parameters like pagination and sorting. I have only crawled about 3 websites with wget so far. This will require more thought and reading; a hedged wget sketch appears after this list.

  • wikiteam

    Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2023, WikiTeam has preserved more than 350,000 wikis.

  • Wikis - these look easy to handle with WikiTeam; a short sketch follows below.
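
For the Reddit item above: reddit-save already covers saved and upvoted posts, so the following is only an illustration of what that backup amounts to. It is a minimal sketch using PRAW rather than reddit-save itself; the credentials and output filename are placeholders.

```python
import json

import praw  # pip install praw

# Placeholder credentials for a "script" app created at reddit.com/prefs/apps.
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    username="YOUR_USERNAME",
    password="YOUR_PASSWORD",
    user_agent="saved-posts-backup/0.1",
)

items = []
for item in reddit.user.me().saved(limit=None):
    items.append({
        "id": item.id,
        "permalink": item.permalink,
        "subreddit": str(item.subreddit),
        "created_utc": item.created_utc,
        # Submissions carry a title/url, comments carry a body.
        "title": getattr(item, "title", None),
        "url": getattr(item, "url", None),
        "body": getattr(item, "body", None),
    })

with open("reddit-saved.json", "w", encoding="utf-8") as f:
    json.dump(items, f, indent=2, ensure_ascii=False)
```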
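
For the YouTube item above: youtube-dl's archive mode is the download-archive option, which records finished video IDs in a text file so repeated runs only fetch what is new. Below is a minimal sketch via the embedding API; the playlist URL, output template, and low-res format string are assumptions to adjust.

```python
import youtube_dl  # pip install youtube_dl

# Placeholder playlist URLs; private playlists need cookies/authentication.
PLAYLISTS = [
    "https://www.youtube.com/playlist?list=PLxxxxxxxxxxxxxxxx",
]

opts = {
    "format": "worst[height>=360]/worst",  # low-res to save space; tune to taste
    "download_archive": "yt-archive.txt",  # skip video IDs already recorded here
    "outtmpl": "youtube/%(playlist)s/%(title)s-%(id)s.%(ext)s",
    "ignoreerrors": True,                  # keep going past deleted/private videos
}

with youtube_dl.YoutubeDL(opts) as ydl:
    ydl.download(PLAYLISTS)
```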
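
For the bookmarks item above: floccus can sync to an XBEL file on a WebDAV share, and ArchiveBox accepts URLs on stdin via `archivebox add`. The sketch below glues the two together; the XBEL path is an assumption, and the command must be run from inside an initialized ArchiveBox data directory.

```python
import subprocess
import xml.etree.ElementTree as ET

# Assumed location of the floccus-synced XBEL file (e.g. a mounted WebDAV share).
XBEL_PATH = "/mnt/webdav/bookmarks.xbel"

# Collect every bookmark URL, folders included, and de-duplicate.
tree = ET.parse(XBEL_PATH)
urls = sorted({b.get("href") for b in tree.iter("bookmark") if b.get("href")})

# Feed the URLs to ArchiveBox on stdin (run from the ArchiveBox data directory).
subprocess.run(["archivebox", "add"], input="\n".join(urls), text=True, check=True)
```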
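
For the website-crawls item above: one way to keep a naive mirror from exploding on pagination and sorting parameters is wget's --reject-regex, which drops matching URLs before download. The target site, output directory, and regex below are placeholders; the flag set is a starting point, not a recipe.

```python
import subprocess

SITE = "https://example.org/"  # placeholder target site

subprocess.run([
    "wget",
    "--mirror",                # recursive download with timestamping
    "--convert-links",         # rewrite links for offline browsing
    "--adjust-extension",      # save text/html pages with an .html extension
    "--page-requisites",       # grab CSS/JS/images needed to render pages
    "--no-parent",             # never ascend above the starting directory
    # Skip the query-string permutations that cause most of the duplication.
    "--reject-regex", r".*[?&](sort|order|page|replytocom)=.*",
    "--directory-prefix", "crawls/example.org",
    SITE,
], check=True)
```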
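
For the wikis item above: WikiTeam's dumpgenerator is itself a Python script, wrapped in subprocess here for consistency. The wiki URL is a placeholder and the flags are the commonly documented ones (--xml for full page history, --images for uploads); check the project's README for the current interface.

```python
import subprocess

# Placeholder wiki; point at the wiki's index.php (or api.php) as appropriate.
WIKI_URL = "https://wiki.example.org/index.php"

# --xml grabs the complete page history, --images grabs uploaded files.
subprocess.run(
    ["python", "dumpgenerator.py", WIKI_URL, "--xml", "--images"],
    check=True,
)
```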
