How do I manage years of data?

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder

  • PhotoPrism

    AI-Powered Photos App for the Decentralized Web 🌈💎✨

  • For a master collection of photos and videos, I dump everything I find into my PhotoPrism instance. For example, I tossed everything from my Google Takeout into it.

  • czkawka

    Multi functional app to find duplicates, empty folders, similar images etc.

  • I recommend https://github.com/qarmin/czkawka for picture/video dedup; it can even search for similar items, not just exact duplicates.

  • AntiDupl

    A program to search similar and defect pictures on the disk

  • Yes, czkawka is pretty good. If /u/AndypandyO is also interested in de-duplicating similar (but not exact) pictures, I recommend AntiDupl.

  • hashdeep

  • I was able to start bringing some order once I discovered hashdeep. Basically, I started from a reasonably clean disk with folders for sorting files, created lists of hashes with hashdeep, then used it to scan all my existing disks for unknown files. With the correct flags, hashdeep can list every file it finds on a disk that is not already in its lists. That helps a lot in figuring out what is worth spending time on. It is also useful because every now and then it makes me realize that my copy of some old file is broken (probably because it was stored on a CD-ROM that was no longer good).

  • k4dirstat

    K4DirStat (KDE Directory Statistics) is a small utility program that sums up disk usage for directory trees, very much like the Unix 'du' command. It displays the disk space used up by a directory tree, both numerically and graphically (copied from the Debian package description).

  • Huh cool, I'll have to give that a go! There's also K4DirStat if you have a craving for the WinDirStat style.

  • dupeguru

    Find duplicate files

  • I have data from drives dating back to the early 2000s. Use dupeGuru (https://dupeguru.voltaicideas.net/), look for 100% matches, and you may find whole redundant file structures to delete. I’ve been chipping away at it for years and am still not done. I'm also trying to organize my data into priority tiers of backup importance.
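
The two hash-based workflows described in the comments above (listing files that are not yet in a known-good collection, and finding 100% identical files) can be sketched in a few lines of Python. This is only an illustration of the idea, not how hashdeep or dupeGuru are implemented, and it does not use either tool's command line. The function names (`hash_file`, `index_tree`, `find_unknown`, `find_duplicates`) and the choice of SHA-256 are assumptions made for this example.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root: Path) -> dict[str, list[Path]]:
    """Map each content hash to every regular file under root with that content."""
    index: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        # Skip directories and symlinks; only real files are hashed.
        if path.is_file() and not path.is_symlink():
            index[hash_file(path)].append(path)
    return dict(index)


def find_unknown(known: dict[str, list[Path]], other_root: Path) -> list[Path]:
    """List files under other_root whose content is absent from the known index."""
    other = index_tree(other_root)
    return sorted(
        path
        for file_hash, paths in other.items()
        if file_hash not in known
        for path in paths
    )


def find_duplicates(index: dict[str, list[Path]]) -> list[list[Path]]:
    """Return groups of files that share exactly the same content."""
    return [paths for paths in index.values() if len(paths) > 1]


# Example usage (the paths are hypothetical):
#   known = index_tree(Path("/mnt/clean_disk"))
#   for path in find_unknown(known, Path("/mnt/old_disk")):
#       print("not in collection:", path)
#   for group in find_duplicates(known):
#       print("identical files:", *group)
```

Hashing every file in full is slow on large disks. A common refinement is to group files by size first and hash only those whose sizes collide. Rerunning `hash_file` on a file and comparing the result with a previously stored digest is also one way to notice a copy that has gone bad.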

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
