Need help with data migration and deduplication

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder

  • rmlint

    Extremely fast tool to remove duplicates and other lint from your filesystem

  • For CLI I'd say rmlint. (There is supposed to be a GUI, but I was never able to get it working. YMMV.) The dev is in the sub sometimes. It is very powerful. Someone mentioned checksums: this program can compute checksums and then store them in the file's metadata, so they don't have to be recalculated in the future if the file hasn't changed. That makes rescans fast. As mentioned before, don't go too crazy with huge scans until you really know what you are doing.

  • rsync

    An open source utility that provides fast incremental file transfer. It also has useful features for backup and restore operations among many other use cases.

  • I have also heard people talking about using other programs that have deduplication built in as a way to accomplish this, most notably rsync and also borg backup. These require a bit more confidence in one's skills than I have at the moment for the task at hand.

  • exiftool

    ExifTool meta information reader/writer

  • Another CLI tool you should know about when dealing with large numbers of photos is exiftool. It's useful in several different ways.

  • dupeguru

    Find duplicate files

  • The best thing to use with a graphical interface is DupeGuru. It is free and open source. It has a specific mode for photos, but I don't love it, so I just use the regular mode. I advise that you do not attempt to do everything in one batch; the results are overwhelming. Try doing it in pieces. That way you might also be able to establish a "master" copy to compare everything else to.
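
The "master copy, then one batch at a time" approach above can be sketched in a few lines of Python. This is only an illustration of the general idea (compare file sizes first, then checksums), not how DupeGuru or rmlint work internally. The function names `sha256_of` and `find_duplicates_of_master` and the directory paths in the usage note are made up for this example. It only reports matches and never deletes anything.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates_of_master(master: Path, candidate: Path) -> dict[Path, Path]:
    """Map each file under `candidate` to a master file with identical content.

    Hypothetical helper for illustration: files are compared by size first,
    and only files whose sizes match are hashed.
    """
    # Index the master copy by file size; this step reads no file contents.
    master_by_size: dict[int, list[Path]] = defaultdict(list)
    for path in master.rglob("*"):
        if path.is_file():
            master_by_size[path.stat().st_size].append(path)

    # Cache master hashes so each master file is hashed at most once.
    master_hashes: dict[Path, str] = {}
    matches: dict[Path, Path] = {}

    for path in candidate.rglob("*"):
        if not path.is_file():
            continue
        same_size = master_by_size.get(path.stat().st_size)
        if not same_size:
            continue  # No master file has this size, so it cannot be a duplicate.
        candidate_hash = sha256_of(path)
        for master_path in same_size:
            if master_path not in master_hashes:
                master_hashes[master_path] = sha256_of(master_path)
            if master_hashes[master_path] == candidate_hash:
                matches[path] = master_path
                break
    return matches
```

Usage would look something like `find_duplicates_of_master(Path("/data/master"), Path("/data/batch-01"))`, run once per batch, with the results reviewed by hand before anything is removed. Both paths are placeholders.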

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
