fdupes
FDUPES is a program for identifying or deleting duplicate files residing within specified directories.
-
scripts
Miscellaneous scripts that serve a stand-alone purpose that might be useful for others. (by taltman)
I recently used [fdupes](https://github.com/adrianlopezroche/fdupes) to find duplicate files after my Amazon Cloud Drive / Photos migration. It took about 2 days to scour through roughly 1.5TB worth of data.
Before settling on this approach, I found [this](https://unix.stackexchange.com/questions/277697/whats-the-quickest-way-to-find-duplicated-files) post very helpful. One of the respondents wrote [this](https://github.com/taltman/scripts/blob/master/unix_utils/find-dupes.awk) awk script, which is supposedly very fast. However, it's written for the FreeBSD flavor of things. I [tried](https://github.com/taltman/scripts/issues/4) getting it to work on Linux, but couldn't, since my awk-fu isn't that good.
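For anyone curious why tools like fdupes can be fast: a common trick is to only hash files that could possibly be duplicates. Below is a minimal Python sketch of that general idea. It is not a port of the awk script or of fdupes' code. It groups files by size, then by a hash of the first 4 KiB, then by a full hash. The function names, the 4 KiB prefix and the use of SHA-256 are my own assumptions; fdupes' documentation describes comparing sizes and MD5 signatures followed by a byte-by-byte comparison.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, limit=None, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, or of only its first `limit` bytes."""
    h = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(size)
            if not chunk:
                break  # end of file
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()


def find_duplicates(root, prefix_bytes=4096):
    """Group identical files under `root`: size -> prefix hash -> full hash."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        # Cheap pass: hash only the first few KiB of same-size files.
        by_prefix = defaultdict(list)
        for p in paths:
            by_prefix[file_digest(p, limit=prefix_bytes)].append(p)
        for candidates in by_prefix.values():
            if len(candidates) < 2:
                continue
            # Expensive pass: full-content hash only for remaining candidates.
            by_full = defaultdict(list)
            for p in candidates:
                by_full[file_digest(p)].append(p)
            groups.extend(sorted(g) for g in by_full.values() if len(g) > 1)
    return sorted(groups)
```

Calling `find_duplicates("/path/to/photos")` returns a list of groups, each a sorted list of paths with identical content. Since this sketch skips fdupes' final byte-by-byte check, I'd still treat its output as candidates before deleting anything.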
You can use czkawka to find and remove duplicates. It's free, easy to use and pretty reliable. I also sometimes use StarWind Deduplication Analyzer to check whether there's still any data that can be deduplicated.
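To get a rough feel for what a dedupe analyzer measures, the basic idea of block-level deduplication fits in a few lines. I don't know how StarWind's tool works internally, so this is only a minimal Python sketch under my own assumptions: fixed 4 KiB blocks, SHA-256 per block, and hypothetical helper names `walk_files` and `estimate_dedup`.

```python
import hashlib
import os


def walk_files(root):
    """Yield every regular file under `root`, skipping symlinks."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def estimate_dedup(paths, block_size=4096):
    """Estimate block-level dedup savings across `paths`.

    Each file is cut into fixed-size blocks (the last one may be shorter),
    every block is hashed, and identical hashes count as one stored block.
    Returns (total_blocks, unique_blocks, savings_fraction).
    """
    seen = set()
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break  # end of file
                total += 1
                seen.add(hashlib.sha256(block).digest())
    unique = len(seen)
    savings = 1 - unique / total if total else 0.0
    return total, unique, savings
```

For example, `total, unique, savings = estimate_dedup(walk_files("/mnt/backup"))` gives the fraction of blocks that are repeats. Fixed-size blocks miss duplicates that are shifted by a few bytes, which is one reason real dedup engines may use variable-size chunking instead.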