dupd vs mpifileutils
| | dupd | mpifileutils |
|---|---|---|
| Mentions | 1 | 4 |
| Stars | 109 | 160 |
| Growth | - | 0.6% |
| Activity | 0.0 | 5.1 |
| Last commit | 11 months ago | 22 days ago |
| Language | C | C |
| License | GNU General Public License v3.0 only | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dupd
- Go Find Duplicates: blazingly-fast simple-to-use tool to find duplicate files
I use and test assorted duplicate finders regularly.
fdupes is the classic (going way, way back), but it's really very slow and not worth using anymore.
The four I know of that are worth trying these days are https://github.com/jbruchon/jdupes, https://github.com/pauldreik/rdfind, https://github.com/jvirkki/dupd and https://github.com/sahib/rmlint. Depending on data set, hardware, file arrangement and other factors, any one of these might be the fastest for a specific use case.
Had not encountered fclones before, will give it a try.
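For readers new to these tools, here is a minimal sketch of the general idea behind a duplicate finder: group files by size first, then hash only the files that share a size. It is an illustration under simple assumptions, not how dupd, jdupes, rdfind, rmlint or fclones are actually implemented. The function names `file_digest` and `find_duplicates` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (sorted lists of paths) of files under root with identical content hashes."""
    # Pass 1: bucket regular files by size; this needs only metadata, no reads.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files whose size is shared with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates("/some/dir")` returns one list per set of identical files. The size pass is what keeps this cheap on large trees, since most files never need to be read at all.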
mpifileutils
- Pigz: A parallel implementation of gzip for multi-core machines
If you ever run into the limitations of a single machine, dbz2 is also a fun little app for this sort of thing. You can run it across multiple machines and it'll automatically balance the workload across them.
https://github.com/hpc/mpifileutils/blob/master/man/dbz2.1
- MpiFileUtils: File utilities designed for scalability and performance
- Go Find Duplicates: blazingly-fast simple-to-use tool to find duplicate files
If you want something that scales horizontally, dcmp from https://github.com/hpc/mpifileutils is an option.
- You can list a directory containing 8M files, but not with ls
What are some alternatives?
fclones - Efficient Duplicate File Finder
jdupes - A powerful duplicate file finder and an enhanced fork of 'fdupes'.
rmlint - Extremely fast tool to remove duplicates and other lint from your filesystem
go-find-duplicates - Find duplicate files (photos, videos, music, documents) on your computer, portable hard drives etc.
pigz - A parallel implementation of gzip for modern multi-processor, multi-core machines.
rdfind - find duplicate files utility
duphard - A simple utility to detect duplicate files and replace them with hard links.
fdupes - FDUPES is a program for identifying or deleting duplicate files residing within specified directories.
coreutils - Enhancements to the GNU coreutils (especially head)