rdfind vs mpifileutils
| | rdfind | mpifileutils |
|---|---|---|
| Mentions | 16 | 4 |
| Stars | 883 | 160 |
| Growth | - | 0.6% |
| Activity | 4.1 | 5.1 |
| Last commit | about 1 month ago | 27 days ago |
| Language | C++ | C |
| License | GNU General Public License v3.0 or later | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
rdfind
- Rdfind: A utility to find duplicate files, delete them or replace with hardlinks
- Self hosted, web gui, file duplication scanner
  I use rdfind for this.
- Is there a Mac app that will allow me to recursively go through thousands of folders, calculate the total folder size, then compare against all other folder sizes, and if the size is identical, delete the newer one?
  rdfind is available for macOS; I've been using it on Linux: https://github.com/pauldreik/rdfind
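The folder-size idea above can be sketched in plain Python (function names here are made up for illustration). Note that equal total size is only a hint: verify contents with a content-based tool such as rdfind before deleting anything.

```python
import os
from collections import defaultdict

def folder_size(path):
    """Total size in bytes of all files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):  # skip broken symlinks
                total += os.path.getsize(fp)
    return total

def folders_with_identical_size(parent):
    """Group immediate subfolders of parent by their recursive size.

    Returns only the groups containing more than one folder; within a
    group, the caller can compare mtimes to pick which copy to drop.
    """
    by_size = defaultdict(list)
    for entry in os.scandir(parent):
        if entry.is_dir():
            by_size[folder_size(entry.path)].append(entry.path)
    return {size: paths for size, paths in by_size.items() if len(paths) > 1}
```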
- Deduplication on EXT4
  You can use rdfind to find all duplicates in your experiments dir and replace them with hardlinks. That way each file's data occupies disk space only once, and all the directory entries reference the same inode.
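The hardlink trick described above can be sketched as a toy Python script (the function names are hypothetical). Real rdfind is smarter: it short-circuits on file size and on first/last bytes before checksumming, so use the tool itself on real data.

```python
import hashlib
import os
from collections import defaultdict

def sha256_of(path, bufsize=1 << 20):
    """Full content hash of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def hardlink_duplicates(root):
    """Replace duplicate regular files under root with hardlinks.

    Files are grouped by full content hash; every duplicate is replaced
    by a hardlink to the first file seen in its group, so the data is
    stored on disk only once.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            if os.path.isfile(p) and not os.path.islink(p):
                by_hash[sha256_of(p)].append(p)
    for paths in by_hash.values():
        keep, *dups = paths
        for dup in dups:
            os.remove(dup)
            os.link(keep, dup)  # both names now share one inode
```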
- How do I show non-duplicate files across 2 drives?
- Pip and cargo are not the same
  I use rdfind to deal with this: https://github.com/pauldreik/rdfind
- Backing Up Data: Tips/Advice for Tons of Unorganized Data and Duplicate Files from Multiple Sources
- This has probably happened to all of us at least once
  Yeah, I periodically download the full drives and just deduplicate with rdfind, hardlinking identical files.
- AMD/Xilinx Vivado rant
- recommends for de-duplication?
  I use rdfind on my Linux NAS. https://github.com/pauldreik/rdfind
mpifileutils
- Pigz: A parallel implementation of gzip for multi-core machines
  If you ever run into the limitations of a single machine, dbz2 is also a fun little app for this sort of thing. You can run it across multiple machines and it'll automatically balance the workload across them. https://github.com/hpc/mpifileutils/blob/master/man/dbz2.1
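The core dbz2 idea -- split the input into blocks and compress them independently in parallel -- can be sketched on one machine with Python threads (`bz2.compress` releases the GIL, so threads do run concurrently here). This toy omits dbz2's MPI distribution and on-disk container format; the names are made up for illustration.

```python
import bz2
from multiprocessing.pool import ThreadPool

BLOCK = 1 << 20  # 1 MiB input blocks, compressed independently

def parallel_bz2(data, workers=4):
    """Compress data as an ordered list of independent bz2 blocks.

    dbz2 does the same split-and-compress across MPI ranks on many
    nodes; here a thread pool stands in for the ranks.
    """
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    with ThreadPool(workers) as pool:
        return pool.map(bz2.compress, blocks)

def parallel_bunzip2(compressed_blocks):
    """Invert parallel_bz2: decompress each block and rejoin in order."""
    return b"".join(bz2.decompress(b) for b in compressed_blocks)
```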
- MpiFileUtils: File utilities designed for scalability and performance
- Go Find Duplicates: blazingly-fast simple-to-use tool to find duplicate files
  If you want something that scales horizontally, dcmp from https://github.com/hpc/mpifileutils is an option.
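What a tree comparison like dcmp reports can be sketched single-process in Python (the helper below is hypothetical; dcmp's value is doing the walk and the byte comparison in parallel across MPI ranks):

```python
import filecmp
import os

def compare_trees(src, dst):
    """Classify files under two directory trees.

    Returns (common_equal, common_differ, only_src, only_dst) as sets
    of paths relative to their tree root. shallow=False forces a real
    byte-by-byte comparison rather than a stat-based shortcut.
    """
    def walk(base):
        out = set()
        for dirpath, _dirs, files in os.walk(base):
            for name in files:
                out.add(os.path.relpath(os.path.join(dirpath, name), base))
        return out

    src_files, dst_files = walk(src), walk(dst)
    common = src_files & dst_files
    equal = {p for p in common
             if filecmp.cmp(os.path.join(src, p), os.path.join(dst, p),
                            shallow=False)}
    return equal, common - equal, src_files - dst_files, dst_files - src_files
```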
- You can list a directory containing 8M files, but not with ls
What are some alternatives?
fdupes - FDUPES is a program for identifying or deleting duplicate files residing within specified directories.
fclones - Efficient Duplicate File Finder
jdupes - A powerful duplicate file finder and an enhanced fork of 'fdupes'.
rmlint - Extremely fast tool to remove duplicates and other lint from your filesystem
pigz - A parallel implementation of gzip for modern multi-processor, multi-core machines.
duphard - A simple utility to detect duplicate files and replace them with hard links.
dupeguru - Find duplicate files
coreutils - Enhancements to the GNU coreutils (especially head)
kindfs - Index filesystem into a database, then easily make queries e.g. to find duplicate files/dirs, or mount the index with FUSE.