mpifileutils vs rdfind

| | mpifileutils | rdfind |
|---|---|---|
| Mentions | 4 | 16 |
| Stars | 160 | 875 |
| Growth | 0.6% | - |
| Activity | 5.1 | 4.1 |
| Latest commit | 21 days ago | 26 days ago |
| Language | C | C++ |
| License | BSD 3-clause "New" or "Revised" License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mpifileutils
- Pigz: A parallel implementation of gzip for multi-core machines
  If you ever run into the limitations of a single machine, dbz2 is also a fun little app for this sort of thing. You can run it across multiple machines and it'll automatically balance the workload across them. (A sketch of such an invocation follows this list.)
  https://github.com/hpc/mpifileutils/blob/master/man/dbz2.1
- MpiFileUtils: File utilities designed for scalability and performance
- Go Find Duplicates: blazingly-fast simple-to-use tool to find duplicate files
  If you want something that scales horizontally, dcmp from https://github.com/hpc/mpifileutils is an option. (See the dcmp sketch after this list.)
- You can list a directory containing 8M files, but not with ls
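The dbz2 comment above points at a detail worth spelling out: dbz2 is an MPI program, so the compression work is divided across however many ranks (and hosts) the MPI launcher provides. The following is only a rough sketch of such an invocation; the rank count, host file, and file path are made-up placeholders, and the exact option names should be checked against the dbz2 man page linked above.

```sh
# Hedged sketch: compress one large file with mpifileutils' dbz2 across several hosts.
# hosts.txt, the rank count, and the path are placeholders; verify the flags
# against `man dbz2` for your build of mpifileutils.
mpirun -np 16 --hostfile hosts.txt dbz2 -z /data/big_archive.tar
```

Because the MPI launcher decides where the ranks run, the same command scales from a single workstation to a cluster without changing the dbz2 side of the invocation.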
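The dcmp suggestion works the same way: it is launched under MPI, and the ranks share the directory walk and the file comparison, which is what "scales horizontally" refers to above. A hedged example, with placeholder paths and rank count:

```sh
# Hedged sketch: compare two directory trees in parallel with dcmp.
# The rank count and both paths are placeholders.
mpirun -np 64 dcmp /mnt/archive_a /mnt/archive_b
```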
rdfind
- Rdfind: A utility to find duplicate files, delete them or replace with hardlinks
- Self-hosted, web GUI, file duplication scanner
  I use rdfind for this.
- Is there a Mac app that will allow me to recursively go through thousands of folders, calculate the total folder size, then compare against all other folder sizes, and if the size is identical, delete the newer one?
  rdfind is available for macOS; I've been using it on Linux: https://github.com/pauldreik/rdfind
- Deduplication on EXT4
  You can use rdfind to find all duplicates in your experiments dir and replace them with hardlinks. That way each duplicate's data occupies disk space only once, and every original path remains a link to the same inode on disk. (See the rdfind sketch after this list.)
- How do I show non-duplicate files across 2 drives?
- Pip and cargo are not the same
  I use rdfind to deal with this: https://github.com/pauldreik/rdfind
- Backing Up Data: Tips/Advice for Tons of Unorganized Data and Duplicate Files from Multiple Sources
- This has probably happened to all of us at least once
  Yeah, I periodically download the full drives and just deduplicate with rdfind, hardlinking identical files.
- AMD/Xilinx Vivado rant
- recommends for de-duplication?
  I use rdfind on my Linux NAS. https://github.com/pauldreik/rdfind
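Several of the rdfind comments above describe the same workflow: scan a tree, keep one copy of each duplicate, and replace the rest with hardlinks so every path points at the same inode and the data is stored only once. A small sketch of that workflow; the directory is a placeholder, and keep in mind that hardlinks only work within a single filesystem and that editing the file through any one link changes them all.

```sh
# Dry run: report what rdfind would do without modifying anything.
rdfind -dryrun true -makehardlinks true /srv/experiments

# Real run: duplicates are replaced with hardlinks to a single copy,
# so identical data occupies disk space once while every path keeps working.
# rdfind also writes a results.txt report in the current directory.
rdfind -makehardlinks true /srv/experiments
```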
What are some alternatives?
fclones - Efficient Duplicate File Finder
fdupes - FDUPES is a program for identifying or deleting duplicate files residing within specified directories.
rmlint - Extremely fast tool to remove duplicates and other lint from your filesystem
jdupes - A powerful duplicate file finder and an enhanced fork of 'fdupes'.
pigz - A parallel implementation of gzip for modern multi-processor, multi-core machines.
duphard - A simple utility to detect duplicate files and replace them with hard links.
coreutils - Enhancements to the GNU coreutils (especially head)
dupeguru - Find duplicate files
kindfs - Index filesystem into a database, then easily make queries e.g. to find duplicates files/dirs, or mount the index with FUSE.