fdupes
FDUPES is a program for identifying or deleting duplicate files residing within specified directories.
200 lines of Nim [1] seems to run about 9X faster than the 8000 lines of C in fdupes on a little test dir I have. If you need C, I think jdupes [2] is faster as @TacticalCoder points out a couple of times here. In my testing, `dups` is usually faster than `jdupes`, though.
[1] https://github.com/c-blake/bu/blob/main/dups.nim
[2] https://github.com/jbruchon/jdupes
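For flavor, the trick most of these tools share (group by size first, hash only the sizes that collide) fits in a few lines of Python. This is a toy sketch, not how dups or fdupes is actually implemented; it hashes whole files, where real tools hash incrementally and compare byte-by-byte:

```python
import hashlib
import os
from collections import defaultdict

def find_dups(root):
    """Group files by size first, then hash only the sizes that collide."""
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size can't have a duplicate; skip hashing
        for path in paths:
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            by_hash[digest].append(path)
    return [group for group in by_hash.values() if len(group) > 1]
```

The size pre-filter is where most of the speed comes from: on a typical tree the majority of files have a unique size and never get read at all.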
Very useful for identifying files that may need to be deduplicated or that can be removed entirely. Unfortunately, I don't think it will also find identical directories.
If deleting files isn't what you want, I'd suggest looking into deduplicating tools.
ZFS has its own deduplicator built in, which is nice. Once you enable it, it deduplicates files and individual extents of files by itself. It's probably not a good idea on very write-heavy disks, but it's an option.
Other filesystems with extent-level deduplication can use https://github.com/markfasheh/duperemove to deduplicate not only whole files but also individual extents. This can be very useful for filesystems that store a lot of duplicate content, like different WINE prefixes. For filesystems without extent deduplication, duperemove should try hard-linking files so they take up practically no disk space.
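Hard-linking duplicates by hand is simple enough to sketch. This is a hypothetical helper, not duperemove's mechanism; note that hard links only work within one filesystem, and once linked the copies can no longer diverge:

```python
import os

def hardlink_duplicates(keep, dups):
    """Replace each duplicate with a hard link to `keep` (same filesystem only)."""
    st = os.stat(keep)
    for path in dups:
        other = os.stat(path)
        if other.st_dev != st.st_dev:
            continue  # hard links can't cross filesystems
        if other.st_ino == st.st_ino:
            continue  # already the same inode; nothing to do
        tmp = path + ".dedup-tmp"
        os.link(keep, tmp)     # create the new link first...
        os.replace(tmp, path)  # ...then atomically swap it in
```

The link-then-rename dance means the duplicate is never missing: at every moment the path points at either the old copy or the new link.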
You know another project with much of its source files in the top-level directory? https://github.com/git/git
I've used Czkawka (https://github.com/qarmin/czkawka) because it does Lanczos-based image duplicate detection, which makes it more practical for me.
My preferred solution is rmlint [https://github.com/sahib/rmlint], mostly because it also looks at duplicate directories. Instead of deleting anything itself, it produces a bash script, so you can examine the script before running it.
Writing a program like this is one of the first exercises I give myself when learning a new programming language, because it touches a little bit of everything (reading files, output, CLI, using libraries, hashmaps, functions, loops, conditionals, etc) and isn't too onerous to implement.
My latest (it's a few years old at this point) is lsdup (rust version) using blake3 for hashing the content: https://github.com/redsaz/lsdup/
All it does is list the groups of duplicate files, grouped by hash, groups ordered by size. I'll usually pipe the output to a file, then do whatever I want to the list, and run a different script to process the resulting list. It works fine enough.
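A post-processing script for that kind of list can be very small. Here is one example, assuming a report format of one path per line with blank lines between groups (lsdup's actual output format may differ); it computes how much space you would reclaim by keeping one file per group:

```python
import os

def reclaimable(report_lines):
    """Given duplicate groups separated by blank lines (one path per line),
    return the bytes saved by keeping only the first file of each group."""
    saved, group = 0, []
    for line in list(report_lines) + [""]:  # trailing "" flushes the last group
        line = line.strip()
        if line:
            group.append(line)
        else:
            if len(group) > 1:
                # every file after the first in a group is redundant
                saved += sum(os.path.getsize(p) for p in group[1:])
            group = []
    return saved
```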
fdupes is really nice and fast, but (as far as I remember) it lacked two features I needed for my use case: (1) listing duplicate directories (without listing all of their duplicate sub-contents), and (2) identifying that all the contents of one directory are already included in another part of the FS (regardless of file/directory structure), which is particularly useful when you have a bigmess/ directory that you progressively sort out into a clean/ directory. Put differently: fdupes helps regain space, but it couldn't help me much with cleaning up a messy drive...
This is why I wrote https://github.com/karteum/kindfs (which indexes the fs into an sqlite DB and then enables various ways to process it).
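The index-then-query idea is easy to sketch with the standard library. This uses a toy schema I made up for illustration, not kindfs's actual one:

```python
import hashlib
import os
import sqlite3

def index_tree(root, db_path=":memory:"):
    """Index every regular file's path, size, and content hash into SQLite."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS files"
               " (path TEXT PRIMARY KEY, size INTEGER, hash TEXT)")
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path) or os.path.islink(path):
                continue
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                       (path, os.path.getsize(path), digest))
    db.commit()
    return db

# Once the tree is indexed, finding duplicates is a single query,
# and other questions (largest files, duplicate-heavy dirs) are more SQL:
DUP_QUERY = """
SELECT hash, GROUP_CONCAT(path, CHAR(10)) FROM files
GROUP BY hash HAVING COUNT(*) > 1
"""
```

The nice property of this shape is that the slow part (walking and hashing) runs once, and every later question is a cheap query against the database.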
Hi. I recommend my little program. The bottleneck is the Tkinter GUI, but maybe it will be useful to someone:
https://github.com/PJDude/dude