Go Find Duplicates: blazingly-fast simple-to-use tool to find duplicate files

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. rdfind

    find duplicate files utility

    As far as I know, the standard tool for this is rdfind. This new tool claims to be "blazingly fast", so it should provide something to back that up: ideally a comparison with rdfind, but even a basic benchmark would make the claim less dubious. https://github.com/pauldreik/rdfind

    But the main problem is not the suspicious performance claim, it's the lack of explanation. The tool is supposed to "find duplicate files (photos, videos, music, documents)". Does that mean it is restricted to certain file types? Does it consider identical photos with different metadata to be duplicates? Compare this with rdfind, which clearly describes what it does, provides a summary of its algorithm, and even mentions alternatives.

    Overall, it may be a fine toy/hobby project (only 3 commits, 3 months ago); I didn't read the code (except to find the command-line options). I don't get why it got so much attention.
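
    For a rough sense of what such tools do internally, the brute-force baseline can be sketched as a shell one-liner (GNU coreutils assumed): checksum every file and print the groups with identical digests. rdfind and the other mature tools avoid most of this work by first ruling out files with unique sizes and differing first/last bytes before hashing anything.

    ```
    # Brute-force sketch: hash every file, then print groups whose MD5 digests match.
    # Mature tools (rdfind, fclones, ...) hash far fewer files by pre-filtering on size.
    find . -type f -exec md5sum {} + | sort | uniq -w 32 --all-repeated=separate
    ```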

  2. rmlint

    Extremely fast tool to remove duplicates and other lint from your filesystem

    I use and test assorted duplicate finders regularly.

    fdupes is the classic (going way way back) but it's really very slow, not worth using anymore.

    The four I know are worth trying these days (depending on data set, hardware, file arrangement and other factors, any one of these might be fastest for a specific use case) are https://github.com/jbruchon/jdupes , https://github.com/pauldreik/rdfind , https://github.com/jvirkki/dupd , https://github.com/sahib/rmlint

    Had not encountered fclones before, will give it a try.

  3. go-find-duplicates

    Find duplicate files (photos, videos, music, documents) on your computer, portable hard drives etc.

  4. duphard

    A simple utility to detect duplicate files and replace them with hard links.

    For example, I maintain a tar file and a docker image with Kafka connectors that share many jar files; using duphard I can save hundreds of megabytes, or even more than a gigabyte! For a documentation website with many copies of the same image (let's just say some static generators favor this practice for maintaining multiple versions), I can reduce the site size by 60%+, which makes ssh copies, docker pulls, etc. much faster and speeds up deployments.

    https://github.com/andmarios/duphard
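
    The mechanism behind those savings is ordinary hard links: two directory entries pointing at the same inode, so the data is stored only once. A hand-rolled illustration of the effect duphard automates (the file names here are made up, and both paths must be on the same filesystem):

    ```
    # Hypothetical paths, for illustration only.
    ls -i lib-a/common.jar lib-b/common.jar   # before: two different inode numbers
    ln -f lib-a/common.jar lib-b/common.jar   # replace the copy with a hard link to the original
    ls -i lib-a/common.jar lib-b/common.jar   # after: both names share one inode
    ```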

  5. fclones

    Efficient Duplicate File Finder

    See also fclones (focuses on performance, has benchmarks https://github.com/pkolaczk/fclones). I didn't know about rdfind but thought the standard was fdupes https://github.com/adrianlopezroche/fdupes, which is as fast (or slow) as rdfind according to fclones (and fclones is much faster).

  6. fdupes

    FDUPES is a program for identifying or deleting duplicate files residing within specified directories.

    See also fclones (focuses on performance, has benchmarks https://github.com/pkolaczk/fclones). I didn't know about rdfind but thought the standard was fdupes https://github.com/adrianlopezroche/fdupes, which is as fast (or slow) as rdfind according to fclones (and fclones is much faster).

  7. mpifileutils

    File utilities designed for scalability and performance.

    If you want something that scales horizontally, dcmp from https://github.com/hpc/mpifileutils is an option.
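
    If memory serves, dcmp is launched like any other MPI program, so the comparison work is spread across however many ranks (and nodes) you give it; roughly:

    ```
    # Assumed invocation; the two directory paths are placeholders.
    mpirun -np 8 dcmp /srv/data/original /srv/data/mirror
    ```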

  8. czkawka

    Multi functional app to find duplicates, empty folders, similar images etc.

    RESF checking in

    The first one I found, and still use since it became obvious that fslint is EOL, is czkawka [0] (meaning "hiccup" in Polish). Its speed is an order of magnitude higher than fslint's, and its memory use is 20%-75% of fslint's.

    <;)> Satisfied customer, would buy it again.

    [0] https://github.com/qarmin/czkawka

  9. fd

    A simple, fast and user-friendly alternative to 'find'

    ```find some/location -type d -wholename '*/January/Photos'```

    https://github.com/sharkdp/fd
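
    With fd itself, the rough equivalent (untested; the pattern is a regex matched against the full path) would be:

    ```fd --type d --full-path 'January/Photos$' some/location```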

  10. jdupes

    (Discontinued) A powerful duplicate file finder and an enhanced fork of 'fdupes'.

    I use and test assorted duplicate finders regularly.

    fdupes is the classic (going way way back) but it's really very slow, not worth using anymore.

    The four I know are worth trying these days (depending on data set, hardware, file arrangement and other factors, any one of these might be fastest for a specific use case) are https://github.com/jbruchon/jdupes , https://github.com/pauldreik/rdfind , https://github.com/jvirkki/dupd , https://github.com/sahib/rmlint

    Had not encountered fclones before, will give it a try.

  11. dupd

    CLI utility to find duplicate files

    I use and test assorted duplicate finders regularly.

    fdupes is the classic (going way way back) but it's really very slow, not worth using anymore.

    The four I know are worth trying these days (depending on data set, hardware, file arrangement and other factors, any one of these might be fastest for a specific use case) are https://github.com/jbruchon/jdupes , https://github.com/pauldreik/rdfind , https://github.com/jvirkki/dupd , https://github.com/sahib/rmlint

    Had not encountered fclones before, will give it a try.

  12. kindfs

    Index a filesystem into a database, then easily make queries, e.g. to find duplicate files/dirs, or mount the index with FUSE.

    FWIW if people are interested, I wrote https://github.com/karteum/kindfs for the purpose of indexing the hard drive, with the following goals

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • I'm amazed how I find anything & why I have so many dupes!

    4 projects | /r/DataHoarder | 8 Jul 2023
  • Johnny Decimal

    4 projects | news.ycombinator.com | 13 Jun 2023
  • Any good duplicate file finder for windows?

    3 projects | /r/sysadmin | 22 Apr 2023
  • ISO: Binary File Comparison Tool for Duplicate File Checks

    5 projects | /r/DataHoarder | 25 Jan 2022
  • File Servers... how are you handling duplicates

    1 project | /r/sysadmin | 8 Dec 2023