19.6 Duplicate Files Using fdupes

20230603

See Section 19.5 for an example of an interactive session using fdupes. In this section we explore the command line options for fdupes.

A summary of the duplicates found can be obtained using the --summarize or -m option:

fdupes --summarize .

The output will look something like:

13567 duplicate files (in 6407 sets), occupying 16996.0 megabytes

fdupes requires at least one command line argument (a path to a directory). In the example above, a period (.) indicates the current directory.
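The summary line can also be picked apart inside a script, for example to report the counts. The following is a minimal sketch that assumes the line has the shape shown above. The variable names are illustrative, and the sample text stands in for a real call to fdupes.

```shell
# Sample line copied from the output above. In practice it might come from:
#   summary=$(fdupes --summarize .)
summary='13567 duplicate files (in 6407 sets), occupying 16996.0 megabytes'

# Extract the leading file count and the number of sets.
# If the line has a different shape, both variables are left empty.
files=$(printf '%s\n' "$summary" | sed -n 's/^\([0-9][0-9]*\) duplicate files.*/\1/p')
sets=$(printf '%s\n' "$summary" | sed -n 's/.*(in \([0-9][0-9]*\) sets).*/\1/p')

echo "files=$files sets=$sets"
```

For the sample line this prints files=13567 sets=6407.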

With no further options fdupes lists groups of duplicated files in the specified directory:

fdupes .
./20180323_thesis_02.pdf
./20180323_thesis_01.pdf
./20180323_thesis.pdf

./20030102_pakdd01_03.pdf
./20031012_pakdd01.pdf

./20200531_siunits_01.pdf
./20200531_siunits.pdf

Use the --recurse or -r option to recurse into sub-directories.

fdupes can automatically delete duplicates, retaining the first of the listed files in each set, when given both the --delete (-d) and --noprompt (-N) options.

A common heuristic is to keep the original file rather than the copies with versioned file names, since all of them contain exactly the same content. Ordering the files by name and then reversing the order places the unversioned name first in this example:

fdupes --order='name' --reverse .
./20180323_thesis.pdf
./20180323_thesis_01.pdf
./20180323_thesis_02.pdf

./20031012_pakdd01.pdf
./20030102_pakdd01_03.pdf

./20200531_siunits.pdf
./20200531_siunits_01.pdf

The following command then deletes the duplicates, keeping only the first file in each set, with each set ordered in reverse by file name:

fdupes --delete --noprompt --order='name' --reverse .

The --omitfirst or -f option lists the duplicate files but omits the first file in each set. The resulting list can be saved to a file and used to generate a script that deletes the duplicates manually, if desired.
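One way to turn such a list into a script is sketched below. The helper name make_delete_script is hypothetical, and the sample input stands in for real fdupes output. The sketch assumes one path per line with blank lines between sets, so file names containing newlines are not handled.

```shell
# Hypothetical helper: read duplicate paths on stdin (one per line, with
# blank lines between sets) and print one rm command per path.
make_delete_script() {
    while IFS= read -r path; do
        # Skip the blank lines that separate sets of duplicates.
        if [ -z "$path" ]; then
            continue
        fi
        # Escape embedded single quotes so the path can be single-quoted.
        escaped=$(printf '%s\n' "$path" | sed "s/'/'\\\\''/g")
        printf "rm -- '%s'\n" "$escaped"
    done
}

# Sample input standing in for the output of:
#   fdupes --omitfirst --order='name' --reverse .
printf '%s\n' './20180323_thesis_01.pdf' './20180323_thesis_02.pdf' '' \
    './20031012_pakdd01_03.pdf' | make_delete_script
```

Redirecting the output to a file gives a script that can be reviewed, and edited if some files should be kept, before it is run with sh. Nothing is deleted until that script is run.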



Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0