rdfind
finds duplicate files across and/or within several directories. It calculates
checksums only when necessary.
rdfind runs in O(N log(N)) time, where N is the number of files.
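The idea of calculating checksums only when necessary can be illustrated
with a minimal Python sketch. This is not rdfind's actual implementation;
the function names file_digest and find_duplicates are hypothetical, and
the sketch assumes that file size is the cheap property used to rule out
non-duplicates. Sorting by size is the O(N log(N)) step; only files that
share a size with another file are ever read and hashed.

```python
import hashlib
import itertools
import os


def file_digest(path, algorithm="md5"):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(paths, algorithm="md5"):
    """Return lists of paths with identical content (hypothetical helper)."""
    # Sort by size so that files of equal size end up adjacent.
    sized = sorted((os.path.getsize(p), p) for p in paths)
    groups = []
    for _size, run in itertools.groupby(sized, key=lambda item: item[0]):
        candidates = [path for _, path in run]
        if len(candidates) < 2:
            continue  # a unique size cannot be a duplicate: no checksum needed
        by_digest = {}
        for path in candidates:
            by_digest.setdefault(file_digest(path, algorithm), []).append(path)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```

A file whose size is unique is skipped without being opened, which is
where most of the saving comes from on typical directory trees.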
If duplicates are found, the duplicate first encountered on the command line
is considered the original. If the duplicates are all found while processing
the same input argument, the file with the lowest depth is considered to be
the original.
Depth is calculated relative to the input argument, not relative to /.
If identical files are found in the same directory while processing a
specific input argument, precedence is undefined.
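The precedence rules above can be sketched in Python as a sort key. This
is an illustration under assumptions, not rdfind's own code: the names
depth_below and pick_original are hypothetical, and depth is assumed to be
the number of directory levels between the input argument and the file.

```python
import os


def depth_below(argument, path):
    """Directory levels between the input argument and the file."""
    relative = os.path.relpath(path, argument)
    return relative.count(os.sep)


def pick_original(found):
    """Pick the original among identical files.

    found is a list of (argument_index, argument, path) tuples, where
    argument_index is the position of the input argument on the command line.
    """
    # Earlier command-line argument wins; lower depth breaks ties.
    # Remaining ties are undefined in rdfind; min() simply takes the first.
    best = min(found, key=lambda item: (item[0], depth_below(item[1], item[2])))
    return best[2]
```

Because depth is measured from the input argument, a file directly inside
a deeply nested argument still has depth 0 in this sketch.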
To include files or directories whose names start with -, prefix them
with ./ (for example, rdfind ./-) so they are not mistaken for options.
OPTIONS
Searching etc:
-ignoreempty true|false
Ignore empty files. Default is true.
-followsymlinks true|false
Follow symlinks. Default is false.
-removeidentinode true|false
Remove items found that have identical inode and device ID. Default
is true.
-checksum md5|sha1
Type of checksum to use: md5 or sha1. Default is md5.
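What -removeidentinode describes can be sketched in Python as follows.
This is a hedged illustration, not rdfind's implementation; the function
name drop_same_inode is hypothetical. Two paths with the same device ID
and inode number refer to the same underlying file (for example, hard
links), so only the first such path is kept.

```python
import os


def drop_same_inode(paths):
    """Keep only the first path seen for each (device ID, inode) pair."""
    seen = set()
    kept = []
    for path in paths:
        info = os.stat(path)
        key = (info.st_dev, info.st_ino)
        if key in seen:
            continue  # same underlying file as an earlier path
        seen.add(key)
        kept.append(path)
    return kept
```

Filtering these out first avoids reporting a file as a duplicate of
itself when it is reachable under more than one name.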
Actions:
-makesymlinks true|false
Replace duplicate files with symbolic links.
-makehardlinks true|false
Replace duplicate files with hard links.
-makeresultsfile true|false
Make a results file named results.txt in the current directory. Default
is true.
-deleteduplicates true|false
Delete (unlink) duplicate files.
General:
-n, -dryrun
Display what would have been done; do not actually delete or link anything.
-h, -help, --help
Display a brief help message.
-v, -version, --version
Display the version number.
FILES
results.txt
results.txt is the default output file; it describes each duplicate found.
The results file results.txt will contain one row per duplicate file
found, along with a header row explaining the columns.
A text label describes why the file is considered a duplicate:
DUPTYPE_UNKNOWN some internal error.
DUPTYPE_FIRST_OCCURENCE the file that is considered to be the original.
DUPTYPE_WITHIN_SAME_TREE the file is in the same tree as the original
(found while processing the same input argument as the original).
DUPTYPE_OUTSIDE_TREE the file was found while processing a different
input argument than the original.
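How the labels relate to input arguments can be sketched in Python. This
is an illustration only: the function name label_group and the tuple
layout are hypothetical, and the actual columns of results.txt are not
reproduced here.

```python
def label_group(group):
    """Label a group of identical files.

    group is a list of (argument_index, path) tuples with the original
    first; argument_index is the position of the input argument on the
    command line during whose processing the file was found.
    """
    original_argument, original_path = group[0]
    labels = [("DUPTYPE_FIRST_OCCURENCE", original_path)]
    for argument_index, path in group[1:]:
        if argument_index == original_argument:
            # Found while processing the same input argument as the original.
            labels.append(("DUPTYPE_WITHIN_SAME_TREE", path))
        else:
            # Found while processing a different input argument.
            labels.append(("DUPTYPE_OUTSIDE_TREE", path))
    return labels
```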
ENVIRONMENT
DIAGNOSTICS
EXIT VALUES
0 on success, nonzero otherwise.
BUGS/FEATURES
When the same directory is specified twice, rdfind treats the files
first encountered as the most important (originals) and the rest as
duplicates. This might not be what you want.
There are lots of enhancements left to do. Please contribute!