[NBLUG/talk] Finding duplicate files

Lincoln Peters lincoln_peters at hotmail.com
Mon Jul 7 17:10:03 PDT 2003


Maybe I should mention that for my purposes, there is no way that the script 
will have to deal with hard links.  Symlinks are possible, but very 
unlikely.

Still, I suppose it would be a good idea for it to handle hard links and 
symlinks correctly, in case someone with a similar problem that does 
involve hard links searches the archives.
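
For anyone who does need to tell these cases apart, here is a minimal 
sketch. It assumes a shell whose "test" supports the "-ef" operator (common 
in bash, dash and ksh, but not guaranteed everywhere), and classify_pair is 
a made-up name for illustration only, not part of any script in this thread.

```shell
#!/bin/sh
# classify_pair A B  (hypothetical helper)
# Prints one word describing how two paths are related:
#   symlink   - at least one of the paths is a symbolic link
#   hardlink  - both names refer to the same file on disk
#   copy      - separate files with identical contents
#   different - separate files with different contents
classify_pair() {
    # Check for symlinks first, because -ef follows them.
    if [ -L "$1" ] || [ -L "$2" ]; then
        echo symlink
    # Assumption: the shell's test supports -ef (same device and inode).
    elif [ "$1" -ef "$2" ]; then
        echo hardlink
    # cmp -s compares byte by byte and reports only via its exit status.
    elif cmp -s -- "$1" "$2"; then
        echo copy
    else
        echo different
    fi
}
```

Only the "copy" case wastes disk space, so that is probably the one a 
duplicate finder should report by default.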




Lincoln




>From: Ross Thomas <spamb8r at netscape.net>
>Reply-To: talk at nblug.org
>To: talk at nblug.org
>Subject: Re: [NBLUG/talk] Finding duplicate files
>Date: Mon, 07 Jul 2003 16:07:19 -0700
>
>Eric Eisenhart wrote:
> > On Mon, Jul 07, 2003 at 01:25:35AM -0700, Ross Thomas wrote:
> >> Also handles embedded blanks and tabs.  Misbehaves when new-lines are
> >> embedded in a file name (sort & uniq aren't that sophisticated).
> >
> > Actually, the GNU sort has a "-z" option, equivalent to xargs' "-0"
> > option.
>
>While sort and uniq are the commands that will have the problem, and
>sort has the '-z' option, md5sum isn't capable of producing
>NUL-terminated output, which would defeat the '-z'.
>
> > One problem here.  "ln file1 file2" will create a duplicate that
> > actually refers to the same file.
>
>This may or may not be a problem; it depends on the intent of the user.
>For searching, hard-linked files are technically duplicates (even though
>they refer to the same disk storage):  You have two ways of referencing
>the same file contents.  A matter of semantics.  This obviously breaks
>down when you start changing file contents.  :-(
>
>However, in the original shell script you could replace the md5sum
>command with an invocation of the following script, located in the
>user's $PATH.  You could also make the decision of which to invoke
>by specifying an option and do a substitution based on that.
>
>------------ Cut Here ----------------
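>#!/bin/sh
>
># For each file named on the command line, print its MD5 sum followed by
># its inode number and name, so hard links can be told apart from copies.
>for i in "$@"
>do
>     # md5sum reading stdin prints "<32 hex digits>  -"; keep the sum and
>     # the two blanks.  printf avoids the non-portable "echo -n".
>     printf '%s' "$(md5sum < "$i" | cut -c1-34)"
>     ls -1i -- "$i"
>done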
>
>------------ Cut Here ----------------
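
The script quoted above prints one line per file: checksum, inode number, 
then file name. As a rough sketch of what could be done with that output, 
the filter below keeps only checksums that turn up under more than one 
inode, so a set of hard links to a single file is not reported on its own. 
report_copies is a hypothetical name, and like the rest of the thread it 
assumes file names without embedded new-lines.

```shell
#!/bin/sh
# report_copies  (hypothetical filter)
# Reads "checksum  inode name" lines on stdin and prints, sorted, every
# line whose checksum was seen with at least two different inode numbers.
report_copies() {
    LC_ALL=C sort | awk '
        {
            line[NR] = $0
            sum[NR] = $1
            # Count each (checksum, inode) pair only once.
            if (!(($1, $2) in seen)) {
                seen[$1, $2] = 1
                inodes[$1]++
            }
        }
        END {
            for (i = 1; i <= NR; i++)
                if (inodes[sum[i]] > 1)
                    print line[i]
        }
    '
}
```

Note that if a file has both a hard link and a real copy, all three names 
are listed, since the checksum then appears under two inodes.
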
>
>HTH.
>
>Ross.
>
>_______________________________________________
>talk mailing list
>talk at nblug.org
>http://nblug.org/mailman/listinfo/talk
