While working on a tool to eliminate duplicate photos from old hard drives, I wrote this little snippet of code that's worth saving:

if [ $(md5 -q 1.txt) == $(md5 -q 2.txt) ]; then echo "same"; else echo "different"; fi

It's a command line one-liner that generates MD5 hashes for two files, compares them and states if they are the same or different.

For those unfamiliar with MD5, it's a "cryptographic hash function that produces a 128-bit hash value." The useful part for this snippet is that MD5 can be fed a file of any size and the results is a 32 character string. Most importantly, the same input will always produce the same output and any difference (no matter how minor) creates a large difference in the result. For example, any computer can run MD5 on a file with the contents "asdfasdf-1" and it will produce the hash signature:

f3748c05e25ca8cce7795d1ec97749b0

If you change the one to a two (i.e. "asdfasdf-2") the signature changes to:

e9ca151e1882f63c5d05e7958a7527a9

More about this will come in another post, but what this means for a duplicate photo finder is that MD5 hashes can be generated for every photo and then compared. Any two files with the same hash signature are the same* and can be pared down. That is done with a larger program. The little snippet of code is used for verification. It's also useful enough to be broken out to its own.


*Note: For the tech/cryptography minded folks out there, I know that MD5 can have collisions. For what I'm doing, the chances are so small that I'm not worried about it.