I use an external hard drive for backup of my files (games, music, photos, documents, etc).

Now, since I've been doing this for some years (and yes, I do manual backups via copy and paste), I just wanted to make sure that I hadn't missed any files and that they were all up to date. As such, I decided the best way would be to compute an MD5 digest of every file within my backed-up folders. The nifty little FCIV utility from Microsoft does the job just fine, especially as it can output to an XML file.
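For anyone wanting to repeat this kind of check without FCIV, here is a minimal Python sketch of the same idea: hash every file under two folders and list what is missing from the backup or has a different digest. It is not what FCIV does internally, and the names md5_of_file, hash_tree and compare_trees are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(local_root, backup_root):
    """Return (missing_from_backup, mismatched) as sorted lists of relative paths."""
    local = hash_tree(local_root)
    backup = hash_tree(backup_root)
    missing = sorted(set(local) - set(backup))
    mismatched = sorted(
        name for name in local.keys() & backup.keys()
        if local[name] != backup[name]
    )
    return missing, mismatched
```

Calling compare_trees with the local folder and the backup folder returns two lists: files with no counterpart in the backup, and files present in both places whose digests differ. A mismatch only says the bytes differ; it does not say which copy is newer or why.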

This evening I carried out the check and the results are as follows:

11871 files checked
3 files found to be missing (and so I've backed these up)
7 files with MD5 discrepancy

Of the 7 files with a discrepancy, 2 were out of date as my local copies had a later modified date, so those were backed up again.

This leaves 5 files. 4 of these were Excel spreadsheets and just to do a test I opened one and closed it again (no save) and rechecked the file again. On doing this the MD5 hash for that file changed! Is this normal? Is it some property associated to the file like last opened or something? Anyway, just to make me happy I backed up my local copies of these 4 spreadsheets, leaving just 1 more discrepancy to check. This is where it gets strange.

The last file is an old file I've not used for 9 years. It was simply copied to my current machine and has never been used on it. Similarly, it was just copied to my external drive for backup. It is actually a RiscOS Acorn Archimedes file that I used to use via an emulator. Anyway, I opened both my local copy and the backup copy in Notepad++ in order to do a file compare between the two. It came back saying that they were identical. As such, I ask the question, is it possible for two identical files to have different MD5 digest? And if not, why am I getting a discrepancy with this file?
This question / problem has been solved by Pidgeot
Could there be something in the files not displayed by Notepad++?
http://www.computerhope.com/fchlp.htm

fc /b file1 file2

If there's a discrepancy between the two, fc will show it to you.
korell: As such, I ask the question, is it possible for two identical files to have different MD5 digest?
No. MD5 is calculated exclusively from the contents of the file, and as such, two identical files must have the same MD5 hash. (The opposite does not hold true, however: it is possible for two different files to have the same MD5 hash).
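A small Python sketch of the first point, assuming nothing beyond the standard library: two files with the same bytes but different names and timestamps hash the same, while a single changed byte gives a different digest. The file names and payload are invented for the demonstration.

```python
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Return the hex MD5 digest of a (small) file's contents."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def demo():
    """Return digests for an original, a renamed copy, and an edited copy."""
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "original.dat")
        copy = os.path.join(folder, "renamed_copy.dat")
        edited = os.path.join(folder, "edited.dat")
        data = b"\x00\x01 example payload \xff"

        for path in (original, copy):
            with open(path, "wb") as handle:
                handle.write(data)
        # Give the copy very different access/modified times (September 2001).
        os.utime(copy, (1_000_000_000, 1_000_000_000))

        # Same length, but the final byte differs.
        with open(edited, "wb") as handle:
            handle.write(data[:-1] + b"\xfe")

        return md5_of_file(original), md5_of_file(copy), md5_of_file(edited)
```

The first two digests returned by demo() are equal and the third is different, because the name and the timestamps live in the file system and are not part of the bytes being hashed.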
korell: On doing this the MD5 hash for that file changed! Is this normal? Is it some property associated to the file like last opened or something
Generally no, but Excel might store some metadata of its own as part of the file. That may also depend on which Excel version you're using, whether it's an .xls or .xlsx file, and possibly other factors as well.
korell: And if not, why am I getting a discrepancy with this file?
Because the two files are different somehow. Exactly how, I don't know; the difference may not be significant from a functional perspective (depending on the file formats).
korell: Anyway, I opened both my local copy and the backup copy in Notepad++ in order to do a file compare between the two
Notepad++ isn't designed to work with binary data. I would not expect it to be able to compare files properly.

What you can do instead is to run the FC command line utility with the /B switch to see which bytes are different (and how). Again, depending on the file format, that may not be very meaningful in the first place.
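If FC isn't to hand, a rough equivalent can be sketched in Python. This is only similar in spirit to a binary compare and is not a reimplementation of FC; diff_bytes is a made-up name, and it reads both files fully into memory, which is fine for small files like this one.

```python
def diff_bytes(path_a, path_b, limit=20):
    """Compare two files byte by byte.

    Returns (differences, size_a, size_b), where differences is a list of
    up to `limit` tuples (offset, byte_from_a, byte_from_b).
    """
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        data_a = file_a.read()
        data_b = file_b.read()

    differences = []
    # zip stops at the shorter file; a size mismatch is reported separately.
    for offset, (byte_a, byte_b) in enumerate(zip(data_a, data_b)):
        if byte_a != byte_b:
            differences.append((offset, byte_a, byte_b))
            if len(differences) >= limit:
                break
    return differences, len(data_a), len(data_b)
```

An empty list together with equal sizes means the files are byte-for-byte identical. Otherwise each tuple gives the offset and the two differing byte values, and unequal sizes mean one file has extra data at the end that a text editor may not show.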
korell: This leaves 5 files. 4 of these were Excel spreadsheets and just to do a test I opened one and closed it again (no save) and rechecked the file again. On doing this the MD5 hash for that file changed! Is this normal?
Those old Microsoft Office file formats are really complex, so I wouldn't worry too much about it, particularly if you opened and re-saved with a different version than the file was originally created with. I'd expect there to be any number of changes each time you open and close it.

If you really want to see what has changed, I wouldn't use Notepad++; I'd use a proper hex editor such as 010 Editor. It has a compare feature that will tell you what has changed and highlight the bytes that are different between files. If the only change is a date, I'd expect a few 4-byte changes and not much else.

Good luck interpreting it, there aren't any public templates for the old MS formats and they aren't exactly straightforward without some kind of parsing. It'll show you the extent of the changes, though.
Post edited November 24, 2012 by Shinook
Well, I re-backed up the file anyway and now the MD5 hashes match. No idea what happened to make them different.

Pidgeot, the Excel spreadsheets were .xls format but I use Excel 2007. I checked the files and they work fine, but the MD5 hash changes just by opening and closing them, so something must be changing.

I was sure that the MD5 hash depended only on a file's contents, so I'm glad I got confirmation from others. Technically it is many-to-one, but I'm very unlikely to find two files with the same MD5.
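To make the many-to-one point concrete: an MD5 digest is always 128 bits, so there are far more possible files than possible digests. Stumbling on two files with the same full digest by accident is not realistic, but the effect is easy to see if the digest is cut down. The Python sketch below keeps only the first two bytes of each digest, which is purely an assumption for the demonstration, and hashes the strings "0", "1", "2", ... until two of them agree.

```python
import hashlib
from itertools import count


def find_truncated_collision(prefix_bytes=2):
    """Find two different inputs whose MD5 digests share the first prefix_bytes bytes.

    Returns (first_message, second_message, shared_prefix_hex).
    """
    seen = {}
    for n in count():
        message = str(n).encode("ascii")
        short = hashlib.md5(message).digest()[:prefix_bytes]
        if short in seen:
            # Two distinct inputs map to the same shortened digest.
            return seen[short], message, short.hex()
        seen[short] = message
```

With a two-byte prefix there are only 65536 possible values, so a match must turn up within 65537 inputs, and usually far sooner. With the full 16 bytes the same search is out of reach for an accidental match, which is why a mismatch between backup and original can be trusted to mean the bytes really differ.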