Not the best sight to greet you when you try to start up a VMware session on Fusion (VMware’s OS X implementation). I knew that I had caused this by Force Quitting Fusion the night before because I thought it had hung. In fact it was moving disk blocks around because I had deleted a snapshot.
The Library looked okay:
I could see no ‘reclaimable’ shading on the bar so it seemed that the old snapshot had been cleaned up correctly.
But the session would not start. I needed to find the log file to see what was going on.
So the first thing is to find the VMWare files in question. I navigated to the following path (where raza is the username for the machine):
Inside this folder was one file – the container holding the operating system I am using on the machine under Fusion.
You need to right-click on the file and choose Show Package Contents to open the container.
So I looked in the log file and saw:
2014-06-18T19:13:20.515Z| Worker#0| I120: DISK: OPEN scsi0:0 ‘/Users/raza/Documents/Virtual Machines.localized/Windows XP Professional.vmwarevm/Windows XP Professional-000002.vmdk’ persistent R
2014-06-18T19:13:20.528Z| Worker#0| I120: Current OS Release is 13.2.0
2014-06-18T19:13:20.573Z| Worker#0| I120: DISKLIB-SPARSECHK: [/Users/raza/Documents/Virtual Machines.localized/Windows XP Professional.vmwarevm/Windows XP Professional-000002.vmdk] GT Error (GG2): GT = 72349440 / 14630400
2014-06-18T19:13:20.759Z| Worker#0| I120: DISKLIB-SPARSECHK: [/Users/raza/Documents/Virtual Machines.localized/Windows XP Professional.vmwarevm/Windows XP Professional-000002.vmdk] Grain #565030 @72349568 is orphaned.
2014-06-18T19:13:20.759Z| Worker#0| I120: DISKLIB-SPARSECHK: [/Users/raza/Documents/Virtual Machines.localized/Windows XP Professional.vmwarevm/Windows XP
followed by more “Grain #nnnnnn is orphaned” messages for the vast majority of the 106MB log file.
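With a log this size it is easier to summarise it from Terminal than to scroll through it. The following is only a sketch: the function names are made up for illustration, and it assumes the log is the vmware.log file inside the .vmwarevm package, which is where I would expect to find it.

```shell
# Count the "orphaned grain" messages in a VMware log file.
# Assumption: the log is vmware.log inside the .vmwarevm package.
count_orphaned_grains() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function safe under "set -e".
    grep -c 'is orphaned' "$1" || true
}

# Print at most the first five sparse-disk check messages.
first_sparse_errors() {
    grep -m 5 'DISKLIB-SPARSECHK' "$1"
}

# Example usage (adjust the path to your own package):
#   count_orphaned_grains "/Users/raza/Documents/Virtual Machines.localized/Windows XP Professional.vmwarevm/vmware.log"
```

A large count alongside the GT Error line suggests the damage runs through the whole snapshot rather than being a single stray message.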
It was at this point I realised that I had older snapshots, and I had a backup of the data inside the current container because I use a backup agent when the container is working. What I didn’t have was a backup of the container (or rather of the physical file). I hadn’t done this because of course the file is huge and when you are in Fusion, the file is open so you can’t get a consistent backup anyway.
So I did what any sensible person does at this point. I googled. It looked like it might just be a lock file issue (it wasn’t, and that wasn’t the solution, because I did have to go back to an older snapshot in the end), so I thought I would visually document the answer in case it had been a lock file problem – which has the same symptoms.
In the folder, the 02 suffix file was the snapshot that had the issue – 75GB of changes… ouch!
The file extensions are explained very well at http://on-cloud9.com/2012/01/16/virtual_machine_files_explained/ and https://www.vmware.com/support/ws55/doc/ws_learning_files_in_a_vm.html so I don’t intend to repeat it here.
Anyway, you can see the lock folder with the .lck extension:
and inside this folder is a lock file:
So I deleted the folder.
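If you would rather check for lock folders from Terminal, something like the following sketch would list them. The function name is hypothetical, and it only lists entries rather than deleting anything. My assumption is that you should only remove a lock folder once Fusion has been quit, since a running VM legitimately holds its lock.

```shell
# List the *.lck entries at the top level of a VM package so they can be
# inspected before anything is removed. Nothing is deleted here.
list_lock_entries() {
    find "$1" -maxdepth 1 -name '*.lck' -print
}

# Example usage (adjust the path to your own package):
#   list_lock_entries "/Users/raza/Documents/Virtual Machines.localized/Windows XP Professional.vmwarevm"
```

No output means there are no lock folders, in which case the problem lies elsewhere.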
Now the Library window shows that the snapshots are accessible.
Sure enough, you can see them.
But annoyingly, if you try to start the Current State file, it still fails with the same error. So I tried the repair option of vmware-vdiskmanager:
sudo "/Applications/VMware Fusion.app/Contents/Library/vmware-vdiskmanager" -R /Users/raza/Documents/Virtual\ Machines.localized/Windows\ XP\ Professional.vmwarevm/Windows\ XP\ Professional-000002.vmdk
No joy though, as I got the message:
The virtual disk, ‘/Users/raza/Documents/Virtual Machines.localized/Windows XP Professional.vmwarevm/Windows XP Professional-000002.vmdk’, is corrupted and cannot be repaired.
There was no choice but to go back to the last snapshot and then reapply all the changes I had made. Luckily there were not many, and those that could have been tricky, like applications, turned out to be okay because I had retained copies of them on a network volume (I had deleted them from the physical PC that was being migrated to a VM as I completed each transfer). Once that was done I reinstalled the backup software and pulled any local user files from the last backup.
Of course this was a learning exercise, and I learnt that:
a) I should have been using automatic snapshots via the Fusion AutoProtect feature
b) I should back up the container vmdk file (so I now do that once a week, if the file is not open – an option in CrashPlan – to a local NAS)
c) snapshots are not equal to backups because they are on the same physical host and, even if you were to copy them elsewhere, they are a chain of files each holding the changes since the previous snapshot. That means if you lose one, you risk not being able to use later snapshots.
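For anyone without a backup agent that can skip open files, point b) could be approximated by hand. This is a rough sketch only, not what CrashPlan does internally: the function name is made up, and it assumes that the presence of any .lck entry in the package means the VM is open.

```shell
# Copy a VM package into a backup directory, but only when the package
# contains no *.lck entries (assumed here to mean the VM is not open).
# Pass the package path without a trailing slash so that cp copies the
# package itself rather than just its contents.
backup_vm_if_closed() {
    vm_bundle=$1
    dest_dir=$2
    if [ -n "$(find "$vm_bundle" -maxdepth 1 -name '*.lck' -print)" ]; then
        echo "skipped: lock entries present in $vm_bundle" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && cp -R "$vm_bundle" "$dest_dir/"
}

# Example usage (the destination is a hypothetical NAS mount point):
#   backup_vm_if_closed "/Users/raza/Documents/Virtual Machines.localized/Windows XP Professional.vmwarevm" "/Volumes/nas/vm-backups"
```

Copying the whole package keeps the base disk and every snapshot file together, which matters given point c).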
The threads https://communities.vmware.com/thread/177906 and https://communities.vmware.com/message/2118363 suggest I might be able to open the corrupt vmdk using VDK or UFS Explorer on a 32-bit Windows platform… I’ll update this post if I ever try that.
Update: 21st June 2014:
Wow! I tried UFS Explorer (on OSX) against the damaged vmdk file. It had absolutely no problem reading it and let me do a recovery of files without a problem. I didn’t need anything from the container but at least I was able to verify that I had not missed anything in recreating it from an old snapshot + backups.