VMware ESX: How to recover your VMFS partition table

You might wake up to a bad day after a power outage or a storage failure. You thought it was over once you had brought all your machines back up and running, but a few minutes later you find that some of your virtual machines are missing. After a little investigation, you discover that your VMware datastores (VMFS) appear empty when you browse them from VMware vCenter or the VI Client. If you face this problem, there is a good chance that the VMFS partition table for these LUNs or disks is missing.

To check the VMFS partition table on your VMware ESX server, follow this procedure:

1- Connect using SSH to the VMware ESX server to which the missing datastore (VMFS) was connected. Make sure you have root access.

2- Run the following command to find out your SAN devices: esxcfg-vmhbadevs

The output will look something like below:
vmhba0:0:0 /dev/cciss/c0d0
vmhba1:0:1 /dev/sda
vmhba1:0:2 /dev/sdb
vmhba1:4:2 /dev/sdc
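
If you have many LUNs, it can help to save that output and turn it into a lookup table before you start checking devices. The following is only a minimal sketch, assuming the two-column format shown above and Python 3 on an admin workstation (not on the ESX host itself); the names parse_vmhba_devices and SAMPLE are made up for illustration and are not part of any VMware tool.

```python
def parse_vmhba_devices(output):
    """Map each vmhba path (e.g. 'vmhba1:0:1') to its console device (e.g. '/dev/sda')."""
    mapping = {}
    for line in output.splitlines():
        parts = line.split()
        # Keep only lines shaped like "<vmhba path> <device>"
        if len(parts) >= 2 and parts[0].startswith("vmhba"):
            mapping[parts[0]] = parts[1]
    return mapping


# Sample text copied from the output shown above (saved from the ESX host)
SAMPLE = """\
vmhba0:0:0 /dev/cciss/c0d0
vmhba1:0:1 /dev/sda
vmhba1:0:2 /dev/sdb
vmhba1:4:2 /dev/sdc
"""

devices = parse_vmhba_devices(SAMPLE)
for vmhba, dev in sorted(devices.items()):
    print(vmhba, "->", dev)
```

With the mapping in hand you can go from the vmhba path you see in the VI Client to the /dev name that fdisk expects.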

3- If you know which SAN device holds the missing datastore (VMFS), run the following command on that device to check its partition table; otherwise, run it on all the devices and check them one by one. (Hint: the command to show the partition table for all devices is ‘fdisk -lu’.)

fdisk -lu /dev/sda    <== run this if you know that sda is the device holding the missing datastore (VMFS)

The output should look something like this for a LUN whose VMFS partition table is missing:

Disk /dev/sda: 322.1 GB, 322122547200 bytes
255 heads, 63 sectors/track, 39162 cylinders, total 629145600 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sda doesn’t contain a valid partition table

Or, on some versions, it could look something like this:

Disk /dev/sde: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System

A normal working entry with a valid partition table, on the other hand, will look something like this:
Disk /dev/sdb: 16.1 GB, 16106127360 bytes
255 heads, 63 sectors/track, 1958 cylinders, total 31457280 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 128 31455269 15727571 fb Unknown <== This is your partition table entry

Notice the partition table line in the last example:

‘/dev/sdb1 128 31455269 15727571 fb Unknown’ <== This means the partition table exists.
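
If you saved the fdisk output for several devices, the same check can be done programmatically. This is a rough sketch under the assumption that the output uses the ‘fdisk -lu’ column layout shown above (Device, optional Boot flag, Start, End, Blocks, Id, System); find_vmfs_partitions and the two sample strings are hypothetical names used only for this example.

```python
def find_vmfs_partitions(fdisk_output):
    """Return (device, start, end) for each partition row whose Id column is 'fb'."""
    rows = []
    for line in fdisk_output.splitlines():
        fields = line.split()
        # Partition rows start with the partition device, e.g. /dev/sdb1
        if len(fields) < 2 or not fields[0].startswith("/dev/"):
            continue
        if fields[1] == "*":      # drop the optional Boot flag column
            fields.pop(1)
        if len(fields) >= 5 and fields[4].lower() == "fb":
            rows.append((fields[0], int(fields[1]), int(fields[2])))
    return rows


# Sample with a valid VMFS entry, copied from the working example above
GOOD = """\
Disk /dev/sdb: 16.1 GB, 16106127360 bytes
255 heads, 63 sectors/track, 1958 cylinders, total 31457280 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 128 31455269 15727571 fb Unknown
"""

# Sample with no partition rows at all, as in the broken example above
MISSING = """\
Disk /dev/sda: 322.1 GB, 322122547200 bytes
255 heads, 63 sectors/track, 39162 cylinders, total 629145600 sectors
Units = sectors of 1 * 512 = 512 bytes
"""

print(find_vmfs_partitions(GOOD))
print(find_vmfs_partitions(MISSING))
```

An empty result for a device that should hold a datastore is the symptom this post is about. A non-empty result means the partition table is there and your problem lies elsewhere.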

If you have confirmed that your VMFS partition table is missing, follow the steps below. If your partition table exists, as in the last sample, then this is not the solution for your case. If the table is missing and you have VMware support, I highly recommend that you call them to help you recover your partition table, as any mistake in this procedure can make you lose your data permanently. If your only option is to recover on your own, then the procedure below should do the trick for you, as I have tried it three times before :).

VMware ESX VMFS Recovery Procedure steps:

Now that you have identified the affected device with the procedure above, it is time to fix it. The procedure below assumes /dev/sda is the defective device; make sure to replace it with whatever device is failing in your environment when executing the commands. Also make sure you are connected over SSH as root. The commands you enter are annotated with <== comments.

[root@vmwaretest vmhba2]# fdisk /dev/sda <== To start the fdisk (partitioning utility)

Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won’t be recoverable.

The number of cylinders for this disk is set to 39162.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n <== Add a new partition

Command action

e extended

p primary partition (1-4)

p

Partition number (1-4): 1

First cylinder (1-39162, default 1): Hit Enter <== Take default

Using default value 1

Last cylinder or +size or +sizeM or +sizeK (1-39162, default 39162): Hit Enter <== Take default

Using default value 39162

Command (m for help): t <== Change a partition type

Selected partition 1

Hex code (type L to list codes): fb <== VMFS partition type

Changed system type of partition 1 to fb (Unknown)

 

Command (m for help): x <== Expert mode

Expert command (m for help): b <== Move beginning of data in a partition

Partition number (1-4): 1

New beginning of data (63-629137529, default 63): 128 <== The partition offset used for VMFS

Expert command (m for help): w <== Write table to disk and exit

The partition table has been altered!
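
If you want to double-check what was written, the first 512-byte sector of the disk holds the DOS partition table: four 16-byte entries starting at byte 446, followed by the signature bytes 0x55 0xAA. The sketch below decodes such a sector and reports each entry's type and starting sector, so you can confirm type fb and start 128 (a byte offset of 128 * 512 = 65536). It assumes Python 3 on a workstation and a sector image you copied off the host yourself (for example with dd, reading only); read_partition_entries and build_sample_sector are hypothetical helpers, and the sample sector is synthetic, built from the numbers in the transcript above rather than read from a real disk.

```python
import struct

VMFS_TYPE = 0xFB        # partition type entered in the fdisk session above
EXPECTED_START = 128    # starting sector entered in the fdisk session above
ENTRY_FORMAT = "<B3xB3xII"  # status, (CHS skipped), type, (CHS skipped), start LBA, sector count


def read_partition_entries(sector):
    """Decode the four primary partition entries of a 512-byte DOS/MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("no valid DOS partition table signature")
    entries = []
    for i in range(4):
        raw = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        _status, ptype, start, count = struct.unpack(ENTRY_FORMAT, raw)
        if ptype != 0:  # type 0 marks an unused slot
            entries.append({"number": i + 1, "type": ptype,
                            "start": start, "sectors": count})
    return entries


def build_sample_sector(start, count, ptype=VMFS_TYPE):
    """Build a synthetic sector with a single entry, for demonstration only."""
    entry = struct.pack(ENTRY_FORMAT, 0, ptype, start, count)
    return bytes(446) + entry + bytes(48) + b"\x55\xaa"


# Illustrative values: start 128, ending at sector 629137529 as in the fdisk prompt
sample = build_sample_sector(EXPECTED_START, 629137529 - EXPECTED_START + 1)
for e in read_partition_entries(sample):
    ok = e["type"] == VMFS_TYPE and e["start"] == EXPECTED_START
    print("partition", e["number"], "type", hex(e["type"]),
          "start", e["start"], "byte offset", e["start"] * 512,
          "OK" if ok else "CHECK")
```

To use it on real data, replace the synthetic sample with the bytes of your saved sector file. The script only reads; it never writes to a disk.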

 

Woohoo! If that worked for you, then you are almost done. To check your work, follow the steps below.

In VMware vCenter, go to Configuration -> Storage (SCSI, SAN and NFS) and hit Refresh. If all went well, the storage volume should reappear and the data should be accessible.

 

Yay, you are done. I hope not many of you face this problem, but this is your savior if you do. I had not seen these recovery steps documented anywhere else on the web when I had the problem for the first time. I had promised myself for a while to put up a post about it, but I had been slacking off. Here it is, finally posted. Please let me know if this method helped you, and let me know if you had any problems with it. Thanks.

Comments

  1. Hi Guenter,

    I can pass by you on my way to Toronto from Dubai so keep it ready :).

    Just kidding bro, unfortunately I have a direct flight already booked.

    Regards,
    Eiad

  2. Wow this just saved my backside.

    Fantastic info and worked perfectly for me first time.

  3. I am glad I was able to help Wullie

  4. f'ing happy says

    Just wanted to drop a line to say thanks. This saved me a TON of headache…

  5. I have an ESXi 5 host that, after a restart, has lost some of its extents. Will this help me out? I am in a real pinch here.

    Thanks
    Steve

  6. Hi Steve,

    I am afraid this post won’t be of any help in that case. How did you lose your extents, though? Are they still accessible from the SAN? Were they erased, or is the data still there?
    Maybe restore the VMs from backup if it comes to the worst-case scenario.

    Regards,
    Eiad

  7. hi

    I am running a VMFS 3 SAN LUN on ESXi 5, and I accidentally reinstalled and quick-formatted the SAN LUN.

    Now I can see a new datastore of the same name (1), but no vmdk files…

    Any suggestions on how to recover my virtual machines?

    thanks

    vikki

  8. Eiad,

    When my co-worker and I first read your post we were skeptical, but this really worked for us!

    SYMPTOMS:
    – Simple 2-disk (Local) RAID1 with a failed drive.
    – After shutting the server down, replacing the bad drive, letting it rebuild, and rebooting the server to ESXi, a Datastore was missing and when you go to re-add storage, there was no VMFS Label on the available LUN.

    FIX:
    – Followed your steps exactly, but had to re-mount the VMFS Datastore that was re-partitioned.

    OUTCOME:
    – NO DATA LOSS!

  9. JHCR, Thanks for the feedback and glad I was able to help.

  10. Hi Eiad,

    thanks a lot for this post, it worked fine.
    Symptom: a datastore on a fibre channel SAN did not show any files when browsing via vCenter, and cloning was impossible. After I powered off the last virtual machine on this datastore, the store became inactive, and re-adding it with vCenter showed “Hard disk is blank”. So I stopped proceeding with vCenter and followed your steps. After a rescan of the datastores it changed its state to active and all data is accessible again; browsing the datastore and cloning a VM are also working again. Thx 🙂

  11. Thanks a lot for the step-by-step procedure. It worked for me.

  12. Thanks heaps for the article. I was about to hit my head off the wall. My data is restored and the wall is safe 🙂

  13. Hello, after a loss of power in the entire building for many hours, the virtual machines are missing. I can locate them on the datastore but nothing else. The virtual machine list is empty. I’m new to VMware and I can’t find a way to put them back unless I start all over again. Thank you in advance.

  14. Hi Dimismx,

    If you can see the VMs on the datastore, then you can right-click the vmx file and choose Add to Inventory. You might still want to contact VMware Support for help, as they need to look at the environment and find out if there is a quick fix.

    Thanks,
    Eiad

  15. Hi there,

    I’ve got the following problem:
    ESXi-Server Version 5.5 with 2 Datastores.
    Datastore2 containing originally my virtual machines.
    Unfortunately the virtual machines were deleted via vSphere-Client.
    No other action was taken so far.
    Is there a chance to get the virtual machines back to life?

    Thanks in advance,
    hsack

  16. Hi hsack,

    If you only deleted it from the inventory, then you can simply go to the folder of that VM in your datastore, right-click the vmx file, and hit Add to Inventory. If you deleted it from disk, though, your task is a bit tougher and you will need a data recovery tool that can handle VMFS datastores. Such tools exist but can be costly.

    Thanks,
    Eiad

  17. Hi,
    I am using ESXi 5.1. Recently four of my datastores went inactive after the ESXi host had been powered off for a few hours and then rebooted.
    I can see that the LUNs are available at the HBAs, but the datastores are inactive.
    I tried to apply the solution described above, but it says that the commands are deprecated. It would be great if you could provide a workaround compatible with ESXi 5.1.

    Thanks,
    Haritha

  18. Mohamed Elmasry says

    I have a blade server, DS3500 SAN storage and VMware 6. After recovering from a faulted logical array on the SAN storage, I wrongly added the datastore again, so the partition was deleted and the logical array is empty.

    Does this fix apply here too ??

  19. Eiad,
    I hope you still check this thread because I have to tell you I thought I was toast. We had a vmfs 3.x volume go down. It held one VM deemed important enough to have its own datastore, a 2-disk RAID 0 volume that turned up missing. So I didn’t have much hope. I went ahead and used the IBM ServeRAID controller to remap bad sectors on the 2 SCSI drives, then re-create the same configuration RAID. Scanning the resulting volume I found strings like “vmdk” and the VM’s name. This rekindled some hope but where to go from here? I had a hunch it was all in the initial bits on the volume, and with some luck the rest would fall into line. But in VSphere there is no way (that I know) to recover a datastore whose ghost resides on a volume of this nature. Then along came your post.
    Thank you,
    -Bill

  20. Glad it helped Bill!

