While at a customer site last week, I was asked to help with some mysterious VMFS datastore behavior. The datastore had suddenly become inaccessible, and they could not make any changes to it: they could not vMotion VMs in or out of it, or even create a folder on it. After rescanning the storage adapters on some of the ESXi hosts, they could no longer see the datastore at all. Checking /var/log/vmkernel.log, we noticed the error "ATS-Only VMFS Volume 'VMFS5' not mounted. Host does not support ATS or ATS initialization has failed.", shown in the screenshot below.
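If you want to check a host for the same symptom, a minimal sketch along these lines may help. The function name find_ats_mount_errors is my own, and the search string is taken from the error message quoted above; the exact wording may differ between ESXi builds.

```shell
#!/bin/sh
# Print vmkernel log lines that report a failed ATS-only mount.
# The log path is optional and defaults to /var/log/vmkernel.log.
# (find_ats_mount_errors is a hypothetical helper name, not a VMware tool.)
find_ats_mount_errors() {
    log_file="${1:-/var/log/vmkernel.log}"
    # Case-insensitive match on the fixed part of the message quoted above.
    grep -i "ATS-Only VMFS Volume" "$log_file"
}

# Usage on an ESXi host:
#   find_ats_mount_errors
#   find_ats_mount_errors /path/to/a/copied/vmkernel.log
```

No output simply means the message was not found in that log file; older entries may have been rotated out.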
What is Atomic Test & Set (ATS)?
Before explaining the cause of the above error and how to resolve it, I thought it would make sense to share some background on ATS and where the idea of ATS-Only VMFS volumes came from, since it relates directly to the cause of this problem.
Atomic Test & Set (ATS) was introduced as one of the fundamental operations of the vStorage APIs for Array Integration (VAAI). ATS is used when creating and locking files on a VMFS volume. To simplify a bit: before ATS, ESXi had to lock the entire datastore using SCSI reservations whenever it needed to update the VMFS metadata (a VM disk being created or expanded, a snapshot being taken, and so on). That had performance implications once you started putting a large number of VMs on the same datastore. To improve this, VMware came up with hardware-assisted locking, called ATS, which allows the ESXi host to lock discretely per disk sector rather than locking the full datastore. This dramatically improved the performance of the VMFS locking mechanism.
Why ATS-Only Volumes?
While designing vSphere 5, it seems the decision was made that a new VMFS 5 datastore provisioned on storage that supports VAAI (in particular ATS) would be created as an ATS-Only volume. That means it will only support the ATS locking mechanism. The reason behind this decision was to save the hosts from repeatedly querying the storage for ATS support after the VMFS datastore has been created, which reduces operational overhead.
It is important to note, though, that this is not the case for upgraded VMFS datastores. If you upgrade a datastore from VMFS 3 to VMFS 5 on storage that supports ATS, your VMFS will be set up to use ATS as the preferred method, but it will revert to SCSI reservations when ATS fails.
The table below shows the locking mechanism used depending on how you formatted your VMFS datastore.
If you are not sure whether the datastore you are working with is configured as ATS-only, the command below should tell you:
# vmkfstools -Ph -v1 /vmfs/volumes/VMFS-volume-name
You see output similar to:
VMFS-5.54 file system spanning 1 partitions.
File system label (if any): ats-test-1
Mode: public ATS-only
Note the ATS-only in the output above. If this LUN had been upgraded to VMFS5 from VMFS3 and was not ATS-only, the Mode field would only say public.
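To make that check scriptable, here is a small sketch that reads the vmkfstools output on standard input and reports whether the Mode line contains ATS-only. The function name vmfs_locking_mode and its three result strings are my own invention, and it assumes awk is available in the ESXi shell and that the Mode line looks like the sample above.

```shell
#!/bin/sh
# Classify a VMFS volume from `vmkfstools -Ph -v1` output read on stdin.
# Prints "ats-only", "not-ats-only", or "unknown" (no Mode line found).
# (vmfs_locking_mode is a hypothetical helper name.)
vmfs_locking_mode() {
    awk '
        /^[ \t]*Mode:/ {
            found = 1
            if ($0 ~ /ATS-only/) print "ats-only"; else print "not-ats-only"
            exit
        }
        END { if (!found) print "unknown" }
    '
}

# Usage on an ESXi host:
#   vmkfstools -Ph -v1 /vmfs/volumes/VMFS-volume-name | vmfs_locking_mode
```

Running it against every directory under /vmfs/volumes is an easy way to build a list of the datastores that would be affected if ATS support disappeared.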
Now that you know the background of ATS-Only datastores and where they came from, it is time to talk about the pitfall that caused the error I was getting. As mentioned earlier, an ATS-Only datastore supports only the ATS locking mechanism and cannot revert to SCSI reservations, even if ATS is no longer available. In fact, when you have an ATS-Only datastore and no ATS support from the storage or the host, the datastore becomes inaccessible, which explains the error I was seeing. Here are a few examples of how you can get yourself into this situation:
- You created new VMFS5 datastores on storage that supports VAAI/ATS locking, and then, after they were created as ATS-Only datastores, you disabled ATS support on the storage side.
- You moved or copied the ATS-Only datastore to new storage that does not support ATS locking.
- (Most common) You created a new VMFS5 datastore on storage that supports VAAI/ATS locking, and then, after it was created as an ATS-Only datastore, you enabled a feature on the storage that causes ATS to no longer be supported on that particular array or datastore. An example is enabling any of the copy services (HUR/TC/SI/QS) on a Hitachi HDS USP-V/VM. This storage supports ATS locking as long as you don't use a copy service feature. As soon as you enable one of these copy services, the storage stops responding to ATS locking calls and any ATS-Only datastores become inaccessible. This is exactly the problem my customer had: they lost access after the storage team enabled replication on their ATS-Only VMFS5 datastores running on a Hitachi HDS USP-VM. It is unfortunate that Hitachi and other storage vendors don't point out such incompatibilities in their documentation.
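In all three scenarios it helps to see what the host currently believes about ATS support on the LUN. As a rough sketch, the helper below pulls the ATS line out of the output of esxcli storage core device vaai status get. The function name is hypothetical, and I am assuming the output contains a line of the form "ATS Status: supported" or "ATS Status: unsupported"; verify that against your own ESXi version.

```shell
#!/bin/sh
# Extract the ATS status value from the output of
# `esxcli storage core device vaai status get -d <device-ID>` read on stdin.
# Assumes a line such as "   ATS Status: supported" is present.
# (ats_status_from_vaai_output is a hypothetical helper name.)
ats_status_from_vaai_output() {
    sed -n 's/^[[:space:]]*ATS Status:[[:space:]]*//p' | head -n 1
}

# Usage on an ESXi host (device-ID is the NAA ID of the LUN):
#   esxcli storage core device vaai status get -d device-ID | ats_status_from_vaai_output
```

An ATS-Only datastore sitting on a device that does not report ATS as supported is the combination to look out for.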
I will start with the workaround we used to solve our problem with the HDS USP-VM not supporting ATS locking once copy services (replication, in our case) are enabled. Because we formatted our datastore before replication was enabled on the storage side, the storage reported that it supported ATS locking when the datastore was created, so it was provisioned as an ATS-Only datastore. When the storage team enabled replication afterward, the storage stopped responding to ATS locking and the datastore became inaccessible. It took a bit of work to figure out the cause, but after pinpointing the problem, we resolved it with minimum downtime as follows:
- Stop the replication, which made our datastore accessible again.
- Create a new LUN and replicate it before creating a VMFS datastore on it.
- Create a VMFS datastore on this newly created, replicated LUN. As the LUN is already replicated, the VMFS datastore will support SCSI reservations, because the storage reports that it does not support ATS locking for this particular LUN.
- Move the VMs from the old datastore to the new one.
- Get rid of the old datastore.
This workaround worked because formatting the LUN with VMFS after it had been replicated created a VMFS datastore that supports SCSI reservation locking, since that was the only locking mechanism the storage reported. In short: enable the features that affect ATS locking before formatting your VMFS datastore, and you will get a VMFS 5 datastore that supports SCSI reservations. If you create the VMFS datastore first and then enable such a feature on your storage, you are asking for trouble: the datastore ends up ATS-Only while the storage no longer supports ATS locking for it.
Another solution is to disable ATS-only locking on the datastore. You can find the instructions in the following VMware KB: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2030416. Something the KB fails to mention is that you have to power off all the VMs on that datastore before the command will execute successfully. Below are the instructions copied from that KB for your convenience; make sure no VMs are accessing the datastore when you run the commands, or they will not execute.
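Before running the KB steps, one way to find the VMs that need powering off is to filter the output of vim-cmd vmsvc/getallvms by datastore name. This is only a sketch: vmids_on_datastore is a name I made up, it assumes the File column shows the datastore in square brackets (for example [ats-test-1] web01/web01.vmx), and it only sees VMs registered on the host where you run it, so repeat it on every host sharing the datastore.

```shell
#!/bin/sh
# From `vim-cmd vmsvc/getallvms` output on stdin, print the Vmid of every VM
# whose .vmx file lives on the given datastore.
# (vmids_on_datastore is a hypothetical helper name.)
vmids_on_datastore() {
    datastore="$1"
    # Literal substring match on "[datastore]"; skip the header line by
    # requiring a numeric first column.
    awk -v ds="[$datastore]" 'index($0, ds) > 0 && $1 ~ /^[0-9]+$/ { print $1 }'
}

# Usage on an ESXi host: list the power state of each VM on the datastore.
#   vim-cmd vmsvc/getallvms | vmids_on_datastore VMFS-volume-name | while read -r id; do
#       vim-cmd vmsvc/power.getstate "$id"
#   done
```

VMs whose virtual disks sit on the datastore while their .vmx file lives elsewhere will not show up here, so treat the result as a starting point rather than a complete list.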
To disable ATSOnly on the VMFS5 datastore, run this command on one of the hosts sharing the VMFS5 datastore:
vmkfstools --configATSOnly 0 /vmfs/devices/disks/device-ID:Partition
device-ID is the NAA ID of the LUN on which the VMFS5 datastore was created.
Partition is the partition number on which VMFS5 datastore was created. This is usually 1.
For example:
vmkfstools --configATSOnly 0 /vmfs/devices/disks/naa.6006016055711d00cef95e65664ee011:1
Note: It is sufficient to run this command on one of the hosts sharing the VMFS5 datastore. Other hosts automatically recognize the change.
After disabling ATSOnly, run this command to rescan for datastores:
esxcli storage filesystem rescan
The VMFS5 datastore should now mount successfully.
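The device-ID:Partition argument used in the steps above can be looked up rather than typed by hand. The sketch below builds it from the output of esxcli storage vmfs extent list. The function name is hypothetical, and I am assuming the usual column layout (Volume Name, VMFS UUID, Extent Number, Device Name, Partition) and a datastore label without spaces; for a datastore with multiple extents it only prints the first match.

```shell
#!/bin/sh
# From `esxcli storage vmfs extent list` output on stdin, print the
# /vmfs/devices/disks/device-ID:Partition path for the given datastore label.
# Assumes Device Name and Partition are the last two columns and that the
# label contains no spaces. (vmfs_device_path is a hypothetical helper name.)
vmfs_device_path() {
    label="$1"
    awk -v label="$label" '
        $1 == label { print "/vmfs/devices/disks/" $(NF-1) ":" $NF; exit }
    '
}

# Usage on an ESXi host:
#   esxcli storage vmfs extent list | vmfs_device_path VMFS-volume-name
```

Review the printed path before passing it to vmkfstools, since the command changes the on-disk configuration of the datastore.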
Hope this helps explain ATS-Only VMFS volumes, the pitfall to watch out for, and how to fix or work around it. Lastly, I want to give a big shout-out to the team who worked with me on this case: Bow Tie, SAN Man, Guido (sorry, no real names used, to protect the innocent).