Virtual Machines residing on NFS storage become unresponsive during a snapshot removal operation

Article ID: VMW0002

When HotAdd mode is used with a proxy ESXi server, virtual machines (VMs) may stop responding during Simpana’s post-backup snapshot consolidation.

Symptom

When the Virtual Server Agent’s HotAdd mode is used with a proxy ESXi server to back up VMs residing on NFS storage, target VMs become unresponsive for 30 seconds and snapshot removal takes a long time. This may cause VMs to stop working or fail over.

Cause

This is a known VMware issue that occurs with VMs on NFS storage when the target virtual machine and the backup appliance reside on two different hosts.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2010953

From VMware: "The issue is with NFSv3 locking mechanism and how it is implemented in our (VMware) products. To fix the problem, our Engineers would have to write around the limitations of the base NFSv3 protocol, which isn't trivial and since NFSv4 solves all the problems with locking, they are going to wait until that protocol is implemented in our code."

As of vSphere 5.5, NFSv4 support has not yet been implemented.
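
The condition described in this section can be expressed as a simple check. The following is a minimal sketch, not part of Simpana or any VMware tool: the `VM` record, the `is_at_risk` function, and the host and VM names are all hypothetical and exist only to illustrate when a target VM is exposed (NFS storage, with the HotAdd proxy on a different ESXi host).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical inventory record; not a Simpana or vSphere object."""
    name: str
    host: str            # ESXi host currently running the VM
    datastore_type: str  # "NFS" or "VMFS"


def is_at_risk(target: VM, proxy_host: str) -> bool:
    """True when the target is on NFS and the HotAdd proxy runs on another host."""
    return target.datastore_type.upper() == "NFS" and target.host != proxy_host


# Illustrative inventory (made-up names).
vms = [
    VM("web01", "esx-a", "NFS"),
    VM("db01", "esx-b", "NFS"),
    VM("app01", "esx-b", "VMFS"),
]

# The proxy runs on esx-a, so only the NFS-backed VM on esx-b is exposed.
at_risk = [vm.name for vm in vms if is_at_risk(vm, proxy_host="esx-a")]
print(at_risk)
```

In this example only `db01` is flagged: `web01` shares a host with the proxy, and `app01` is on VMFS.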

Resolution

  1. Install the Virtual Server Agent (VSA) on a virtual machine on each ESXi host that hosts target VMs. The issue does not occur when the VSA runs on a virtual machine on the same host where the hot-add operation is performed.

  2. Switch to iSCSI storage; native VMFS is not exposed to this issue.
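
To plan the first option, it can help to list which ESXi hosts still lack a proxy. The sketch below assumes you already have a mapping of NFS-backed target VMs to their hosts and a set of hosts that run a VSA proxy VM. The function name `hosts_missing_proxy` and all host and VM names are hypothetical, and the data would have to come from your own inventory.

```python
def hosts_missing_proxy(nfs_vm_hosts, proxy_hosts):
    """Return ESXi hosts that run NFS-backed target VMs but have no VSA proxy VM.

    nfs_vm_hosts: dict mapping VM name -> ESXi host name (hypothetical inventory)
    proxy_hosts:  set of ESXi host names that already run a VSA proxy VM
    """
    return sorted(set(nfs_vm_hosts.values()) - set(proxy_hosts))


# Illustrative data (made-up names).
nfs_vm_hosts = {"web01": "esx-a", "db01": "esx-b", "cache01": "esx-c"}
proxy_hosts = {"esx-a"}

missing = hosts_missing_proxy(nfs_vm_hosts, proxy_hosts)
print(missing)
```

Each host in the result would need its own VSA virtual machine so that hot-add operations stay on the same host as the target VM.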

 

For best practices when using NFS datastores, see:

http://cormachogan.com/2012/11/26/nfs-best-practices-part-1-networking/

http://cormachogan.com/2012/11/27/nfs-best-practices-part-2-advanced-settings/