NFS datastore becomes inaccessible after mounting the snapshot

Article ID: SS0004

ESX Server mounted datastore goes to an inactive state, causing file copy failure.

Symptom

A successfully mounted datastore becomes inaccessible when a browse operation is performed or when data is accessed.

In the ESX log /var/log/kernel.log, you may see the following entry:


Could not open device 'nfs:[device]' for volume open: No underlying device for major,minor

In VSBKP.log, you may see the following entry:


CVMWareInfo::_MountVM_SNAP() - CopyDatastoreFile() failed..Hard Failure!! [ ] 

Cause

This issue can be caused by one or more mismatched MTU (Maximum Transmission Unit) settings between the ESX proxy host and the NFS storage. The usual reason for the mismatch is that one or more (but not all) devices in the path are set to use jumbo frames (MTU of 9000; the normal MTU is 1500). A data path with a single MTU mismatch may still function but could cause orphaned data mounts. Two MTU mismatches in a data path can cause complete data mount failure.

Example of mismatched MTUs in a data path:

ESX virtual switch:   MTU: 1500
ESX NIC:              MTU: 9000
Switch port:          MTU: 1500
Core Switch1:         MTU: 9000
Core Switch2:         MTU: 9000
NetApp:               MTU: 9000
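
When the MTU of each hop has been collected by hand, a short script can point out the odd ones. The following is a minimal sketch in Python, not part of any vendor tooling. The function name find_mtu_mismatches is hypothetical, and treating the most common MTU in the path as the intended value is an assumption; in a real environment the intended MTU should come from the network design.

```python
from collections import Counter


def find_mtu_mismatches(path):
    """Return (expected_mtu, hops whose MTU differs from it).

    Assumption: the most common MTU in the path is the intended one.
    """
    counts = Counter(mtu for _, mtu in path)
    expected, _ = counts.most_common(1)[0]
    mismatched = [(name, mtu) for name, mtu in path if mtu != expected]
    return expected, mismatched


# The example data path from this article, entered manually.
path = [
    ("ESX virtual switch", 1500),
    ("ESX NIC", 9000),
    ("Switch port", 1500),
    ("Core Switch1", 9000),
    ("Core Switch2", 9000),
    ("NetApp", 9000),
]

expected, mismatched = find_mtu_mismatches(path)
print(f"Most common MTU in path: {expected}")
for name, mtu in mismatched:
    print(f"Mismatch: {name} is set to {mtu}")
```

For the example path, the sketch reports the ESX virtual switch and the switch port as the two hops that disagree with the rest of the path.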

Resolution

Use the ping command to confirm whether the issue is caused by incorrect MTU settings.

Run ping with the -f (do not fragment) option and the -l (packet size) option to test jumbo frame MTU values along the path.

For example, from the virtual machine, ping the datastore device:

ping [IP Address] -f -l 8972

The packet size of 8972 is the MTU of 9000 minus 28 bytes of overhead (a 20-byte IP header and an 8-byte ICMP header).
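
The packet size arithmetic can be sketched in Python as follows. This is an illustrative helper, not part of the product. The function names and the address 192.0.2.10 (a documentation-only address) are placeholders. The Windows form matches the command used in this article; the Linux form (ping -M do -s) and the ESX form (vmkping -d -s) are included on the assumption that the test may also be run from those systems.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def icmp_payload_size(mtu):
    """Largest ICMP payload that fits in one unfragmented packet."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def ping_command(target, mtu, platform="windows"):
    """Build a do-not-fragment ping command as an argument list."""
    size = str(icmp_payload_size(mtu))
    if platform == "windows":
        return ["ping", target, "-f", "-l", size]
    if platform == "linux":
        return ["ping", "-M", "do", "-s", size, target]
    if platform == "esx":
        return ["vmkping", "-d", "-s", size, target]
    raise ValueError(f"unsupported platform: {platform}")


# 192.0.2.10 is a placeholder; substitute the datastore device address.
for platform in ("windows", "linux", "esx"):
    print(" ".join(ping_command("192.0.2.10", 9000, platform)))
```

The same calculation gives a payload of 1472 bytes for the normal MTU of 1500.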

If the ping times out, a downstream device in the path (such as a switch or router) has a mismatched MTU.
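
If the jumbo frame test fails, it can help to know the largest packet size that does pass, because that value plus 28 is the effective MTU of the path. The following Python sketch finds it with a binary search. It is an illustration under stated assumptions: probe is a hypothetical caller-supplied function that sends one do-not-fragment ping of the given size and returns True on a reply, and fake_probe merely simulates a path limited to an MTU of 1500.

```python
def largest_passing_payload(probe, low=1472, high=8972):
    """Binary-search the largest payload size for which probe(size) is True.

    probe is a hypothetical callable that sends one do-not-fragment ping
    with the given payload size and reports whether a reply came back.
    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2   # round up so the search always advances
        if probe(mid):
            low = mid                 # mid passes; look for something larger
        else:
            high = mid - 1            # mid fails; the limit is below mid
    return low


# Stand-in probe simulating a path whose smallest MTU is 1500 (payload 1472).
def fake_probe(size):
    return size <= 1472


payload = largest_passing_payload(fake_probe)
print(f"Largest passing payload: {payload}, effective MTU: {payload + 28}")
```

An effective MTU of 1500 on a path that is meant to carry jumbo frames indicates that at least one device is still using the normal MTU.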

Consult each device's documentation for instructions on viewing or changing the MTU.