Backup jobs using Disk Library with iBRIX CIFS/NFS mount paths fail

Article ID: MA0009

A Disk Library that uses iBRIX CIFS/NFS mount paths becomes unresponsive, and backup jobs fail.

Symptom

Backup jobs run normally for a period of time and then fail. In the Job Manager log, you may see entries similar to the following:

24783 99e20700 [Date/Time] 1982479 SdtTail::onProcDataCompleted: Client [ ], Id [ ], Cannot process the SDT buffer. Error [-1] RCId [ ]

24783 99e20700 [Date/Time] 1982479 SdtBase::setLastErr: Setting last err [ ][The destination encountered an error while processing the data from the source.] RCId [ ]

Or

2340 988 [Date/Time] 1463785 Scheduler Set pending cause [Current backup is Deduplicated. It is prevented from multiplexing on to same media with other Jobs.]::Client [ ] Application [CVD] Message Id [ ] RCID [0] ReservationId [0]. Level [0] flags [0] id [0] overwrite [0] append [0] CustId[0].

2340 1080 [Date/Time] 1463785 Scheduler Phase [Failed] message received from [ ] Module [clBackup] Token [ ] restartPhase [0]
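To check whether a log contains the entries shown above, a small script can count the matching lines. The following is a minimal sketch, not a Commvault tool: the patterns are built from the excerpts in this article, and the labels and the function name count_signatures are our own.

```python
import re
from collections import Counter

# Patterns derived from the log excerpts in this article.
# The dictionary keys are illustrative labels, not Commvault terminology.
SIGNATURES = {
    "sdt_buffer_error": re.compile(
        r"SdtTail::onProcDataCompleted:.*Cannot process the SDT buffer"
    ),
    "destination_error": re.compile(
        r"SdtBase::setLastErr:.*The destination encountered an error"
    ),
    "phase_failed": re.compile(
        r"Scheduler Phase \[Failed\] message received from .* Module \[clBackup\]"
    ),
}


def count_signatures(lines):
    """Count how many lines match each known failure signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

To use it, open the log file you want to inspect and pass the file object to count_signatures. The location of the log depends on your installation and is not assumed here.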

Cause

The problem is related to accessing CIFS/NFS mount paths on iBRIX X9000 storage. Because of extensive caching of “walk” requests, the entire CIFS/NFS service can become unresponsive as it tries to process an increasing number of requests. If this happens during the backup window, the unresponsive CIFS/NFS service causes the MediaAgent to time out when it tries to read, write, or access files on the mount path. HP has acknowledged this issue and will be addressing it in StoreAll V6.5.
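One way to observe the unresponsiveness described above is to time a simple directory listing on the mount path and give up after a timeout. The following is a minimal sketch under stated assumptions: probe_mount_path and the 10-second default are our own choices, and the script is neither an HP nor a Commvault utility.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def probe_mount_path(path, timeout_seconds=10.0):
    """Return the seconds taken to list `path`, or None if the timeout expired.

    Errors such as a missing path are raised to the caller unchanged.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = executor.submit(os.listdir, path)
    try:
        future.result(timeout=timeout_seconds)
        return time.monotonic() - start
    except TimeoutError:
        return None
    finally:
        # Do not block on a hung listing here. Note that the interpreter
        # still waits for the worker thread when the program exits.
        executor.shutdown(wait=False)
```

A return value of None, or a listing time far above what the same path shows outside the backup window, suggests that the CIFS/NFS service is struggling. It does not by itself confirm the caching issue.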

For more information, see Allocation Policy in HP StoreAll Storage Best Practices.

Resolution

Contact HP, the iBRIX vendor, for the latest status of this issue. In the interim, you can take the following steps in Commvault to reduce the likelihood of this issue occurring during backups.

  1. In the CommCell Console, open the Control Panel and click the Media Management Configuration applet.
  2. On the Service Configuration tab, make the following changes:
    • Reduce the Number of volumes for size update value to 200 (default = 1000).
    • Increase the Interval between volume size update requests value to 240 (default = 120).
    • Increase the Interval (Minutes) between disk space updates value to 60 (default = 30).
  3. In the CommCell Console, expand Storage Resources | Libraries.
  4. For each Disk Library that uses CIFS/NFS mount paths on iBRIX X9000 storage, do the following for each Mount Path of the library:
    1. Right-click the Mount Path and click Properties.
    2. On the Allocation Policy tab, reduce the value for Allocate number of Writers (default = 5).
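The Service Configuration values from the steps above can be kept as a simple checklist. The following is a minimal sketch that only compares numbers you enter by hand. It does not read from or write to the CommCell Console, and the function name settings_to_change is our own.

```python
# Setting names, defaults, and recommended values are taken from this article.
# This is a plain checklist, not a call into any Commvault interface.
RECOMMENDED = {
    "Number of volumes for size update": {"default": 1000, "recommended": 200},
    "Interval between volume size update requests": {"default": 120, "recommended": 240},
    "Interval (Minutes) between disk space updates": {"default": 30, "recommended": 60},
}


def settings_to_change(current):
    """Return {name: (current_value, recommended_value)} for untuned settings.

    Settings missing from `current` are assumed to still be at their default.
    """
    pending = {}
    for name, values in RECOMMENDED.items():
        value = current.get(name, values["default"])
        if value != values["recommended"]:
            pending[name] = (value, values["recommended"])
    return pending
```

For example, calling settings_to_change with {"Number of volumes for size update": 200} reports the two interval settings as still pending.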

If using Commvault Software Version 10:

  1. In the CommCell Console, do the following for each MediaAgent that has access to the associated disk library:
    1. Right-click the appropriate MediaAgent, and then click Properties.
    2. Click the Additional Settings tab.
    3. Click Add.

          The Add Additional Settings dialog box appears.

  2. In the Name box, type DMDontUpdateVolumeSizeDuringBackup.

          The Category and Type details fill automatically.

  3. In the Value box, type 1.
  4. Click OK to save and exit the Add Additional Settings dialog box.
  5. Click OK to save the changes to the Client properties.

For all Commvault Software versions, after making the changes above:

  1. Restart the CIFS/NFS service on the iBRIX storage.

In addition, consider changing backup schedules to reduce spikes in CIFS/NFS load during backups, and reducing the number of job streams that run to the disk library during the backup window.
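One simple way to reduce load spikes is to spread job start times across the backup window instead of starting every job at once. The following is a minimal sketch of that idea. The function staggered_starts, the job names, and the window times are hypothetical, and even spacing is a simplification that ignores how long each job runs.

```python
from datetime import datetime, timedelta


def staggered_starts(window_start, window_end, job_names):
    """Spread job start times evenly across [window_start, window_end)."""
    if not job_names:
        return {}
    step = (window_end - window_start) / len(job_names)
    return {name: window_start + i * step for i, name in enumerate(job_names)}


# Hypothetical example: four jobs in a window from 22:00 to 02:00.
window_start = datetime(2024, 1, 1, 22, 0)
window_end = window_start + timedelta(hours=4)
plan = staggered_starts(window_start, window_end, ["jobA", "jobB", "jobC", "jobD"])
for name, start in plan.items():
    print(name, start.strftime("%H:%M"))
```

The resulting times would still need to be entered into the backup schedules by hand. Jobs placed late in the window may need an earlier slot if they are long-running.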

Currently, the downside of making the Commvault changes above is that the size on disk reported for DDBs may be incorrect, because the software may not be able to calculate the size of all volume folders in a timely manner. Backup jobs and capacity reporting should still be accurate, because they rely on the reported size of each archive file at the end of a backup stream rather than on the size of the volume folders.