Slow scan during file system backup with multiple large volumes

Article ID: SCAN0002

The scan process scans multiple volumes serially.

Symptom

A File System agent subclient backup takes a long time to complete the scan of multiple volumes.

Cause

Regardless of the number of readers configured for a subclient, the scan runs as a single process. The scan process walks the entire directory table of each volume, one volume after another. As a result, a subclient whose content spans multiple volumes with a large number of files can take a long time to scan.

Resolution

If possible, use the Optimized Scan method. Optimized Scan uses a customized database containing information about the files on a volume. The database is maintained by background processes and can be queried quickly to determine which files to back up. Note that the first backup scan may take just as long as before, because it initializes the database. Subsequent backup scans will be significantly faster.

To improve the overall scan time further, split the volumes across multiple subclients. Subclients that run in parallel each start their own scan process, so the scans should complete sooner. Splitting content along volume boundaries also separates the disk I/O demands of each subclient’s scan and backup processes. However, if the volumes contain greatly unequal numbers of files, better performance may be achieved by balancing the number of files protected and scanned by each subclient.
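
To judge whether the volumes are unbalanced, a rough file count per volume is needed. The following is a minimal sketch, assuming Python is available on the client computer. It is not part of the product: count_files is a hypothetical helper, and because it walks the directory tree itself, it can take a long time on large volumes.

```python
import os

def count_files(root):
    """Return the number of non-directory entries found under root."""
    total = 0
    # os.walk skips directories it cannot read unless an onerror handler is given.
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total
```

For example, count_files("E:\\") would return an approximate file count for a volume mounted as E:, which can then be compared with the counts of the other volumes in the subclient content.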

For example, if a subclient has four volumes, one with 50K files and three with 10K files each, you can split the content into two subclients: one subclient with the volume containing 50K files, and the other subclient with the remaining three volumes containing 10K files each.
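
The grouping in this example can be worked out with a simple greedy rule: take the volumes from largest to smallest and assign each one to the subclient that has the fewest files so far. The following is a minimal sketch of that rule, not a product feature. The function name balance_volumes and the drive letters are hypothetical; the file counts are the ones from the example.

```python
import heapq

def balance_volumes(file_counts, num_subclients):
    """Greedy split: largest volume first, into the least-loaded subclient."""
    # Heap entries are (files assigned so far, subclient index).
    heap = [(0, i) for i in range(num_subclients)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_subclients)]
    by_size = sorted(file_counts.items(), key=lambda kv: kv[1], reverse=True)
    for volume, count in by_size:
        total, idx = heapq.heappop(heap)
        groups[idx].append(volume)
        heapq.heappush(heap, (total + count, idx))
    return groups

# Hypothetical volume names; counts match the example in the text.
counts = {"E:": 50_000, "F:": 10_000, "G:": 10_000, "H:": 10_000}
print(balance_volumes(counts, 2))  # [['E:'], ['F:', 'G:', 'H:']]
```

The result is still uneven (50K files against 30K) because a volume is the smallest unit being moved here. Balancing more closely would require splitting the content of the large volume itself across subclients.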

Note that using multiple subclients increases resource requirements on the data source, transit media, and destination library. Make sure these path components can support parallel subclient processing.