Why is pruning of Deduplicated data important?

Article ID: DA0004




The primary reason for regular data pruning is to free up media space for new data. However, the pruning process has a lower priority for resources than backup or auxiliary copy jobs, so it is often preempted or given limited resources. For deduplicated data, where disk library data is deleted at the object level, this can cause a large backlog of objects waiting to be pruned. The potential negative impact is that storage media run out of space.

When a deduplication store is new, very little pruning is required. As the store grows, more data needs to be pruned. Once you fall behind in pruning deduplicated data, it is hard to catch up. There are two things you should do to address this problem.

First, add the Additional Setting SIDBMinPruneRequests to each data mover MediaAgent. This setting increases the minimum thread count used for the pruning process. It is an integer (DWORD) value in the MediaAgent category. The recommended maximum value is 5. The maximum possible value is 8, but if you set the value to 8, no backup can run.
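
The limits above can be captured in a small sanity check to run before a value is rolled out to several MediaAgents. This is only a minimal sketch in Python: the function name, the messages, and the choice to reject a value of 8 outright are assumptions made for illustration. It does not read or write the setting and is not part of the product.

```python
# Limits taken from the article text; everything else here is hypothetical.
RECOMMENDED_MAX = 5  # recommended maximum value
ABSOLUTE_MAX = 8     # maximum possible value; at 8 no backup can run


def check_min_prune_requests(value):
    """Return (ok, message) for a proposed SIDBMinPruneRequests value."""
    # The setting is an integer (DWORD), so reject non-integers and negatives.
    if isinstance(value, bool) or not isinstance(value, int):
        return False, "value must be an integer (DWORD)"
    if value < 0 or value > ABSOLUTE_MAX:
        return False, f"value must be between 0 and {ABSOLUTE_MAX}"
    # Assumption: this sketch treats 8 as an error because backups cannot run.
    if value == ABSOLUTE_MAX:
        return False, "a value of 8 prevents backups from running"
    if value > RECOMMENDED_MAX:
        return True, f"allowed, but above the recommended maximum of {RECOMMENDED_MAX}"
    return True, "within the recommended range"


if __name__ == "__main__":
    for candidate in (3, 5, 6, 8):
        print(candidate, check_min_prune_requests(candidate))
```

Values of 6 and 7 are reported as allowed with a warning, because the article only recommends, rather than enforces, the limit of 5.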

Second, set an operational window on each MediaAgent to allow data pruning only during times that do not conflict with Full and Non-Full Data Management (backup) jobs. Running backups while pruning with a higher thread count can slow the backups. If running the pruning process while other jobs access storage media is unavoidable, and those jobs are noticeably slower, reduce the value of SIDBMinPruneRequests until both job performance requirements and pruning needs are met.
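
When planning the pruning window, it can help to confirm on paper that it does not overlap the backup schedule. The following minimal Python sketch does that for windows expressed in whole hours. The function names and the example hours are hypothetical, and the code does not read or configure operational windows in the product.

```python
def hours_covered(start_hour, end_hour):
    """Return the set of whole hours (0-23) in a window; end is exclusive.

    A window such as 22 -> 2 wraps past midnight. A window whose start
    equals its end is treated as empty in this sketch.
    """
    hours = set()
    hour = start_hour % 24
    while hour != end_hour % 24:
        hours.add(hour)
        hour = (hour + 1) % 24
    return hours


def conflicting_hours(pruning_window, backup_windows):
    """Return the sorted hours in which pruning would overlap a backup window."""
    pruning = hours_covered(*pruning_window)
    busy = set()
    for window in backup_windows:
        busy |= hours_covered(*window)
    return sorted(pruning & busy)


if __name__ == "__main__":
    backups = [(20, 4)]  # hypothetical backup window: 20:00 to 04:00
    print(conflicting_hours((9, 17), backups))  # daytime pruning window
    print(conflicting_hours((2, 10), backups))  # window that starts too early
```

An empty result means the proposed pruning window avoids the listed backup windows. Any hours returned are the ones where backups and pruning would compete for the same storage media.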

Both steps should be taken: increase the minimum thread count and set an operational window for pruning. Data pruning is essential to both CommServe database and media space management.