Article ID: 54284
Article Type: Best Practices
Last Modified:
Infinite retention is not part of any good information management plan. It can lead to rampant storage growth, increased litigation risk, and expensive, inefficient discovery processes. The only possible exception is archival data, and even then infinite retention should be highly selective, requiring recurring review and justification. Any justification for infinite retention must weigh against the cost of retaining the data. Government, fiscal, and legal requirements for data retention always have boundaries, which leaves historical retention for posterity as the only legitimate reason for infinite retention. That is a very small business case.
If infinite retention is justified, the decision to keep the data in deduplicated form on disk needs to be carefully considered. Deduplication of archived data on disk may slow rampant storage growth, but it does not stop growth, nor does it resolve any of the other pitfalls of infinite retention. Perpetually adding more disk storage is not practical and will always run into cost limits. Eventually, moving archive data off to SILO tape storage becomes the only practical option.
All that being said, the question remains: what are the best practices for Deduplication Database (DDB) maintenance and for sealing the DDB?
Since the DDB is not involved in restores, I believe the primary focus of an administrator tasked with managing infinitely retained data in a deduplicated disk library should be data verification and capacity management.
By DDB maintenance, we are referring to the resynchronization of the DDB. This should only be necessary when bringing a DDB back online from maintenance mode, which usually occurs only after a recovery of the CommServe database. Therefore, best practices for DDB maintenance are no different for infinitely retained data than for any other data tracked by a DDB: resynchronize the DDB if and when necessary to bring it back online in a usable state.
Strictly speaking, sealing a DDB would only be necessary when the capacity of the disk library is reached, but there are reasons to do it sooner. If you make the practical assumption that use of SILO copies is inevitable, then an earlier boundary point must be established to seal the SILO copy and free up disk space. The best practice for when to seal the DDB should be based on disk library capacity, the expected restore size from SILO copies (i.e., the free disk space needed to restore one or more SILO copies to recover the data), and restore time requirements. At a minimum, you would keep enough free disk space on the library to restore one SILO copy, as in the sketch below.
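As a rough illustration of that capacity math, the following sketch uses hypothetical figures and function names (this is not a Commvault utility) to check whether the library still has enough free space to restore the largest SILO copy plus a safety margin, and to flag when it is time to seal the DDB.

    # Hypothetical capacity check for deciding when to seal the DDB.
    # All figures are illustrative assumptions, not values read from the product.

    def should_seal_ddb(library_capacity_gb: float,
                        used_space_gb: float,
                        largest_silo_copy_gb: float,
                        safety_margin_gb: float = 500.0) -> bool:
        """Return True when free space can no longer hold a restore of the
        largest SILO copy plus a safety margin, i.e. it is time to seal the
        DDB and move the sealed store off to SILO tape."""
        free_space_gb = library_capacity_gb - used_space_gb
        # Always keep enough room to restore the single largest SILO copy.
        minimum_free_gb = largest_silo_copy_gb + safety_margin_gb
        return free_space_gb < minimum_free_gb

    # Example: 100 TB library, 90 TB used, largest SILO copy is 12 TB.
    if should_seal_ddb(100_000, 90_000, 12_000):
        print("Seal the DDB and migrate the sealed store to SILO tape.")
    else:
        print("Sufficient free space remains; no need to seal yet.")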
As an administrator of infinitely retained archived data, I would first make sure I had a secondary copy of my archived data on another library. I would then set up a SILO copy for the inevitable disk space management event, and I would schedule periodic Data Verification jobs to ensure the restorability of my data. Finally, since I will eventually find infinite retention impractical, I would make sure my boundaries for sealing the DDB are granular enough to allow a range of archived data to be restored, aged, or pruned as a single unit, as illustrated below.
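One way to think about those boundaries, again as a hypothetical sketch rather than product behavior: if each sealed DDB (and its corresponding SILO copy) covers a fixed period of archive data, such as a calendar quarter, then that period can later be restored, aged, or pruned as one unit. The job records and dates below are made up for illustration.

    # Hypothetical illustration of granular sealing boundaries: group archive
    # jobs by calendar quarter so each sealed store maps to one quarter of data.
    from collections import defaultdict
    from datetime import date

    # Made-up archive job records for illustration only.
    archive_jobs = [
        {"job_id": 101, "archived_on": date(2012, 1, 15), "size_gb": 800},
        {"job_id": 102, "archived_on": date(2012, 2, 20), "size_gb": 650},
        {"job_id": 103, "archived_on": date(2012, 4, 3),  "size_gb": 900},
        {"job_id": 104, "archived_on": date(2012, 7, 11), "size_gb": 720},
    ]

    def sealing_boundary(archived_on: date) -> str:
        """Map an archive date to a quarterly boundary label, e.g. '2012-Q1'."""
        quarter = (archived_on.month - 1) // 3 + 1
        return f"{archived_on.year}-Q{quarter}"

    # Group jobs by boundary; each group is a candidate sealed store / SILO copy
    # that can later be restored, aged, or pruned together.
    boundaries = defaultdict(list)
    for job in archive_jobs:
        boundaries[sealing_boundary(job["archived_on"])].append(job["job_id"])

    for boundary, job_ids in sorted(boundaries.items()):
        print(f"Sealed store {boundary}: jobs {job_ids}")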