Making Retention More Efficient
Message center notification MC288633 (1 October) covers the topic of optimized behavior of file versions preserved in SharePoint Online and OneDrive for Business. It’s a title guaranteed to turn off most Office 365 administrators unless they’re interested in compliance. As it happens, I am, so I read the notification.
My reading of the situation is that Microsoft is replacing an old-fashioned implementation of the preservation hold library with a more modern approach. As you might know, the preservation hold library is the location used by SharePoint Online to keep information needed for retention purposes. It’s the equivalent of Exchange Online’s Recoverable Items structure, a place where updated and removed content stays until the retention period expires.
The Preservation Hold Library
Up to now, SharePoint Online has used the preservation hold library to retain multiple versions of changes made to documents and list items. If someone edits a document which comes within the scope of a retention policy, SharePoint captures a pre-change copy of the document in the library. If someone deletes a document that must be retained, it goes into the preservation hold library. The actual processing is more complicated, but that description is sufficient here.
The net effect is that a preservation hold library for a busy site can accumulate a large number of items (Figure 1). Although users cannot access the preservation hold library, its content is indexed and discoverable, which means that eDiscovery investigators can recover the full change record for documents and list items. Administrators can also recover files from the preservation hold library, so there’s lots of goodness available.
The Downsides of Retention
Except that a downside exists. Or rather, two significant downsides. The first is that capturing edits and deletions for a busy SharePoint Online site can consume a large percentage of the storage used by the site. The amount differs from site to site depending on how the site is used and the types of files stored. For instance, the site which I use to store the Word documents for blog posts has thousands of relatively small files (usually in the range of 1-5 pages), most of which are never edited after publication. The preservation hold library for the site holds 924 items occupying 292.6 MB, or 5.92% of the site storage.
The site used for the Office 365 for IT Pros book has completely different characteristics. The Word documents (and some Excel spreadsheets) are larger (some chapters are over 100 pages) and they receive frequent revisions. For example, according to its version history, the chapter covering Teams architecture and structure in the 2021 edition has 330 versions, most generated using the Office AutoSave feature. The combination of large files and multiple revisions drives storage consumption to 15.3 GB, or 21.8% of the site (Figure 2).
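To put the two percentages in context, here is a minimal Python sketch of the arithmetic. It assumes that each quoted percentage is the preservation hold library's share of the total storage used by the site, and it works backwards to an implied site size. The function names are mine, and the site totals it produces are derived estimates, not figures reported by SharePoint.

```python
def share_percent(library_size, site_size):
    """Return the library's share of site storage as a percentage."""
    if site_size <= 0:
        raise ValueError("site_size must be positive")
    return 100.0 * library_size / site_size


def implied_site_size(library_size, percent):
    """Estimate total site storage from a library size and its percentage share.

    The result is in the same unit as library_size.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    return library_size * 100.0 / percent


# Figures quoted in the text; the totals are back-calculated estimates.
blog_site_mb = implied_site_size(292.6, 5.92)   # blog site, in MB
book_site_gb = implied_site_size(15.3, 21.8)    # book site, in GB

print(f"Blog site (estimated): {blog_site_mb:,.0f} MB")
print(f"Book site (estimated): {book_site_gb:.1f} GB")
```

On those assumptions, the blog site works out at roughly 4,943 MB and the book site at roughly 70.2 GB, which shows how much more of a heavily edited site is taken up by retained content.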
The problem is that SharePoint Online regards the storage consumed by the preservation hold library in the same manner as it treats other libraries. Everything counts against the tenant’s overall SharePoint storage quota, which seems a little unfair given that Exchange Online provides additional free storage per mailbox to handle retention. It’s easy to run a report to find the storage consumed by each site, but you’ll need to access the site to discover how much is consumed by the preservation hold library.
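Because every preservation hold library counts against the same tenant quota, it can help to add up the per-site figures once you have collected them. The Python sketch below does that for the two sites mentioned in this article. The `SiteUsage` structure and the site names are hypothetical, the sizes are entered by hand rather than read from any SharePoint API, and I assume 1 GB = 1,024 MB.

```python
from dataclasses import dataclass

MB_PER_GB = 1024  # assumption: binary units


@dataclass
class SiteUsage:
    """Hand-collected figures for one site (hypothetical structure)."""
    name: str
    hold_library_mb: float  # size of the preservation hold library in MB


def total_hold_storage_gb(sites):
    """Sum preservation hold library sizes across sites and return GB."""
    return sum(site.hold_library_mb for site in sites) / MB_PER_GB


# The two sites described in this article.
sites = [
    SiteUsage("Blog posts", 292.6),
    SiteUsage("Office 365 for IT Pros book", 15.3 * MB_PER_GB),
]

total_gb = total_hold_storage_gb(sites)
print(f"Preservation hold libraries across {len(sites)} sites: {total_gb:.2f} GB")
```

For these two sites, the total comes to about 15.59 GB charged against the tenant quota.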
The second issue is that content searches find multiple copies of files stored in SharePoint Online sites. This might be what you want, but usually it’s confusing (Figure 3).
The change rolling out in mid-November means that when a file with multiple versions is deleted from a SharePoint Online site or OneDrive for Business account and must be retained, it will be preserved as a single file instead of multiple versions. Storing fewer versions should reduce the demand for storage, but I shall wait and see how things work before making a definitive statement on that point. Reducing the number of versions held for a file will also speed up deletions and eliminate errors caused when retained files had more than a hundred versions in the preservation hold library.
Existing files in the preservation hold library are not updated and behave as before. Eventually, after the retention period for these items expires, the weekly background job that checks for and removes obsolete material from the preservation hold library will remove the older files and release storage.
The new approach applies to any file which ends up in the preservation hold library because of a retention policy or in-place eDiscovery hold.
Given the number of files now stored in SharePoint Online due to increased use by apps like Teams, the effect of AutoSave in generating multiple file versions, and the impact that retention can have on tenant storage quota, this is a good change. It also simplifies administration and might even make backup and restore scenarios easier (fewer files to deal with). Time will tell!