Restrict Processing to Specific File Types Using Auto-Label Retention Policies
A reader asks if it’s possible to limit the processing of a Microsoft 365 retention policy to certain file types. In this case, they wish to remove any Word documents, Excel spreadsheets, and PDF files from the SharePoint Online sites used by Teams while leaving other items stored in SharePoint like the OneNote notebook and .mht files used by the Teams wiki.
Before answering the question, it’s important to understand that Teams retention policies only process messages from channel conversations (including private channels) and chat. If you want a retention policy to process the SharePoint content owned by a team, you need the policy to process Microsoft 365 Groups. This is because Teams uses Microsoft 365 Groups for its identity and membership service.
The challenge then is how to amend a retention policy for Microsoft 365 Groups to only process certain file types. However, Microsoft 365 retention policies operate on a “whole container” basis. In other words, they apply the same retention settings to all items found in the target locations. In this case, Microsoft 365 Groups. Although you can refine a policy to process selected groups, the GUI in the Microsoft 365 compliance center doesn’t allow you to modify a policy to perform selective processing by file type.
Assigning Content Queries with PowerShell
PowerShell cmdlets are available to manage Microsoft 365 standard and auto-label retention policies. To access these cmdlets, connect to the Exchange Online management endpoint and then run the Connect-IPPSSession cmdlet to connect to the compliance endpoint. You can then run the Get-RetentionCompliancePolicy cmdlet to view the set of retention policies defined in the tenant and the Get-RetentionComplianceRule to view the rule defined for each policy. Each retention policy divides into the general policy settings and the rule dictating the retention action performed by the policy.
If you look at retention policy rules, you’ll see a parameter called ContentMatchQuery. Checking Microsoft’s documentation for Set-RetentionComplianceRule, we discover that this parameter contains a content query in Keyword Query Language (KQL) to filter the content scanned by the policy. A KQL-based content query is a commonplace object within Microsoft 365 as it’s used for SharePoint search, eDiscovery content searches, and automatic labeling policies.
It is possible to update a standard retention policy to add a content query. However, because no GUI exists for this purpose, you’ve got to add the content query with PowerShell. For example, the content query added to the target retention policy finds any Word document, Excel worksheet, or PDF file:
Set-RetentionComplianceRule -Identity "Retention for Specific File Types" -ContentMatchQuery "filetype:doc* filetype:xls* filetype:pdf"
You can test a content query by inputting it to SharePoint Search. If the search returns the expected files, you know the filter is good and will work with a retention policy.
The Problem with Standard Retention Policies
Or can it? Well, the filter works, and the retention policy will process only the specified files, but Microsoft doesn’t support this scenario. At least, they don’t include this scenario in their testing as they introduce new functionality (like adaptive scopes), which means that something might break inadvertently. That’s not a good situation for a compliance solution.
The ContentMatchQuery parameter exists because a KQL content filter is a way to select items for auto-label retention policies. Examples of the use of KQL queries in auto-label retention policies are to apply retention labels to Teams meeting recordings or to files with specific sensitivity labels.
In the earliest days of Office 365 retention policies, as they were then, you could attach a content query to a standard retention policy, but this has not been a supported scenario since Microsoft removed the GUI to enter KQL queries for retention policies. However, Microsoft left the ability to define a content query for a standard retention policy through PowerShell intact, and that’s what has created the inconsistency which exists today. To address the issue, Microsoft should update the documentation for Set-RetentionComplianceRule to say that the ContentMatchQuery parameter is unsupported with standard retention policies.
Use Auto-Label Policies
The approved solution is an auto-label retention policy. These policies accept KQL content queries like the example cited above. When SharePoint Online applies a retention policy with a content query, it finds matching files and assigns the retention label defined in the policy to the items. The retention period and action defined in the retention label controls what happens to the files. For example, the label might define a retention period of one year after which SharePoint Online should remove the labeled files.
One advantage of using an auto-label retention policy is that users see the retention label applied to matching files and can change the label if necessary. This doesn’t happen with standard retention policies because all processing happens in the background and users won’t necessarily be aware of the deletion of files.
The downside of auto-label retention policies is that any account which comes within the scope of the policy (for instance, all members of the Microsoft 365 groups processed by a policy) require Office 365 E5 or Microsoft 365 compliance licenses. This is not a problem if your organization already has these licenses, but there’s quite a step up from Office 365 E3 (needed for standard retention policies) to automatic processing, even before Microsoft’s March 1 price increase.
The bottom line is that you can live on the edge and use content queries with standard retention policies until something breaks (or Microsoft disables the capabilities) or use auto-label retention policies for selective processing based on file types. Going with a supported approach is always a better choice, even if it might cost some extra license fees.
Insight like this doesn’t come easily. You’ve got to know the technology and understand how to look behind the scenes. Benefit from the knowledge and experience of the Office 365 for IT Pros team by subscribing to the best eBook covering Office 365 and the wider Microsoft 365 ecosystem.