Seeking Better Script Performance
Using the Where-Object cmdlet to select items from a set is a common operation for any Office 365 administrator who uses PowerShell to manage applications like Exchange Online, SharePoint Online, and Teams. From a performance perspective, it’s better to use a filter with the original call to find objects (like finding the Microsoft 365 Groups enabled for Teams or Yammer) because that avoids the need to extract a subset and a server-side filter might be available, but sometimes that’s not possible.
Those of you who use our Graph-based script to report usage across multiple Office 365 workloads know that some recent work has been done to improve performance. This provoked a comment that replacing Where-Object with the Where method would deliver further improvement in instances where the script filters objects because it avoids the need to pipe objects to Where-Object. The Where method only works with arrays. For example, let’s assume that you load a bunch of mailboxes with a call like:
$Mbx = Get-ExoMailbox -ResultSize Unlimited
To filter the mailboxes with Where-Object and find any mailbox with the string “Tony” in the display name, you might do something like:
$Mbx | ? {$_.DisplayName -Like “*Tony*”}
The same can be done with the Where method by changing the code slightly:
$Mbx.Where({$_.DisplayName -Like “*Tony*”})
(In these examples, the ? shortcut is used instead of spelling out the full Where-Object cmdlet name). Note that if no objects are found by the filter, Where-Object returns $Null while Where returns an empty array. Read this article for more information about the ups and downs of the Where method.
Tests Prove Better Performance
Some initial tests proved that substituting the Where method is easy and delivered an improvement of between 12% and 15% in performance. We therefore went ahead and made the change in V1.3 of the GetGraphUserStatisticsReport.PS1 script, which is available from GitHub.
My original tests were done with a few thousand records. I was curious if the performance improvement was maintained with larger data sets. I therefore downloaded 75,000 Office 365 audit records and stored the items in an array, a data table, and a PowerShell list. In all cases, the Where method is faster.
Here’s an example of using the Where-Object cmdlet to extract a subset of records from a list. Six records are returned from 75,000 in 98 seconds.
Measure-command {$LabelRecs = $Report| ? {$_.Operation -eq "Get-Label"}} Days : 0 Hours : 0 Minutes : 1 Seconds : 37 Milliseconds : 959 Ticks : 979599545 TotalDays : 0.00113379576967593 TotalHours : 0.0272110984722222
Changing the code to use the Where method is very simple because the same script block is used, passed in parenthesis. The code returns the same six records in 78.5 seconds.
Measure-command {$LabelRecs = $Report.Where({$_.Operation -eq "Get-Label"})} Days : 0 Hours : 0 Minutes : 1 Seconds : 18 Milliseconds : 540 Ticks : 785406751 TotalDays : 0.000909035591435185 TotalHours : 0.0218168541944444 TotalMinutes : 1.30901125166667 TotalSeconds : 78.5406751 TotalMilliseconds : 78540.6751
Simple Change from Where-Object Cmdlet
It’s hard to see why you should not consider upgrading Where-Object calls to use the Where method anywhere the output is not piped to another cmdlet. The switchover is easy and the performance gains are obvious. The next time you’re editing a script, consider using this approach. It won’t make slow cmdlets like Get-UnifiedGroup any faster, but it might quicken the surrounding processing.
Worrying about getting an extra 15% performance for PowerShell scripts might seem a strange thing for the Office 365 for IT Pros eBook team to worry about, but we write so much code to illustrate principles and give examples that we care about this kind of thing!
The difference is even more, if you have more data to parse. I tried with with a larger directory structure
$filelist= Get-ChildItem C:\ -Recurse
$filelist.count
677439
(measure-command {$filelist | ? {$_.name -like “pdf”}}).totalseconds
19,1507168
(measure-command {$filelist.Where({$_.name -like “pdf”})}).totalseconds
3,6654204