Report Office 365 User Activity Using the Microsoft Graph

Table of Contents

Gathering Data for Multiple Workloads to Understand User Activity

For the last few months, I have been dabbling with a PowerShell script to extract and report usage data for multiple Office 365 workloads from the Microsoft Graph. The idea is that an Office 365 user activity report generated by fetching activity data from all the workloads reported in the Graph helps administrators to figure out if accounts are in use and if so, what they are used for. If an account isn’t in use, then you might remove it and save some licenses.

One of the joys of PowerShell is how quickly you can put a solution together. The corollary is sometimes that the solution isn’t as efficient as it could be, which often happens when you’re not a professional programmer. When I write a script, the most important thing is often to illustrate a principle and show how something works. When PowerShell scripts are deployed into production, they’re usually upgraded and improved by programmers to meet organizational standards and fit in with other scripts used to manage the infrastructure. For this reason, I don’t bother too much with tweaking for performance.

This script is different. It’s been picked up by several tenants who reported that the script works but it’s slow when asked to process data for thousands of accounts. This deserved some investigation which produced some improvements, such as using PowerShell’s Where method to filter data.

PowerShell Hash Tables

But PowerShell is not a database and storing data about account usage in PowerShell list objects only scales so far. There are many web articles covering PowerShell performance with large amounts of data, many of which point to using hash tables because they are very efficient for finding and retrieving data (see this article about how to use hash tables).

A hash table is a collection of key/value pairs. The keys are unique, and the values are often some information associated with the key. For instance, because Office 365 objects like groups and sites store sensitivity labels as GUIDs, I often create a hash table composed of the GUID (key) and label display name (value) which I can then use to interpret the GUIDs stored in objects. Here’s what the code looks like:

$Labels = Get-Label # Get set of current labels
$HashLabels = @{} # Create hash table
$Labels.ForEach( { # Populate the hash table with the GUID and display name of each label
       $HashLabels.Add([String]$_.ImmutableId, $_.DisplayName) } )

Anytime I need to find the display name of a label, I can do something like this:

$GUID = (Get-UnifiedGroup -Identity “Office 365 for IT Pros”).SensitivityLabel.GUID
Write-Host “Display name of label is” $HashLabels[$GUID]
Display name of label is Limited Access

Apart from their usefulness in situations like described above, hash tables are very fast when you use keyed access. Speed being of the essence when thousands of records are to be processed, I decided to investigate if hash tables could replace the list objects used by the script.

Keys and Values

Finding a key is no problem because the user principal name is unique for each account. Figuring out how to store all the data in the hash table value was another matter. That is, until I noticed that: ”the keys and values in a hash table can have any .NET object type…” In other words, you’re not limited to storing simple values in a hash table.

When the script extracts usage data for a workload (like Teams or Exchange) from the Graph, it processes each record to create a list of accounts and their usage data for that workload. After some experimentation, I was able to populate the hash table by:

Creating an array of the usage data for the workload for an account.
Appending the array to the existing usage data extracted from other workloads for the account (as stored in the hash table).
Writing the updated array back into the hash table.

This might be inelegant, but it works. After all workloads are processed, the result is a hash table keyed on the user principal name with a value composed of an array containing the usage data for all workloads for that user. Access to the data is via the user principal name. For example:

$datatable["Kim.Akers@Office365itpros.com"]

TeamsUPN             : Kim.Akers@office365itpros.com
TeamsLastActive      : 05-Sep-2020
TeamsDaysSinceActive : 5
TeamsReportDate      : 07-Sep-2020
TeamsLicense         : POWER BI (FREE)+ENTERPRISE MOBILITY + SECURITY E5+OFFICE 365 E5 WITHOUT
                       AUDIO CONFERENCING
TeamsChannelChats    : 7
TeamsPrivateChats    : 10
TeamsCalls           : 0
TeamsMeetings        : 5
TeamsRecordType      : Teams

ExoUPN             : Kim.Akers@office365itpros.com
ExoDisplayName     : Kim Akers
ExoLastActive      : 20-Aug-2020
ExoDaysSinceActive : 21
ExoReportDate      : 08-Sep-2020
ExoSendCount       : 8
ExoReadCount       : 19
ExoReceiveCount    : 392
ExoIsDeleted       : False
ExoRecordType      : Exchange Activity

The display is truncated here to show two of the six workload usage data extracted for an account.

Creating the report is then a matter of processing each account to extract the information and format the data. To do string comparisons and other calculations, I found that it was necessary to use the Out-String cmdlet to make the properties taken from the array into trimmed strings. It might be something to do with the way that the hash table values are stitched together from multiple arrays.

Faster Performance

After changing to hash tables, I observed a 70% performance gain in script execution time in my (small) tenant. I expect a much better gain in larger tenants where the advantages of hash table access become more pronounced. This feeling was realized in a test against 20K accounts which proved that the script is now capable of processing at circa 1,000 accounts per minute (Figure 1).

Update September 18: I received a note saying that the script processed 26,808 accounts at the rate of 3184.71 per minute!

The time required to fetch data from the Graph is the same as previous versions as is the time to prepare data for processing. All the improvement is in the report generation, which is where the hash tables excel. The tenant who processed the script against 20,000 accounts used the Office 365 user activity report (example shown in Figure 2) to identify 70 accounts assigned Office 365 E5 licenses that can now be reallocated or released (a potential saving of $29,400 annually).

Office 365 user activity report
Microsoft 365 user activity report — Figure 2: Reviewing account usage to locate underused Office 365 licenses

The Office 365 user activity report script is available from GitHub. If you have a suggestion for improving the performance further, please let comment on GitHub.

OK, we should be writing text for the Office 365 for IT Pros eBook instead of trying to work out how to speed up PowerShell scripts. But you learn a lot about an infrastructure when you program against it, so we’ll keep on scripting…