Azure AD Authentication Failure Stops Users Working
By now, you’ve probably heard about the second large Azure AD authentication outage since September. The March 15 incident calmed down after a few hours, but while it was ongoing, users were unable to connect to Microsoft 365 applications when authentication was necessary. It wasn’t a happy experience. Microsoft plans to set a new SLA of 99.99% availability for Azure AD authentication on April 1, 2021. Perhaps they were making a few tweaks to the Azure AD infrastructure to prepare the ground for the upgraded SLA when things went wrong.
The current 99.9% SLA applies to the Azure AD tier for Office 365, but a Microsoft comment posted to the announcement for the new SLA said that the 99.99% level will only apply to those with Azure AD Premium licenses. I guess we shall have to wait and see the details of the SLA when Microsoft publishes the text of the agreement on April 1.
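To put the two SLA levels in perspective, here is a minimal sketch of the arithmetic. It assumes a 30-day measurement period and treats the SLA as a simple uptime percentage; the function name is illustrative, and the way Microsoft actually measures availability will depend on the text of the agreement.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a period while still meeting the SLA.

    Assumption: the period is `days` long and availability is measured as
    plain uptime over total time, which may differ from the real agreement.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    for sla in (99.9, 99.99):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% allows about {minutes:.2f} minutes of downtime per 30 days")
```

Under these assumptions, 99.9% allows roughly 43 minutes of downtime in a 30-day period and 99.99% allows a little over 4 minutes, so an outage lasting a few hours would exceed either budget.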
Microsoft 365 applications continued working during the outage unless authentication was necessary. Because they’re built on the Microsoft Graph APIs, the Teams clients authenticate hourly, so they were heavily affected. Outlook desktop stayed online throughout, and users reported varying degrees of usability for other apps.
Working in Word
While the outage progressed, I worked on a Word document for my blog post. All my Word documents are either in SharePoint Online document libraries or OneDrive for Business, so the OneDrive sync client is kept busy. The sync client is responsible for the differential synchronization of files up to the new 250 GB limit. Office apps autosave to capture changes. Not only does autosave ensure that you should never lose much if an app or workstation crashes, it’s also the way changes get to other copies of Office documents open for co-authoring. And it’s why SharePoint Online keeps a minimum of 100 versions of documents. If you use the Office desktop apps heavily and store files online, the OneDrive sync client is busy.
OneDrive Sync Client Goes Nuts
Until, that is, the OneDrive sync client decides that it should remove all the local copies of files from a SharePoint folder. This was a rather bizarre side effect of the Azure AD outage. At least, although I can’t prove that the outage caused the OneDrive sync client to do something very strange, the problem happened at the same time.
I noticed the issue when File Explorer reported nothing in the local folder which holds the synchronized copies of SharePoint files. The folder usually holds hundreds of files (423 as I write), so something had clearly happened. I opened the OneDrive sync client (build 21.041.0228.0001) and discovered that the client had removed the local files an hour earlier (Figure 1), meaning that the client decided to remove the files at around 21:45 UTC, during the period when Microsoft was rolling out remediation for the Azure AD outage.
The problem was easily fixed by going to SharePoint Online and choosing to synchronize the folder again (Figure 2).
The OneDrive sync client started to download local copies immediately (Figure 3) and a full set of documents was soon on my local drive.
Curious and Problematic Synchronization
You can argue that all’s well that ends well, but no good reason exists for the OneDrive sync client to do what it did. Perhaps the Azure AD authentication problem caused the client to believe that it was no longer allowed to download files from the SharePoint site. If so, it would be better if the client issued a warning to say what’s about to happen and offered the user a chance to authenticate with their credentials rather than concluding that everything should be removed now.
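The behavior suggested above can be sketched as a small decision function. This is purely illustrative: every name here (AccessCheck, Action, decide) is hypothetical, and nothing in it reflects how the OneDrive sync client is actually implemented. The only point is to separate "authentication is unavailable" from "access has definitely been revoked".

```python
from enum import Enum, auto


class AccessCheck(Enum):
    """Hypothetical outcomes when a sync client checks access to a site."""
    GRANTED = auto()
    DENIED = auto()            # the service definitively says: no access
    AUTH_UNAVAILABLE = auto()  # could not authenticate, possibly transient


class Action(Enum):
    """Hypothetical actions the client could take with local copies."""
    KEEP_SYNCING = auto()
    PROMPT_REAUTH = auto()     # keep local files, ask the user to sign in
    WARN_THEN_REMOVE = auto()  # tell the user before removing anything


def decide(check: AccessCheck) -> Action:
    """Choose an action without treating an auth failure as loss of access."""
    if check is AccessCheck.GRANTED:
        return Action.KEEP_SYNCING
    if check is AccessCheck.AUTH_UNAVAILABLE:
        # An outage is not evidence that access was revoked, so local
        # copies stay in place and the user gets a chance to sign in.
        return Action.PROMPT_REAUTH
    # Even a definitive denial is surfaced to the user first.
    return Action.WARN_THEN_REMOVE


if __name__ == "__main__":
    for outcome in AccessCheck:
        print(f"{outcome.name} -> {decide(outcome).name}")
```

In this sketch, no path removes local files silently, which is the property the paragraph above is asking for.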
Failure to authenticate is the logical root cause which led to the mass deletion of local files. Every document in the folder has a retention label to stop SharePoint removing documents (set as a default label for the library). The normal course of events is that you can remove a local copy of a file from File Explorer only for the OneDrive sync client to restore the file once it discovers the deletion block imposed by the retention label. Despite the presence of the retention labels, the OneDrive sync client removed all the local files. If my theory holds, the OneDrive sync client concluded that the user had no access to SharePoint Online, so it should remove the local copies as this wouldn’t impact the retained file in SharePoint.
What’s also curious is that just one folder was affected. The OneDrive sync client left everything else alone. My conclusion is that the folder was in active use because I had a Word document stored in that folder open at the time, and autosaved changes were flowing back to SharePoint Online. No need existed for the OneDrive sync client to go near my other folders (like those holding files for the Office 365 for IT Pros eBook), so it left them alone.
It’s not just me who has encountered odd synchronization issues leading to mass removal of files. Fellow MVPs Vasil Michev and Paul Robichaux have also had difficulties. It seems like Microsoft has some work to do to smooth how the OneDrive sync client handles what could be transient authentication issues.
Maybe I shouldn’t have disabled the new OneDrive sync client file delete warning!
Update March 18: Microsoft has two advisories linked to the problem: SP244708 (SharePoint) and OD244709 (OneDrive). The symptoms experienced by people are different, but the root cause is the same.