Change in the World of eDiscovery
Message center notification MC255066, posted on May 7, covers some configuration changes Microsoft is making to Core eDiscovery and Advanced eDiscovery. After claiming that the changes will bring “performance, reliability and experience improvements” (surely the norm for all Microsoft announcements), the text goes on to note that “documented limits were previously not enforced.” The upshot is that eDiscovery might not behave as it did before once Microsoft begins enforcing the limits on May 10.
Core eDiscovery and Advanced eDiscovery have different technical foundations. Core eDiscovery is a case-based wrapper around content searches and in-place holds. Each case can span multiple searches and holds, and exports from the searches can be combined into one set for investigators to review. Advanced eDiscovery, which requires Office 365 E5 or Microsoft 365 compliance licenses, has its own search, review, and analysis capabilities developed to handle high-end eDiscovery cases of the type found in large enterprises. These cases might involve millions of documents and email items.
Microsoft has documented limits for content searches for several years. In a practical sense, the most notable changes cited by Microsoft are:
- Content search preview retrieves up to 100 items per mailbox and displays a maximum of 1,000 items across all mailboxes included in a search.
- Core eDiscovery no longer supports exports involving more than 100,000 mailboxes.
- The Advanced eDiscovery collection process now treats SharePoint Online sites as individual locations, so collection might be slower.
Few organizations need to search more than 100,000 mailboxes. Those that do likely use Advanced eDiscovery or a specialized third-party eDiscovery product. MC254890 (May 6) says that Advanced eDiscovery has “raised the maximum size of an export from an Advanced eDiscovery review set from 3 million documents or 100 GB (whichever is smaller) to 5 million documents or 500 GB (whichever is smaller).” Advanced eDiscovery uses multiple ZIP files to handle the export of such large amounts of data.
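To make the export change concrete, here is a minimal Python sketch that checks whether a review set fits within the old and new limits quoted from MC254890. It assumes that “whichever is smaller” means an export must stay under both the document count and the size threshold, and that a GB is 1024³ bytes. The `ExportLimit` class and its `allows` method are hypothetical names used for illustration only and are not part of any Microsoft API.

```python
from dataclasses import dataclass

GB = 1024 ** 3  # assumption: binary gigabytes; the notification does not say


@dataclass(frozen=True)
class ExportLimit:
    """Hypothetical model of an Advanced eDiscovery review set export limit."""
    max_documents: int
    max_bytes: int

    def allows(self, documents: int, size_bytes: int) -> bool:
        # Assumption: an export must respect both thresholds at once.
        return documents <= self.max_documents and size_bytes <= self.max_bytes


# Figures taken from the MC254890 text quoted in the article.
OLD_LIMIT = ExportLimit(max_documents=3_000_000, max_bytes=100 * GB)
NEW_LIMIT = ExportLimit(max_documents=5_000_000, max_bytes=500 * GB)

if __name__ == "__main__":
    # Example review set: 2 million documents totalling 250 GB.
    documents, size_bytes = 2_000_000, 250 * GB
    print("Fits old limit:", OLD_LIMIT.allows(documents, size_bytes))
    print("Fits new limit:", NEW_LIMIT.allows(documents, size_bytes))
```

In this example, the review set is within the document count for both limits but only fits under the new size threshold.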
Advanced eDiscovery is not fast at building its collection of data under review today, so making it a little slower to process SharePoint Online sources is probably not a big issue.
The Meaning of Preview
Some think that a content search preview represents the results of a full search. This isn’t true. Search preview exists to allow investigators to assess the accuracy and effectiveness of search criteria by reviewing a representative sample of what the search might find. No investigator wants to find more information than necessary. A search which finds precisely the required information takes much less time to process than one which returns a bunch of unrelated (and unwanted) items. Being able to preview the items found by a search (Figure 1) gives investigators the chance to see what kind of items a full search will find, no more and no less. An export is preceded by a full search, and that’s when the true set of items found in the target locations is revealed.
All of which means that the limit of 100 items per mailbox is important. It could be that the 101st item is critical for an investigator but doesn’t show up in a preview. Given the increasing amount of data stored in Exchange Online, SharePoint Online, and OneDrive for Business, it’s possible that the most important items are buried and will never appear in a preview. A full search will still find them, and they will be included in an export.
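The effect of the preview caps can be sketched with a few lines of Python. This is a rough model, not a description of how the service works: it assumes mailboxes are processed in the order given until the 1,000-item total is reached, and the function names are hypothetical. Only the two limits (100 per mailbox, 1,000 overall) come from the documented figures.

```python
PER_MAILBOX_PREVIEW_LIMIT = 100  # documented cap per mailbox
TOTAL_PREVIEW_LIMIT = 1_000      # documented cap across all mailboxes


def preview_counts(hits_per_mailbox):
    """Return how many items per mailbox a preview could show.

    Assumption: mailboxes are filled in the order given until the total
    cap is reached. The real service may select items differently.
    """
    shown = {}
    remaining = TOTAL_PREVIEW_LIMIT
    for mailbox, hits in hits_per_mailbox.items():
        take = min(hits, PER_MAILBOX_PREVIEW_LIMIT, remaining)
        shown[mailbox] = take
        remaining -= take
    return shown


def hidden_from_preview(hits_per_mailbox):
    """Count items a full search finds that the preview cannot display."""
    shown = preview_counts(hits_per_mailbox)
    return sum(hits_per_mailbox.values()) - sum(shown.values())


if __name__ == "__main__":
    # One mailbox with 101 matching items, another with 40.
    hits = {"mailbox-a": 101, "mailbox-b": 40}
    print(preview_counts(hits))
    print("Items not visible in preview:", hidden_from_preview(hits))
```

In this model, the 101st item in the first mailbox is exactly the one an investigator never sees in preview, although a full search and export would include it.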
More Attention Needed for Search Criteria
This doesn’t mean that the search criteria are flawed or that searching is ineffective. It does mean that investigators need to understand how to use content searches to accomplish their goals. Up to now, it seems like Microsoft didn’t always enforce the documented limits, so it’s possible that preview returned more items than it will after May 10. With that in mind, the real lesson here is that investigators need to pay more attention to search criteria to ensure that the best possible chance exists that important items will turn up in preview.
More Data Consumes Search Resources
Some might ask why Microsoft is making these changes. I think the answer is probably based on multiple influences, including:
- Organizations store more information in SharePoint Online and OneDrive for Business. Microsoft is also moving applications like Stream and Whiteboard to use SharePoint Online and OneDrive for Business. The more data, the more resources are consumed by search.
- The 100 GB Exchange Online enterprise mailbox quota encourages users to keep more email.
- The Microsoft 365 substrate captures compliance records for Teams and Yammer and stores the records in Exchange Online user and group mailboxes to make these items indexed and discoverable. The growth of Teams usage over the last year to 145 million daily active users has an attendant growth in compliance record storage. Yammer only recently added support for compliance records, and it only happens for Yammer networks configured in Microsoft 365 mode, so it’s much less of a factor.
In a nutshell, more data than ever before needs to be searched. If Microsoft didn’t impose some limits, search would consume more and more resources, and that’s an unmanageable situation. Although I can’t prove it with hard data, content searches certainly seem to take longer to complete now than they once did. Perhaps the changes now being made will restore search performance to where it once was.