Don’t Feed Large Reference Documents to Copilot for Word

Copilot for Word Reference Documents Can Be Too Large to Process

I’m happily using Copilot for Word to generate, refine, and summarize text when I run into an issue that afflicts all AI technologies based on large language models (LLMs): the prompt assembled for the LLM to process can only hold a limited number of characters. I can’t say precisely what that limit is because I can’t find any documentation for it, but I can say that incorporating a large reference document into a prompt causes Copilot some difficulty.

Take the prompt shown in Figure 1. As a reference document, I added a 518 KB, 27-page Word document, which happens to be the first chapter of the Office 365 for IT Pros eBook. I asked Copilot to use the information in that chapter to help it generate a brief overview of the value Office 365 brings to customers.

Figure 1: Adding a reference document to a Copilot for Word prompt

Copilot worked away and began to generate text. After several seconds, the output was ready but came with the caveat that Copilot couldn’t process the reference document fully (Figure 2). The output generated by Copilot is “based only on the first part of those files.” In some cases, this might not make a difference, but the latter half of the reference document contained information that I thought Copilot should include.

Figure 2: Copilot for Word reports a reference document is too long

The question is why Copilot can’t use the full content of large reference documents. Here’s what I think is happening.

Grounding and Retrieval Augmented Generation

Copilot for Word uses reference documents to ground the prompt entered by the user with additional context. In other words, the content of the reference document helps Copilot understand what the user wants. Copilot uses a technique called Retrieval Augmented Generation (RAG). According to an interesting Microsoft article about grounding LLMs, “RAG is a process for retrieving information relevant to a task, providing it to the language model along with a prompt, and relying on the model to use this specific information when responding.”

Limits exist when grounding large language models. Copilot allows users to include a maximum of 2,000 characters in their prompts. To that prompt, Copilot adds content extracted from the reference documents and other information found in the semantic index to provide the context for the LLM to process. The semantic index holds information about documents available to the user that are stored in SharePoint Online or OneDrive for Business, or ingested via a Graph connector. The maximum size of the prompt must therefore cover whatever the user enters plus the information extracted from reference documents during grounding.
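To make the idea concrete, here’s a minimal Python sketch of grounding against a fixed character budget. Everything in it is an illustrative assumption rather than Copilot’s actual implementation: the real service works with tokens rather than raw characters and selects content via the semantic index rather than simple document order, and the MAX_CONTEXT_CHARS value and function name are invented for the example.

MAX_PROMPT_CHARS = 2000      # documented limit on what a user can type
MAX_CONTEXT_CHARS = 20000    # assumed overall budget for the grounded prompt (illustrative)

def build_grounded_prompt(user_prompt: str, reference_chunks: list[str]) -> str:
    """Assemble a prompt from the user's text plus as much reference
    material as fits into the remaining character budget."""
    prompt = user_prompt[:MAX_PROMPT_CHARS]
    budget = MAX_CONTEXT_CHARS - len(prompt)
    grounding = []
    for chunk in reference_chunks:       # chunks arrive in document order
        if len(chunk) > budget:
            break                        # later chunks are dropped: only "the first part" is used
        grounding.append(chunk)
        budget -= len(chunk)
    return "\n\n".join(grounding + [prompt])

A 27-page chapter split into chunks quickly exhausts any such budget, which is why Copilot warns that its output is based only on the first part of the file.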

I have very large Word documents of well over 1,000 pages, but it would be unreasonable to tell Copilot to use these files to ground prompts. There’s too much content covering too many varying topics for Copilot to make much sense of such beasts.

Good Copilot for Word Reference Documents

A good reference document is one whose content is closely related to the topic you ask Copilot to generate text about. Ideally, the document is well structured, being divided into clear sections that each cover a different point. A human should be able to scan the document quickly and tell you what it’s about. My tests indicate that Copilot for Word generates the best results when reference documents are structured, contain material pertinent to the prompt, and run to fewer than 10 pages. Your mileage might vary.

Although chapter 1 of the Office 365 for IT Pros eBook is packed full of useful and pertinent information, it’s just too much for Copilot to consider when attempting to respond to the user prompt. Copilot would be much happier if I provided it with a five-page overview of Office 365.

Other Copilots Have Limits Too

The difficulty with long reference documents is similar to the limit that applies when Copilot for Outlook attempts to summarize a long email thread. According to the support article covering the topic, “In the case of a very long thread, not all messages may be used, as there are limitations of how much can be passed into the LLMs.”

GitHub Copilot has limits too, as attested by the many questions developers ask about its use (here’s an example).

In other Copilots, the type of information being processed can reduce the chance that Copilot runs into these issues. For instance, when Copilot for Teams summarizes the discussion from a meeting, it uses the meeting transcript as its basis. Even a very long meeting is unlikely to trouble Copilot too much because, assuming the meeting has an agenda, the discussion flows from point to point and has a reasonable structure.

Preparing for Copilot

All of which brings me back to a central point about preparing for a Copilot for Microsoft 365 deployment. You can deploy all the software you want, including the tools available in Syntex (soon to be SharePoint Premium) to prepare content and Microsoft Purview to protect content. But at the end of the day, Copilot will be asked to process documents created by human beings. Whether those documents make good reference documents remains to be seen.

It’s a hard nut to crack. Humans never wrote documents to be processed by AI. They created documents to meet goals, explain projects, lay out solutions, and so on. Sometimes the documents are well-structured and easily navigated. Other times they’re a challenge for even their authors to interpret, especially as time goes by. Some documents remain accurate even after years and some are outdated in the weeks following publication. It will be interesting to see how Copilot copes with the flaws and imperfections of human output.


Insight like this doesn’t come easily. You’ve got to know the technology and understand how to look behind the scenes. Benefit from the knowledge and experience of the Office 365 for IT Pros team by subscribing to the best eBook covering Office 365 and the wider Microsoft 365 ecosystem.
