Use Dictation to Compose Outlook Messages

Outlook Voice Dictation Supported by Monarch and OWA

Announced in message center notification MC679312 (4 October, 2023, Microsoft 365 roadmap item 171199), the ability to dictate the body text for Outlook messages is now rolling out to all tenants with the intention that Microsoft will complete the deployment in early December 2023.

The title of MC679312 is “Dictation Support Coming to the new Outlook,” which implies that this feature is only for the Monarch client, but message text dictation works for OWA too.

Setting up for Outlook Voice Dictation

The basic idea is that you can turn on a PC microphone when composing a new email and speak instead of writing the message body. Outlook connects to the Microsoft Azure speech-to-text service (hence the need for a “reliable internet connection”) to translate words captured by the microphone into text. Transcribing audio to text is well-known within Microsoft 365. It’s the basis for meeting transcription in Teams and video transcripts in Stream.

To begin, make sure that the PC microphone is enabled before creating a new message. When positioned in the message body (voice dictation doesn’t work for the message subject or to select recipients), select the Dictate (blue microphone icon) option and the language you plan to speak in. As Figure 1 shows, Outlook supports a limited set of languages for now with another set in preview. Microsoft Azure speech-to-text can handle “more than 100 languages and variants,” so it’s likely that the set of available languages will expand over time to deal with all languages supported by Outlook.

Figure 1: Outlook voice dictation options

I was impressed to find Gaeilge (Irish Gaelic) among the preview languages (that list is much longer than Figure 1 shows).

Switching languages is easy and it’s possible to compose a message in multiple languages, assuming that you have sufficient fluency in the target languages to create passable text. My efforts in Irish were OK but my French accent proved an obstacle that dictation (or the back-end voice processing service) had difficulty with. In any case, it was fun testing out languages.

Composing Messages with Outlook Voice Dictation

After settling on your preferred language, dictation can start. I found that a slight delay occurred between selecting the Dictate option and a beep indicating that the microphone was ready to accept input. Perhaps this is due to the need to connect to the Azure transcription service.

Once connected, composing message text is a matter of speaking normally. Microsoft says that voice dictation is “a quick and easy way to draft emails, send replies, and capture the tone you’re going for.” I’m not sure that dictation is any faster than typing, especially with the help of intelligent editors, but that applies to people with good typing skills. Those who struggle to compose message text might well find it easier to speak and edit the output before sending the message.

Figure 2 shows a message that I composed with voice dictation. You can see that dictation captured duplicate words in two places (easily fixed). The output text is very usable if you don’t mumble or say “Uh” too often.

Figure 2: Outlook voice dictation generates text from speech

Creating Better Text Output

Microsoft says that Azure transcription has “automatic formatting and punctuation.” Perhaps Outlook doesn’t use this functionality because the text I generated seemed like a real stream of consciousness devoid of punctuation. To have any punctuation, you need to remember to use commands like:

  • Full stop.
  • Comma.
  • New line.
  • New paragraph.
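
The spoken commands listed above amount to a substitution pass over the transcript. The following Python sketch illustrates the idea only. It is not how Outlook or Azure process speech, and the COMMANDS table and the apply_spoken_commands function are hypothetical names invented for the illustration.

```python
import re

# Hypothetical mapping of spoken commands to the text they insert.
# The four commands come from the list above; the mapping is an assumption.
COMMANDS = {
    "full stop": ".",
    "comma": ",",
    "new line": "\n",
    "new paragraph": "\n\n",
}

# Match a command as whole words, along with the whitespace around it.
_PATTERN = re.compile(
    r"\s*\b(" + "|".join(re.escape(c) for c in COMMANDS) + r")\b[ \t]*",
    re.IGNORECASE,
)


def apply_spoken_commands(transcript: str) -> str:
    """Replace spoken punctuation commands with the characters they name."""

    def substitute(match: re.Match) -> str:
        symbol = COMMANDS[match.group(1).lower()]
        # Punctuation marks are followed by a space; line breaks are not.
        return symbol if symbol.startswith("\n") else symbol + " "

    text = _PATTERN.sub(substitute, transcript)
    # Drop spaces left dangling before a line break, then trim the ends.
    return re.sub(r"[ \t]+\n", "\n", text).strip()


print(apply_spoken_commands("hello comma world full stop"))
```

The sketch does not capitalize the first word of a sentence and cannot tell a command from the same words used literally (dictating “the comma key,” for example), which hints at why a real service needs more than simple pattern matching.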

I haven’t yet worked out how to insert a quotation or how to bold or underline text. On the other hand, I discovered that the profanity filter works when I swore at my inability to master dictation.

Outlook voice dictation doesn’t seem to use the Azure speech-to-text disfluency removal feature. This cleans up “stutter, duplicate words, and … filler words like uhm or uh” to produce text that reads better.
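
To give a feel for what disfluency removal does, here is a deliberately naive Python sketch that drops filler words and immediately repeated words. It is an illustration under my own assumptions, not the Azure feature: the FILLERS set and the remove_disfluencies function are hypothetical, and a real service works on the audio and language model rather than on word lists.

```python
# Assumed filler words, based on the examples quoted above.
FILLERS = {"uh", "uhm", "um"}

# Punctuation ignored when comparing words.
_PUNCTUATION = ".,!?"


def remove_disfluencies(text: str) -> str:
    """Drop filler words and consecutive duplicate words from a transcript."""
    cleaned: list[str] = []
    for word in text.split():
        key = word.strip(_PUNCTUATION).lower()
        if key in FILLERS:
            continue  # skip "uh", "uhm", and similar
        if cleaned and cleaned[-1].strip(_PUNCTUATION).lower() == key:
            continue  # skip a word that repeats the previous one
        cleaned.append(word)
    return " ".join(cleaned)


print(remove_disfluencies("I uh think think that the the plan works"))
```

A word-list approach like this also removes legitimate repeats (“had had”), so treat it as a way to picture the cleanup rather than something to run over real messages.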

Dictation only works when the compose message window is active. If you move focus to another application, like switching to a document to check a fact, the connection to Azure drops and dictation stops. The connection also drops if you pause and don’t speak for more than ten seconds (approximately). I can understand why voice dictation works like this. It would be wasteful to persist a connection while waiting for the user to return and produce some more pearls of wisdom. However, it’s something to remember as no one likes to speak into a message without generating text.
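
The behavior I observed can be modeled as a session that closes on loss of focus or after a period of silence. The Python sketch below is my guess at the logic, not Microsoft’s implementation: the DictationSession class, its method names, and the exact ten-second limit are all assumptions for illustration.

```python
from dataclasses import dataclass

# Approximate silence limit observed in testing; the real value is unknown.
IDLE_LIMIT_SECONDS = 10.0


@dataclass
class DictationSession:
    """Hypothetical model of a dictation connection and when it drops."""

    last_speech_at: float
    connected: bool = True

    def on_speech(self, now: float) -> None:
        # Speech keeps the connection alive by resetting the idle timer.
        if self.connected:
            self.last_speech_at = now

    def on_focus_lost(self) -> None:
        # Moving to another application ends dictation immediately.
        self.connected = False

    def tick(self, now: float) -> None:
        # Called periodically; drops the connection after prolonged silence.
        if self.connected and now - self.last_speech_at > IDLE_LIMIT_SECONDS:
            self.connected = False


session = DictationSession(last_speech_at=0.0)
session.on_speech(6.0)
session.tick(15.0)  # nine seconds of silence: still connected
session.tick(17.0)  # eleven seconds of silence: connection dropped
print(session.connected)
```

Passing the current time in as a parameter keeps the sketch easy to reason about without waiting for real seconds to pass.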

Fixing Dictated Text is a Copilot Thing

Being able to rewrite and improve text is one of the benefits advanced for generative AI. I asked Bing Chat Enterprise (BCE, soon to be plain “Copilot”) to add the missing punctuation to text generated from speech and then make the text more concise (you could equally use ChatGPT or Bing Chat to do the job). The output was very good and it’s easier to do this than to rewrite the raw text. Interacting with BCE required me to copy text to BCE, run the prompt, and paste the amended text (Figure 3) back into the Outlook message.

Figure 3: Using Copilot to refine text generated by Outlook Voice Dictation

Using an external generative AI is slightly clunky, but it works and is a lot cheaper than paying $30/month for the fully-integrated Microsoft 365 Copilot. Admittedly, Microsoft 365 Copilot offers many more features and functions and no one would ever buy it simply to improve text. Or would they?


