Are There Reliable Speech-to-Text Options For Writers?


For all the power of modern personal computers, speech-to-text has been a difficult service to perfect. Has its time finally come?

Sometimes technology evolves in bursts; sometimes the progress cycles are much slower and more deliberate. Reliable speech-to-text has been one of the most anticipated advances in computing, but also one of the slowest to get ready for prime time.

The first personal computers didn’t actually do much computing. The Kenbak-1, recognized by the Boston Computer Museum as the first commercial personal computer, was the brainchild of John Blankenbaker.

Born in a garage in 1971, like so many other life-changing innovations, the Kenbak-1 was sold as a learning tool for $750. It let you use three programming registers, five addressing modes, and 256 bytes of memory to experiment with rudimentary programming concepts.

Most people couldn’t imagine what value there could ever be in having a personal computer.

We’ve come so far… but

In the 50-plus years since, computers have learned to do pretty much anything you can think of, but one of the more elusive goals has been reliable speech-to-text.

Language is a prickly beast, subject to strange idioms, endless personal names and product brands, quirky cross references to phrases from other languages, variations in local dialects, jargon common to specific industries, and grammar rules that can challenge the most skilled logician.

For years, I’ve waited for a product that didn’t leave me gnashing my teeth and spending more time correcting the generated text than it would have taken me to write it longhand with a number-two pencil.

Fast forward to the present… Somewhere between a robot stenographer and a machine-driven writing accelerator, speech-to-text has reached a new plateau, capable of recording your spoken words three or four times faster than you could type them — more or less reliably.

Word

In Word and other Microsoft 365 apps that have text fields, including Outlook, you can take advantage of built-in dictation capabilities. You’re not limited to words alone; you can use voice commands to handle punctuation (say “period” and one magically appears at the end of a sentence), editing (backspace, insert space, delete words and phrases), and navigation (select a sentence or go to end of paragraph).

Formatting, table generation, symbol insertion (such as asterisk, backslash, and em-dash), and commands for the dictation process itself are also provided.

Many writers grumblingly began using Word in its early days and became more accepting of it as companies adopted it as the standard for word processing. Now we can take a largely hands-off approach to producing copy by using our voice. And guess what? It actually works pretty well.

It takes a while to get familiar with the commands, to learn which features work well and which still need work, and to settle into a frame of mind where you compose sentences in your head and speak them aloud, rather than typing them out.

It might require different parts of the brain to do this, but it can awaken parts of that inherent oral, traditional storytelling ability that we all have to some degree.

Microsoft provides separate directions for using dictation in Word for Mac and in Word on Windows 11.

Google

Google calls its speech-to-text service Voice Typing, and it’s available for free whenever you use Google Docs within a Chrome browser. You can also use it for Google Slides speaker notes.

Access voice typing from the Tools menu. As is typical for apps of this type, specialized commands such as “Select paragraph,” “Insert table,” and “New paragraph” enrich the editing experience and simplify your work. Google publishes a summary of the available commands.

Google has a long history of working with language, especially in terms of translation from one language to another. I suspect this work has helped advance its speech-to-text capabilities because the accuracy seems quite solid and the recognition very good.

In my limited experience, Google voice typing appears particularly adept at recognizing speech through a simple external microphone (in this case, a Blue Snowflake mic about 3 feet away). However, it has an annoying tendency to capitalize words mid-sentence, apparently mistaking them for the titles of songs or stories, or for no reason at all. You can fix this by using the Select command to highlight the offending word and then saying “lowercase.” But even if this happens only two or three times in a paragraph, it’s annoying and inexplicable.

Voice typing is worth a try and, of course, you can download the documents you create in Google Docs as Word files, making your written works or professional projects accessible to a broader audience.

Apple

Within a number of Apple apps (such as Pages, Mail, and Keynote), you’ll find Start Dictation near the bottom of the Edit menu. This command also appears in certain third-party text-based apps, such as Microsoft Word. As with other speech-to-text apps, Dictation on the Mac offers a fairly extensive list of options and commands.

One word of caution: if you are running macOS Monterey, you’ll be stopped cold trying to use commands in Dictation unless you take one step first. Go to Apple’s System Preferences and enable Voice Control under the Accessibility options. Otherwise, routine commands such as New Line, New Paragraph, and New Page won’t be performed. I’m not sure whether this is also an issue in macOS Ventura, which has a separate bug affecting users of third-party malware scanners and other security tools; Apple is working on a fix for that one.

Accuracy for Apple Dictation seems on par with Google’s and Microsoft’s speech-to-text efforts. If you spend most of your writing time on an Apple machine, this may be your best option, particularly as you become familiar with voice control features and can use them on other Apple devices.

Integrating voice recognition and Dictation through Siri is handy for iPhones, iPads, and iPods and seems quite effective for short lists, quick entries, short mail communications, and general Siri questions. The more you use these commands and they become second nature, the more efficient you can be on the full range of devices.

Other apps and services

Dragon has been a leader in the business of voice recognition since the very beginning and boasts enviable recognition accuracy (99 percent by their numbers, but this, of course, can vary for different use cases and users). They put a high value on training the software (with reading exercises) and learning as you correct certain words that are consistently misidentified. Within certain profiles, this can make a great difference in results.

The software can be pricey (around $1,500 for medical versions). Specialized versions also exist for legal, professional, and other applications. You’ll also find discontinued versions floating around the Internet.

On the enterprise side, the options expand as you get into paying for voice recognition services. IBM has Watson Speech to Text and offers a free tier to let you try it out. But their real goal is hooking good-sized businesses, providing support for call center operations, and launching AI-powered operations within cloud environments. Nonetheless, it is enlightening to see how one of the largest, most accomplished companies on the planet has dealt with the challenge of voice recognition.

Getting the best results

Results vary with individual circumstances, but for the most accurate recognition, connect a quality external microphone to your desktop or laptop. This matters less than it used to; you can get fair accuracy from a built-in mic, but the best results come from a quiet environment and a headset mic, ideally with noise cancellation. Cutting down on background noise is an effective way to boost recognition accuracy.

To a large degree, advances in computer processing, storage, and networking have enabled much of the progress in speech-to-text. But algorithms that can handle language quirks, vocal variations, idioms, and jargon have also helped give us reasonably respectable speech-to-text applications, many of them free and built into easily accessible tools.

Whether you adopt one of the free speech-to-text options built into applications or opt for a paid application, expect to spend some time learning the commands and getting used to the quirks. Reaching a level of usability that saves you time and minimizes corrections may take some work. The best I can say is: try it and see if it suits you. You might be pleasantly surprised.



4 COMMENTS

    • Speak for yourself. I remember WordPerfect randomly changing your document’s font (and “forgetting” there was any font other than Courier New when you tried to change it back), deleting your work, and crashing during autosave (losing your whole document in the process).

      Good thing OpenOffice came along.

  1. My favorite app is Otter AI (https://otter.ai/). I recommend it to all the authors in my writing groups and to all the new journalists I mentor. It works without needing any kind of training, the basic plan is free, and it integrates well into Zoom and Teams calls.
    Plus, and this is my favorite part, I can use it on my cell phone, so I can walk (or drive) while dictating. Some of my best ideas come while I’m driving somewhere, and Otter works in the background even while I have my maps app open on the screen. And it records well over background engine noise.
    It’s a cloud-based service, so all your transcripts are in one place on their website. You can record on your phone, or any other device, then copy-and-paste from your transcript into your story notes. And if part of the transcript doesn’t make sense, you click on it, and it starts to play the audio from that spot.
    And if you’re a journalist, it slices up the transcript by speaker so you can quickly tell who’s saying what.
    I first started using the app a couple of years ago because it was the only free mobile app that didn’t force me to hit the record button again and again every time I stopped to think. I pause a lot while dictating!
