What’s the Best Way to Convert Speech to Text?
Speech recognition technology has existed for much longer than most people realize. The earliest systems date back to the 1950s, although they could only recognize spoken digits.
The technology to convert speech to text has become part of our lives over the last decade or so. Tools like Siri, Google Assistant, and Alexa mean most people use speech recognition almost every day.
While those tools offer convenience, they’re not very good at converting longer speech to text. There are other options for that type of conversion, but how well do they work?
Let’s look at your options to convert speech to text and which alternative is your best choice.
Apps to Convert Speech to Text
Audio-to-text apps that will convert a recording to text have existed for a while and are even built into the latest operating systems. Personal computers have been powerful enough to make these types of apps viable for the last couple of decades.
These apps can work in a couple of ways. One, you can dictate into the app itself and have it transcribe the text as you speak. Alternatively, you can load a recording made elsewhere into the app and let it convert it into text.
As computers have become faster and more powerful, these apps have become more accurate. They’re even available on mobile devices like iPhones and iPads, making them more versatile than ever.
There are still plenty of drawbacks to this method of turning speech into text though.
Limited Support for Specialized Content
One of the biggest problems these types of apps have is the inability to “understand” specialized language or industry-specific terms. If you work in the legal or medical field, for example, it’s quite likely you use a lot of terms that wouldn’t be used in day-to-day conversation.
Audio content like this isn’t well supported by conversion apps, which means you’ll end up spending quite a bit of time proofreading and correcting the output once the text is generated. It kind of defeats the purpose of automated transcription if you have to spend a lot of time making corrections once it’s finished.
Difficulty with Accents
These types of apps can have similar problems with accents. While your audio might be in a language the application supports, if the speaker has a heavy accent it can cause problems for the app.
Like the specialized language problem, you’ll often end up with a transcription that needs a lot of manual correction, once again defeating the purpose of an automated conversion.
Another computer-based option for turning audio into text is an online service. These services are similar to the apps we just discussed, but instead of running on your PC or mobile device, they’re completely web-based.
You upload your audio file to the website, they do the conversion to text on their end, and you download the text file once it’s finished.
These services often use more powerful computers than you would be using to convert the audio into text so they can be faster and more accurate. Ultimately, they still have the same problems as local apps – limited flexibility for specialized content or unusual accents.
Keeping It Confidential
Online services introduce another potential problem that can be significant: confidentiality. Because you have to upload your audio file to the service, it’s saved on someone else’s server.
If that service ever gets hacked or someone working there is less-than-honest, your audio could get exposed to the world. That may not matter in some cases but in others, it could be a serious problem.
And in some industries, it could lead to legal problems and significant fines. If you work in a field that has strict privacy regulations, such as healthcare, many online speech-to-text services aren’t an option.
Software-Based Audio to Text Isn’t All Bad
While there are disadvantages to computer-based transcription, it’s not all bad. It’s a relatively inexpensive way to convert your audio and with the latest machine learning technology, it can “learn” from its mistakes and get better over time.
Software- or web-based solutions are best for simple audio content, taking notes, or anything with a low level of complexity. But even then, you’ll still find yourself having to proofread everything and make a reasonably large number of corrections.
It’s also useful for situations where accuracy isn’t as important as simply getting the content converted to text, such as monitoring behaviors at call centers. If the text is only being reviewed internally, 100% accuracy may be less important than speed and cost.
The most accurate and flexible method to convert speech to text is still human transcription. With this option, someone will transcribe the text by listening to the audio and typing it out themselves.
You could do this yourself, of course, but it can take a lot of time, depending in part on how quickly you can type. This might be an option for shorter, one-off audio clips, but it’s not an effective option if you need transcription done on an ongoing basis or you have a lot of audio to convert.
The better choice is to use a transcription service to do the conversion for you. A good transcription service, like Way With Words, will work with high-caliber transcribers who can handle specialized language, various accents, and even multiple languages.
Considerations When Choosing a Transcription Service
Not all transcription services are created equal. Some of them work with offshore transcribers who may not have a strong grasp of English or whatever language you’re working with.
You could also run into confidentiality and data security problems if the service doesn’t have a strong confidentiality policy in place, or has one but doesn’t enforce it.
Quality of the Transcribers
Look for a service that invests a lot in its transcribers. That means they’re paid well (and on time), receive good training, and are a diverse group with various subject specialties.
If the service doesn’t treat its transcribers well, they won’t stick around for long. That will have a trickle-down effect on the quality of the work being done.
Security and Confidentiality
Make sure the transcription service is clear about how they keep your information private and confidential. Way With Words clearly outlines the steps we take in our confidentiality policy, for example.
In addition to keeping your information confidential, it needs to be secure. In some industries, that includes how and where the information is stored. If you work in an industry that regulates where your data can be stored (usually within your own borders) you’ll need to ensure that any transcribers the service uses are located in the same country as you.
This also makes it more likely that you’ll be working with a native speaker.
If you’re in a highly technical or specialized industry, you’ll benefit from working with a service that has transcribers who are familiar with that industry. They’ll be able to recognize the industry-specific language and jargon that might be used in your audio, which will make the transcription process faster and more accurate.
Human transcription services can also offer customized options that computer-based solutions aren’t able to provide. This includes things like time coding, full verbatim (everything uttered, right down to the umms and ahhs), or almost any other special requests you might have.
How Much Does Human Transcription Cost?
As we already mentioned, human transcription will likely cost more than computer-based options, at least up front.
If you factor in the time and effort needed to proofread and correct many computer-based conversions, the difference won’t be as much as you might think.
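To make that comparison concrete, here is a rough sketch in Python. Every number in it (the per-minute rates, the correction time per minute of audio, and the hourly value of your time) is a made-up placeholder rather than a real price from any service, so swap in your own figures.

```python
def total_cost(audio_minutes, rate_per_minute,
               correction_minutes_per_audio_minute, hourly_rate):
    """Up-front fee plus the value of the time spent proofreading and correcting."""
    upfront = audio_minutes * rate_per_minute
    correction_hours = audio_minutes * correction_minutes_per_audio_minute / 60
    return upfront + correction_hours * hourly_rate


# Hypothetical inputs for a 60-minute recording; none of these are real quotes.
automated = total_cost(60, rate_per_minute=0.25,
                       correction_minutes_per_audio_minute=2.0, hourly_rate=30.0)
human = total_cost(60, rate_per_minute=1.50,
                   correction_minutes_per_audio_minute=0.25, hourly_rate=30.0)

print(f"Automated, including corrections: {automated:.2f}")
print(f"Human, including a final read-through: {human:.2f}")
```

With these placeholder values, the up-front fees differ by 75.00, but the gap shrinks to 22.50 once correction time is counted. Your own numbers may tilt the result either way.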
To get an idea of what kind of cost you’ll be looking at to convert your audio to text, you can use our online cost calculator. Human transcription costs are calculated on a per-minute basis, not per-word, so it will depend on how long your audio is.
It’s also based on the required turnaround. If you need the text back the next day, it’s going to cost more than if you can wait a week or two. Plug the numbers into our calculator to see how the different variables affect the cost.
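As a rough illustration of how those two variables interact, the Python sketch below estimates a cost from audio length and turnaround. The rate table, the turnaround names, and the rule of rounding up to a whole minute are all assumptions made for this example. They are not Way With Words’ actual prices or billing rules.

```python
import math

# Hypothetical rate card: turnaround option -> price per minute of audio.
RATES_PER_MINUTE = {
    "next_day": 2.00,
    "three_days": 1.50,
    "one_week": 1.20,
    "two_weeks": 1.00,
}


def estimate_cost(audio_seconds, turnaround):
    """Estimate a transcription cost from audio length and turnaround."""
    if turnaround not in RATES_PER_MINUTE:
        raise ValueError(f"unknown turnaround: {turnaround!r}")
    # Assumption: partial minutes are billed as a full minute.
    billable_minutes = math.ceil(audio_seconds / 60)
    return round(billable_minutes * RATES_PER_MINUTE[turnaround], 2)


# A recording of 45 minutes and 10 seconds, priced at each turnaround.
for option in RATES_PER_MINUTE:
    print(option, estimate_cost(45 * 60 + 10, option))
```

The longer you can wait, the lower the per-minute rate in this sketch, which mirrors the general pattern described above.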
How to Choose the Right Solution
As we’ve already pointed out, there’s no “best” solution to convert speech to text. It depends on your needs:
- What kind of audio you need converted
- How much time you want to spend making corrections
- Your budget
- The complexity of your audio
For simple audio, a computer-based solution might work fine. You’ll have the text quickly and if you don’t mind spending the time to proofread and correct it, you’ll end up with a usable document.
But if you’re dealing with more complex audio or want an accurate text conversion that needs minimal cleanup afterward, a human transcription service is the way to go.