Speech To Text Converted

Speech recognition technology has existed for much longer than most people realize. The earliest systems date back to the 1950s, although they could only recognize digits.

The technology to convert speech to text has become part of our lives over the last decade or so. Tools like Siri, Hey Google, and Alexa mean most people use speech recognition almost every day. While those tools offer convenience, they’re not very good for converting longer speech to text. There are other options for that type of conversion but how well do they work?

Let’s look at your options to convert speech to text and which alternative is your best choice.

Apps to Convert Speech To Text

Audio-to-text apps that will convert a recording to text have existed for a while and are even built into the latest operating systems. Personal computers have been powerful enough to make these types of apps viable for the last couple of decades.

These apps can work in a couple of ways. One, you can dictate into the app itself and have it transcribe the text as you speak. Alternatively, you can load a recording made elsewhere into the app and let it convert it into text. As computers have become faster and more powerful, these apps have become more accurate. They’re even available on mobile devices like iPhones and iPads, making them more versatile than ever.

There are still plenty of drawbacks to this method of turning speech into text though.

Limited Support for Specialized Content

One of the biggest problems with these types of apps is their inability to “understand” specialized language or industry-specific terms and contexts. If you work in the legal, financial, or medical field, for example, it’s quite likely you use a lot of terms that wouldn’t be used in day-to-day conversation.

Audio content like this isn’t well supported by conversion apps, which means you’ll end up spending quite a bit of time proofreading and correcting the output once the text is generated. It kind of defeats the purpose of automated transcription if you have to spend a lot of time making corrections once it’s finished.

Difficulty with Accents

These types of apps can have similar problems with accents. Even if your audio is in a language the application supports, a speaker with a heavy accent can cause problems for the app. As with specialized language, you’ll often end up with a transcription that needs a lot of manual correction, once again defeating the purpose of an automated conversion.

Online Services

Another computer-based option for turning audio into text is an online service. These are similar to the apps we just discussed but instead of running on your PC or mobile device, they’re completely web-based.

You upload your audio file to the website, they do the conversion to text on their end, and you download the text file once it’s finished. These services often use more powerful computers than you would be using to convert the audio into text so they can be faster and more accurate. Ultimately, they still have the same problems as local apps – limited flexibility for specialized content or unusual accents.

Keeping it Confidential

Online services introduce another potential problem that can be significant: confidentiality. Because you have to upload your audio file to the service, it’s saved on someone else’s server.

If that service ever gets hacked or someone working there is less-than-honest, your audio could get exposed to the world. That may not matter in some cases but in others, it could be a serious problem. And in some industries, it could lead to legal problems and significant fines. If you work in a field that has strict privacy regulations, such as healthcare, many online speech to text services aren’t an option.

Software-Based Audio to Text Isn’t All Bad

While there are disadvantages to computer-based transcription, it’s not all bad. It’s a relatively inexpensive way to convert your audio and with the latest machine learning technology, it can “learn” from its mistakes and get better over time. Software- or web-based solutions are best for simple audio content, taking notes, or anything with a low level of complexity. But even then, you’ll still find yourself having to proofread everything and make a reasonably large number of corrections.

It’s also useful for situations where accuracy isn’t as important as getting the content converted to text such as monitoring behaviors at call centers. If the text is only being reviewed internally, 100% accuracy may be less important than speed and cost.

Human Transcription

The most accurate and flexible method to convert speech to text is still human transcription. With this option, someone will transcribe the text by listening to the audio and typing it out themselves.

You could do this yourself, of course, but it can take a lot of time. It depends on how quickly you can type, for one thing. This might be an option for shorter, one-off audio clips but it’s not an effective option if you need transcription done on an ongoing basis or you have a lot of audio to convert. The better choice is to use an established transcription service to do the conversion for you. A good transcription service will work with high-caliber transcribers who can handle specialized language, various accents, and even multiple languages, and will state its accuracy standards (like Way With Words offering 99%+ – see our online instant quote calculator at the bottom).

Considerations When Choosing a Transcription Service

Not all transcription services are created equal. Some of them work with offshore transcribers who may not have a strong grasp of English or whatever language you’re working with. You could also run into confidentiality and data security problems if they don’t have a strong confidentiality policy in place – and enforce it if they do. We suggest you also read our post on how to compare transcription services.

Quality of the Transcribers

Look for a service that invests in its transcribers: one that pays them well (and on time), trains them properly, and employs a diverse group with various subject specialties. If the service doesn’t treat its transcribers well, they won’t stick around for long. That will have a trickle-down effect on the quality of the work being done.

Security and Confidentiality

Make sure the transcription service is clear about how they keep your information private and confidential. Way With Words clearly outlines these steps in its privacy policy, for example. In addition to keeping your information confidential, it needs to be secure. In some industries, that includes how and where the information is stored. If you work in an industry that regulates where your data can be stored (usually within your own borders), you’ll need to ensure that any transcribers the service uses are located in the same country as you. This also makes it more likely that you’ll be working with a native speaker.

Subject Specialization

If you’re in a highly technical or specialized industry, you’ll benefit from working with a service that has transcribers who are familiar with that industry. They’ll be able to recognize the industry-specific language and jargon that might be used in your audio, which will make the transcription process faster and more accurate.

Customized Transcriptions

Human transcription services can also offer customized options that computer-based solutions aren’t able to provide. This includes things like time coding, full verbatim (everything uttered, right down to the umms and ahhs), or almost any other special requests you might have.

How Much Does Human Transcription Cost?

As we already mentioned, human transcription will likely cost more than computer-based options, at least up front. If you factor in the time and effort needed to proofread and correct many computer-based conversions, the difference won’t be as much as you might think. To get an idea of what it will cost to convert your audio to text, you can try our online audio transcription calculator or video transcription calculator by visiting our home page.
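The point about correction time can be put into numbers. The short Python sketch below adds the value of your proofreading time to the up-front fee. Every figure in it (the per-minute rates, the minutes of correction per minute of audio, and the hourly value of your time) is a made-up placeholder for illustration, not a price from any real service, so substitute your own numbers.

```python
def total_cost(audio_minutes, rate_per_minute,
               correction_minutes_per_audio_minute, hourly_value_of_time):
    """Up-front fee plus the value of the time spent proofreading and correcting."""
    upfront = audio_minutes * rate_per_minute
    correction_hours = audio_minutes * correction_minutes_per_audio_minute / 60
    return upfront + correction_hours * hourly_value_of_time


# Hypothetical inputs for a 60-minute recording (illustrative only).
automated = total_cost(60, rate_per_minute=0.25,
                       correction_minutes_per_audio_minute=2.0,
                       hourly_value_of_time=30.0)
human = total_cost(60, rate_per_minute=1.50,
                   correction_minutes_per_audio_minute=0.25,
                   hourly_value_of_time=30.0)

print(f"Automated, including corrections: ${automated:.2f}")
print(f"Human, including corrections:     ${human:.2f}")
```

With these assumed inputs, the up-front gap is $75.00 ($90.00 versus $15.00), but it shrinks to $22.50 once correction time is counted ($97.50 versus $75.00). Different assumptions will move the result in either direction.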

Human transcription costs are calculated on a per-minute basis, not per-word, so it will depend on how long your audio is. See our instant cost calculator below.

It’s also based on the required turnaround. If you need the text back the next day, it’s going to cost more than if you can wait a week or two. Plug the numbers into our calculator to see how the different variables affect the cost.
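As a rough sketch of how per-minute pricing with turnaround and add-ons might combine, consider the Python example below. The rate table, the add-on surcharges, and the `quote` function are all hypothetical and do not reflect Way With Words’ actual prices or calculator logic; use the calculator on the page for real figures.

```python
# All rates below are hypothetical placeholders, not actual prices.
BASE_RATE_BY_TURNAROUND = {
    "1 day": 2.00,   # fastest turnaround, highest per-minute rate
    "3 days": 1.50,
    "7 days": 1.00,  # slowest turnaround, lowest per-minute rate
}
ADD_ON_RATES = {
    "time_coding": 0.25,    # assumed per-minute surcharge
    "full_verbatim": 0.25,  # assumed per-minute surcharge
}


def quote(audio_minutes, turnaround, add_ons=()):
    """Return the total for a job billed per minute of audio."""
    rate = BASE_RATE_BY_TURNAROUND[turnaround]
    rate += sum(ADD_ON_RATES[name] for name in add_ons)
    return round(audio_minutes * rate, 2)


rush = quote(45, "1 day", add_ons=["time_coding"])
relaxed = quote(45, "7 days")
print(f"45 minutes, next day, time coded: ${rush:.2f}")
print(f"45 minutes, one week, no add-ons: ${relaxed:.2f}")
```

Under these assumed rates, the same 45-minute recording costs $101.25 as a time-coded rush job and $45.00 with a one-week turnaround and no add-ons, which shows how a longer turnaround and fewer add-ons lower the price.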

How to Choose the Right Solution

As we’ve already pointed out, there’s no “best” solution to convert speech to text. It depends on your needs:

  • What kind of audio you need converted
  • How much time you want to spend making corrections
  • Your budget
  • The complexity of your audio

For simple audio, a computer-based solution might work fine. You’ll have the text quickly and if you don’t mind spending the time to proofread and correct it, you’ll end up with a usable document.

But if you’re dealing with more complex audio or want an accurate text conversion that needs minimal correction, human transcription is the way to go.

For more insights also read our article on How to compare transcription service costs.


Get Your Instant Quote Now

Way With Words’ standard transcription service is calculated on a per audio or video minute rate. Pricing depends on the turnaround time chosen and the add-on options selected. The longer the turnaround and the fewer the add-ons selected, the lower the price. Use our calculator below to get started ↴

Number of minutes to transcribe
Turnaround time
Time Coding  
Full Verbatim  
Rate Per Minute   $0.00
Deposit   $0.00
TOTAL: $0.00

Do you have something special in mind?

Send us your specific requirements by using our CONTACT FORM and we will get back to you with your custom quote.

Just want to chat? Please feel free to use our live chat app by clicking the tab at the bottom of the page.

Additional Services

Our captioning rates are also calculated per video minute, while our custom audio to text solution pricing is negotiable – depending on your required volume and conditions of service.