How to convert an audio file to text
There is growing interest in finding reliable ways to convert an audio file to text, a process commonly called transcription.
Users range from individuals who need personal recordings or dictations transcribed to large-scale companies and Government departments that require commercial applications.
In this brief guide, “How to convert an audio file to text”, we first list the key benefits of converting an audio file to text (especially for those of you wondering why). We then review some of the most popular audio to text conversion software for the do-it-yourself user, and recommend transcription services for those of you looking to outsource this process.
Why Convert Audio Files To Text?
What are some common reasons why people convert audio files to text?
Converting an audio file to text is important in many situations. Users who have their audio files converted include researchers, presenters, doctors, and lawyers. These users mostly need a transcript to keep a record of statements made and decisions committed to, or to review key findings.
As a transcription company, we receive many requests to transcribe audio to text. Four frequent reasons clients ask us to convert their audio file to text are: to share information from the audio with others, to keep a text record of what was said, to help participants in a recording focus on the conversation taking place and reflect on the discussion later, and to have a quick way to track or find information in an audio recording.
We briefly take a look at each of these benefits.
BENEFIT #1:
SHARE INFORMATION
First of all, it’s often a great way to share information. People enjoy listening to audio. Think of podcasts, audiobooks, voice notes or even listening to videos or webinars. But accessing such files can be difficult at times, as people do not always have the right technology at hand or are not in a position to listen or watch. Converting an audio file to text allows anyone to read what was said in the audio recording, at any time.
BENEFIT #2:
KEEP A RECORD
Suppose you’re holding a meeting or a conference, but some people aren’t able to attend. They will miss out on all the content of the discussion unless they manage to get hold of the notes. This is a great example of where converting the audio recording of the discussion to text ensures that anyone not present will not miss out on any information.
BENEFIT #3:
CONCENTRATE ON THE CONVERSATION
Think about how often you spend time scribbling notes, trying to take down as much information as you can while you listen, and still miss some key points. If you record the meeting discussion instead and have the audio file transcribed to text, you can furnish everyone with a record of the information shared and decisions made. At the same time, everyone can pay full attention to the conversation without losing their train of thought or missing any key points.
BENEFIT #4:
FIND RELEVANT INFORMATION
When writing a dissertation or doing research, for example, it can be quite difficult to track down information in an audio file, especially if you have to listen to hours and hours of recordings to find a pertinent comment. If you have lots of audio recordings, getting them transcribed lets you search the text for the keywords you need. You can then also copy and paste selections from the transcript right into another document.
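To illustrate how a transcript makes information easy to find, here is a minimal Python sketch that searches a transcript for a keyword and prints the surrounding words. The function name `find_keyword`, the `context` parameter, and the sample text are our own illustrative assumptions, not part of any particular transcription tool.

```python
import re


def find_keyword(transcript: str, keyword: str, context: int = 40) -> list[str]:
    """Return a snippet of surrounding text for each case-insensitive match."""
    snippets = []
    # re.escape treats the keyword as plain text, not as a pattern
    for match in re.finditer(re.escape(keyword), transcript, flags=re.IGNORECASE):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(transcript))
        snippets.append(transcript[start:end].strip())
    return snippets


# Hypothetical transcript text, used only for illustration
sample = (
    "Interviewer: How did the budget change last year? "
    "Participant: The budget was cut, so we delayed the survey."
)

for snippet in find_keyword(sample, "budget", context=20):
    print(snippet)
```

Doing the same with the original audio would mean listening to the whole recording again; with a transcript, the search takes a moment.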
HOW TO CONVERT AUDIO FILE TO TEXT?
So, what are the options available to convert an audio file to text?
There are three options you can consider. We list each option with our thoughts and a quick “pro and con” to help you evaluate which method might suit you.
OPTION #1:
TRANSCRIPTION SOFTWARE
This is your do-it-yourself option.
In this case, you need your own computer, mobile device, or listening equipment, and possibly additional equipment such as foot pedals, to hear the audio and type out the transcript. The software must be capable of playing back and repeating sections of audio, as you will have to listen again (sometimes multiple times) to speech that is not clear due to background noise, people talking over each other, faintness, and so on. There are some good choices out there, depending on what you need and are willing to spend.
RECOMMENDED TRANSCRIBING SOFTWARE
Our transcription software recommendations include:
- Express Scribe (all-around performer and very popular),
- Start-Stop Universal Transcription System (fairly good, some mixed reviews),
- InqScribe (another well-known brand), and
- Dragon NaturallySpeaking (the most popular solution, especially for dictation/ single speakers).
A QUICK PRO AND CON
- PRO: Inexpensive. Low-cost and even free options exist (great if you have no money).
- CON: You need lots of time!
OPTION #2:
MACHINE TRANSCRIPTION
Ahhhh, the elixir of new technology, artificial intelligence (or AI to you techies), and all things virtual. So does machine transcription really work?
Well, yes and no.
A growing number of machine transcription solutions offer “audio file to text converters” in a range of languages. A few of these technologies now prompt you to choose a dialect (such as US English, British English, or Australian English). In addition, some machine transcription providers offer software that will “machine transcribe” your audio while you edit alongside, essentially letting the machine convert the audio file to text while you correct the text.
A good example of machine audio to text transcription is Google audio to text, which typically converts MP3 to text (as well as other formats). Google Speech allows you to transcribe audio to text for good-quality recordings but does cost you once you require a more specific output. Another option to transcribe audio to text is Watson, IBM's speech cloud service, which offers a starting “lite plan”.
THE CHALLENGE OF VOICE
Software developers have been working for many years on machine transcription software that will convert audio files to text. Thousands of accents and dialects are spoken worldwide, and there are countless differences in the way people speak. Just consider the speed of speech, enunciation, pronunciation, slurring of speech, and swallowing of vowels per individual! It’s not surprising, therefore, that as of 2020 the accuracy of transcription software for spoken languages in general is still lower than that of a trained and experienced human transcriber in many scenarios.
A human transcriber also brings an excellent command of language, sharp hearing, and knowledge of the subject matter and its terminology, all of which help in following the context of a conversation. Machine transcription solutions lack much of this, so they still work at levels that can be largely disappointing as a single solution for general users who require an accurate transcript.
The accuracy of human transcribers is estimated to be around 94% to 99%, depending on large variances in the quality of the recorded audio (see our recording guidelines for quality sound). The human ear is designed to hear a lot of detail, especially in group conversations, noisy backgrounds, and many other challenging acoustic situations. Machines, however, are still mostly trained for a single speaker (dictation). Each year our company tests machine solutions using general audio in the languages and dialects offered by available machine audio to text converters, and their accuracy remains significantly lower. The results showed accuracy well below that of a human transcriber: from around 70% down to nothing, depending on the quality of the audio.
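If you want to check accuracy on your own recordings, one common approach is to compare a machine transcript against a carefully checked human transcript, count the word-level errors (substituted, missing, and extra words), and express the result as a percentage. The sketch below is an assumption on our part about how such a figure can be computed; it is not necessarily the method behind the percentages quoted above. The function name `word_accuracy` and the sample sentences are purely illustrative.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return word-level accuracy (1 minus word error rate), floored at 0.0.

    reference:  the trusted transcript (for example, checked by a human)
    hypothesis: the transcript being evaluated (for example, machine output)
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Edit distance over words: the minimum number of substitutions,
    # deletions and insertions needed to turn the hypothesis into the reference.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the hypothesis
                current[j - 1] + 1,      # extra word in the hypothesis
                previous[j - 1] + cost,  # matching or substituted word
            ))
        previous = current

    errors = previous[-1]
    return max(0.0, 1.0 - errors / len(ref))


# Hypothetical example: one wrong word out of six
human = "the budget was cut last year"
machine = "the budget was caught last year"
print(f"{word_accuracy(human, machine):.0%}")
```

Note that this simple version treats punctuation attached to a word as part of the word, so you may want to strip punctuation first for a fairer comparison.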
A QUICK PRO AND CON
- PRO: Bulk transcription quickly at relatively low cost.
- CON: You need lots of time to go over mistakes and correct them!
We suggest that if you are going to try machine transcription, take a look at Google audio to text speech as a good starting place.
OPTION #3:
HUMAN TRANSCRIPTION SERVICES
A do-it-for-you option.
Human-driven transcription services and the related audio to text ‘industry’ arose a few decades ago as many online transcription companies entered the market to offer this solution. For many, this option still remains a viable choice, especially where accuracy and speed are important considerations for the customer. The key value of this service option is to get a highly accurate text version of what was said in the time you require.
SUBJECT MATTER EXPERTISE SHOULD BE SHOWN
Users of human transcription services come from many sectors. These include the academic, medical, legal, and financial sectors, as well as research and even media. There are many more subject areas, but essentially these services must be capable of offering subject expertise. Missing some key terms can sometimes have devastating consequences!
ACCURACY IS CRITICAL
Ironically, choosing the best service remains difficult. If you google (or bing!) the internet these days you will see many transcription companies promising 99% accuracy in the transcript! There are even one or two services that promise 200%(!) accuracy (how is that even mathematically possible?). Anyhow, the bottom line is that selecting the best transcription service is still often a bit of a gamble.
HONEST PRICING MUST BE UPFRONT
Entrusting the right service with your important audio recording is vital. Cheap pricing and over-promised expectations frequently end in disappointment: yours. Usually, this leads to even more costs, or eventually to having no choice but to convert the audio file to text yourself. We know, as we have seen clients come to us from other transcription services for this very reason.
A QUICK PRO AND CON
- PRO: You save time
- CON: You spend money
With the above options in mind, we hope you make a more informed choice.