How to convert an audio file to text

There is increasing interest in finding reliable ways to convert an audio file to text, a process commonly called transcription.

Users range from individuals who need personal recordings or dictations transcribed to large companies and government departments that require commercial applications.


In this brief guide, “How to convert an audio file to text”, we first list the key benefits of converting an audio file to text (especially for those of you wondering why). We then review some of the most popular audio-to-text conversion software for the do-it-yourself user, along with recommended transcription services for those of you looking to outsource the process.

Why Convert Audio Files To Text?

What are some common reasons why people convert audio files to text?

Converting an audio file to text is important in many situations. Users who have their audio files converted include researchers, presenters, doctors, and lawyers. Most of them need a transcript to keep a record of statements made and decisions committed to, or to review key findings.

As a transcription company, we receive many requests to transcribe audio to text. Four reasons come up most often: clients need to share information from the audio with others; they are required to keep text records of what was said; they want participants in a recording to focus on the conversation as it happens and reflect on it later; and they want a quick way to track down information in an audio recording.

We briefly take a look at each of these benefits.



First of all, it’s often a great way to share information. People enjoy listening to audio: think of podcasts, audiobooks, voice notes, or even videos and webinars. But accessing such files can be difficult at times, because people do not always have the right technology at hand or are not in a position to listen or watch. Converting an audio file to text allows anyone to read what was said in the recording, at any time.



Suppose you’re holding a meeting or a conference, but some people aren’t able to attend. They will miss out on the entire discussion unless they manage to get hold of the notes. Here, converting the audio recording of the discussion to text ensures that anyone who was not present does not miss any information.



Think about how often you spend time scribbling notes, trying to take down as much information as you can while you listen, and still miss some key points. If you instead record the meeting and have the audio file transcribed to text, you can furnish everyone with a record of the information shared and the decisions made. At the same time, everyone can pay full attention to the conversation without losing their train of thought or missing key points.



When writing a dissertation or doing research, for example, it can be quite difficult to track information down in an audio file, especially if you have to listen to hours and hours of recordings to find one pertinent comment. If you have lots of audio recordings, having them transcribed lets you search the text for the keywords you need. You can then also copy and paste selections from the transcript right into another document.
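To give a feel for how simple keyword searching becomes once you have a transcript, here is a minimal Python sketch. It is only an illustration: the function name `find_keyword` and the sample transcript are invented for this example, and any word processor's search box does the same job.

```python
import re


def find_keyword(transcript: str, keyword: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that contain the keyword.

    Matching is case-insensitive and on whole words only.
    """
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = []
    for number, line in enumerate(transcript.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical transcript excerpt, invented for illustration.
sample = """Interviewer: How did the budget change last year?
Participant: The budget was cut twice.
Interviewer: And staffing?
Participant: Staffing stayed the same."""

for number, line in find_keyword(sample, "budget"):
    print(f"line {number}: {line}")
```

Searching for "budget" here lists the first two lines of the excerpt, which is far quicker than replaying the recording to find the same comments.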


So, what are the options available to convert an audio file to text?

There are three options you can consider. We list each one with our thoughts and a quick “pro and con” to help you evaluate which method might suit you.



OPTION #1:


This is your do-it-yourself option.

In this case, you need your own computer, mobile device, or dedicated listening equipment, and possibly additional equipment such as foot pedals, to hear the audio and type out the transcript. The software must be able to play back and repeat sections of audio, because you will have to listen again (sometimes several times) to speech that is not clear due to background noise, people talking over one another, faintness, and so on. There are some good choices out there, depending on what you need and are willing to spend.


Our transcription software recommendations include:


• PRO: Inexpensive. Minimal-cost or even free options exist (great if you have no money).
• CON: You need lots of time!
OPTION #2:


Another do-it-yourself, but assisted, solution.

Ahhhh, the elixir of new technology, artificial intelligence (or AI to you techies), and all things virtual. So, does machine transcription really work?

Well, yes and no.

A growing number of machine transcription solutions offer “audio file to text converters” in a range of languages. A few of these technologies now prompt you to choose a dialect (such as US English, British English, or Australian English). In addition, some providers offer software that will “machine transcribe” your audio while you edit alongside, essentially letting the machine convert the audio file to text while you correct the text.

A good example of machine audio-to-text transcription is Google audio to text, which typically converts MP3 (as well as other formats) to text. Google Speech lets you transcribe audio to text for good-quality recordings, but it does cost you once you require a more specific output. Another option is Watson, IBM’s cloud speech service, which offers a starter “lite plan”.


Software developers have been working for many years on machine transcription software that will convert audio files to text. Thousands of accents and dialects are spoken worldwide, and there are countless differences in the way people speak. Just consider the speed of speech, enunciation, pronunciation, slurring, and swallowing of vowels from one individual to the next! It’s not surprising, therefore, that the accuracy of transcription software for spoken language is, in general, still lower than that of a trained and experienced human transcriber in many scenarios (as of 2020).

Accurate transcription also demands an excellent command of language, sharp hearing, and knowledge of the subject matter and its terminology, because the context of a conversation matters. Measured against those demands, machine transcription can still be largely disappointing as a stand-alone solution for general users who need an accurate transcript.

The accuracy of human transcribers is estimated at around 94% to 99%, depending on large variances in the quality of the recorded audio (see our recording guidelines for quality sound). The human ear is designed to pick up a lot of detail, especially in group conversations, noisy backgrounds, and many other challenging acoustic situations. Machines, however, are still mostly trained for a single speaker (dictation). Each year our company tests machine solutions using general audio within the languages and dialects offered by the available machine audio-to-text converters, and their accuracy remains significantly lower. The results showed accuracy below that of the human ear: from around 70% down to nothing, depending on the quality of the audio.
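If you want to check a machine transcript yourself, one common way to put a number on accuracy is to count word-level errors against a transcript you have corrected by hand. The Python sketch below does that with a standard edit-distance calculation. Treat it as an illustration under stated assumptions: the function names are invented for this example, punctuation is not stripped, and the figures quoted in this article were not necessarily measured this way.

```python
def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Count the fewest word substitutions, insertions and deletions that
    turn the hypothesis into the reference (word-level edit distance)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the hypothesis
                current[j - 1] + 1,      # extra word in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference_text: str, hypothesis_text: str) -> float:
    """Return 1 minus the word error rate, floored at zero."""
    reference = reference_text.lower().split()
    hypothesis = hypothesis_text.lower().split()
    if not reference:
        raise ValueError("reference transcript is empty")
    errors = word_errors(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical ten-word sentence, invented for illustration.
human_checked = "we agreed to send the final report by friday morning"
machine_output = "we agreed to send the final report by friday mourning"

print(f"accuracy: {word_accuracy(human_checked, machine_output):.0%}")
```

In this example one word out of ten is wrong, so the accuracy works out to 90%. Run the same check on a few minutes of your own audio before committing to a machine solution for a large batch.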


• PRO: Bulk transcription, done quickly and at relatively low cost.
• CON: You need lots of time to go over mistakes and correct them!

We suggest that if you are going to try machine transcription, you take a look at Google audio to text as a good starting place.

OPTION #3:


A do-it-for-you option.

Human-driven transcription services and the related audio-to-text ‘industry’ arose a few decades ago, as many online transcription companies entered the market to offer this solution. For many, this option remains a viable choice, especially where accuracy and speed are important considerations for the customer. The key value of this option is that you get a highly accurate text version of what was said within the time you require.


Users of human transcription services come from many sectors, including the academic, medical, legal, and financial sectors, research, and even media. There are many more subject areas, but essentially these services must be able to offer subject expertise. Missing a key term can sometimes have devastating consequences!


Ironically, choosing the best service remains difficult. If you google (or bing!) the internet these days, you will see many transcription companies promising 99% accuracy in the transcript! There are even one or two services that promise 200%(!) accuracy (how is that even mathematically possible?). Anyhow, the bottom line is that selecting the best transcription service is still a bit of a gamble.


Entrusting your important audio recording to the right service is vital. Cheap pricing and over-promising frequently end in disappointment: yours. Usually, this leads to even more costs, or eventually to having no choice but to convert the audio file to text yourself. We know, because we have seen clients come to us from other transcription services for this very reason.


• PRO: You save time
• CON: You spend money

With the above options in mind, we hope you can make a more informed choice.