Audio Transcription: What It Is, What It Is Not, And Why It Is in High Demand

If you have the right skills, audio transcription can help you diversify your portfolio. However, there are a few things to keep in mind before you consider accepting an assignment.

The demand for transcription services remains high due to the increasing amount of audiovisual content being developed in a wide range of industries since the professional world became more remote. If you have the right skills, audio transcription can help you diversify your services as a translator. However, there are a few things to keep in mind before you consider advertising your availability.


As the name suggests, audio transcription involves transcribing the content of a recorded audiovisual file. I think it’s important to stress “audiovisual” instead of simply “audio” here, because you could be provided with either an audio or a video file and be asked to transcribe the contents of that recording.

It might seem that one definition could be used to cover all types of transcription, but that’s not always the case. For example, some clients may request transcription services when they don’t have an actual recording. If the scope of their project involves a live transcription, that’s something else entirely. So, let me start off by explaining what audio transcription is not before establishing what it actually involves.

It’s essential to be aware of some of the common misconceptions clients have about transcription services to be able to position yourself as an expert and steer them in the right direction. They will appreciate this, and it might just help you down the road when they’re looking for services you actually provide.

Mistaking Transcription for Court Reporting: When a transcription takes place in court during a live session, a certified court reporter (often referred to as a stenographer) is the one in charge of this task. A court reporter is trained to create a written verbatim record of the proceedings using a shorthand writing style called “stenography,” in which they use a stenotype machine to type out syllables rather than each letter of a word. This cuts down the time required to type and helps them enter what’s being said into the record in real time. But this is not transcription.

Mistaking Transcription for Live Captioning: The same is true when clients organize remote conferences and online events and need the audio “transcribed” in real time, or when major in-person events with live interviews and roundtable panel discussions have a “transcription” projected onto a large screen above interviewees and panelists. In both cases, clients would need a service called “live captioning.” And, if live captions are to be shown in a language other than the one being spoken live, then it’s called “Communication Access Real-Time Translation,” or CART for short.

Mistaking Transcription for Subtitling: Another misunderstanding happens when a potential client needs audiovisual services and requests a “transcription” when they’re actually expecting subtitles to be added to a recorded video file. Subtitling is not transcription. Subtitling requires special file formatting, and there are several technical aspects involved to ensure that character limitations are respected so the message fits on the screen. The target audience’s reading speed must also be considered so that they can have a pleasant viewing experience.

Mistaking Transcription for Audio-to-Text Translation: Finally, there are clients who expect us to listen to audiovisual material in the source language and type the equivalent message in a target language. That’s not transcription but an audio-to-text translation, because you’ll effectively be translating it “in your head” instead of providing a written record of what’s being said in the same language in which it was originally said.

So, What Is Audio Transcription?

Now that we’ve cleared up some misconceptions and learned what does not qualify as an audio transcription, let’s address what the process actually entails. Knowing what each type of transcription involves is important because you must ask the right questions before accepting an assignment and understand exactly what your clients expect from the document containing the transcribed audio.

Verbatim Transcription: If the audio needs to be transcribed verbatim, this means that both verbal and nonverbal components of what you hear in the original must be reproduced in the document. Aside from the essence of the message, every factor recorded in the audio or video—from shifts in breathing, emotion and tone, to the interruptions in speech and background noise—is included in the final written document. This will involve the use of what we call “tags,” which are markers that contextualize the audio. For example, you may have to indicate when someone coughs, sneezes, or speaks loudly or softly. You must also indicate whether there are any noises, such as a phone ringing, a knock on the door, or something falling on the ground. These markers are indicated in the transcription through the use of angled, square, or curly brackets.

Punctuation is also used to help contextualize the contents of an audio file, including inserting ellipses in the transcription to represent pauses or hesitations and two short dashes for interruptions.

Example: I– I told him… It wasn’t necessary. Really. Don’t go calling me. I asked him to send me a message. {phone rings}

Overall, it’s important to use tags, markers, and punctuation consistently. It’s also essential to review your client’s style guide, if there is one, or use an industry standard, such as the Vienna-Oxford International Corpus of English (VOICE), to make sure your transcriptions are done correctly. VOICE is a structured collection of language data compiled by the University of Vienna with contributions from Oxford University Press. This project yielded the first computer-readable corpus of spoken English as a lingua franca in different kinds of interactions.

Intelligent Transcription: This kind of transcription is just as accurate as a verbatim transcription, but it’s lightly edited and may have the tags and markers removed because there’s no need to contextualize the audio setting. Although it’s still fairly literal, this type of transcription is intended to streamline the content and make the transcribed document easier to read.

Example: I– I told him… It wasn’t necessary. Really. Don’t go calling me. I asked him to send me a message.
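For readers who like to automate repetitive cleanup, the relationship between the verbatim example and the intelligent example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `strip_tags` is hypothetical, the sketch assumes tags are written in curly, square, or angled brackets as described earlier, and it does not perform the light editing a human transcriber would still apply.

```python
import re

# Assumption: a tag is any text enclosed in curly, square, or angled brackets.
# The pattern also consumes whitespace directly before the tag.
TAG_PATTERN = re.compile(r"\s*(\{[^{}]*\}|\[[^\[\]]*\]|<[^<>]*>)")


def strip_tags(verbatim: str) -> str:
    """Remove bracketed context tags from a line of verbatim transcription."""
    without_tags = TAG_PATTERN.sub("", verbatim)
    # Collapse any doubled spaces left behind and trim both ends.
    return re.sub(r"\s{2,}", " ", without_tags).strip()


verbatim = (
    "I– I told him… It wasn’t necessary. Really. Don’t go calling me. "
    "I asked him to send me a message. {phone rings}"
)
print(strip_tags(verbatim))
```

Hesitations and interruptions are left untouched here, since deciding which of them to keep is an editorial judgment rather than a mechanical step.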

Edited Transcription: This type of transcription often uses a more streamlined writing style and may include some changes to the transcribed material, such as removing hesitations that don’t really add anything to the understanding of the content or correcting grammar errors. The main purpose of edited transcriptions is to present the transcribed content for general consumption as written text, such as publishing it as an article or website post so people who didn’t attend a live event can have access to the material in a written format.

Example: I told him it wasn’t necessary, really. Don’t call me. I asked him to send me a message.

Summarized Transcription: You can also have summarized transcriptions, which are not literal but intended to give readers the gist of the content of the audio recording.

Example: I told him he didn’t need to call me and asked him to send me a message.

Paraphrased Transcription: This is similar to summarized transcription but it is written in the third person.

Example: She told him not to call and to send her a message instead.

Why Is Audio Transcription in High Demand?

Transcriptions continue to serve an important purpose in many industries. They may be legal in nature: recorded depositions, hearings, interrogation sessions, investigations, wire or phone tapping, and audio obtained from mobile applications. They may be business-oriented: recorded corporate meetings or information dictated by executives and managers. They may be related to the health field: doctors and health professionals share notes in audio format about medical cases and patients. They may also be used in the academic world: dictated information about current research or studies that collect statements from interviewees.

As people continue to favor a fully virtual or hybrid model when it comes to holding conferences, workshops, webinars, group discussions, and networking events, the demand for transcription will only increase. That’s the good news—the bad news is that the field currently lacks professionals who have the skills and the appropriate training to tackle this surge in demand. In other words, more people need to study and train in the field in order to provide transcription services at the professional level and help dissipate the misconceptions we see nowadays. And there is yet another discouraging trend: many clients are settling for artificial intelligence (AI), such as automatic dictation software, when they need a transcription. Even though AI helps cut down the time, since computers can transcribe audio much faster than a transcriptionist, the human component is still needed to ensure that the final product fulfills its purpose: ensuring communication access.

Of course, it’s imperative that target markets such as the deaf and hard of hearing have access to the wealth of information distributed through recorded audiovisual materials, but the use of computer-generated transcriptions requires the same level of caution we apply to machine translation. AI and machine translation can be useful tools, but professional transcribers and translators will always be crucial in ensuring proper language access to a variety of target audiences.

Requirements and Challenges

If you’re looking to diversify your language-related activities and decide to add audio transcription services to your portfolio, it’s important that you study the market and acquire the required skills to provide these services at a professional level. You must be able to type fast and accurately to minimize the need to edit what you type. You must also use specialized software to streamline your work—from adding timestamps and tags to controlling the audio file itself as you play, pause, rewind, or fast forward the track while transcribing its contents into a document.

Besides the steep learning curve, one of the main challenges will be to educate potential clients about what transcription actually is (see the misconceptions I discussed earlier) and how you can provide this service to them. As a language professional, you’re in the best position to meet the demands of this highly specialized market, and you probably already have many of the necessary skills to reach out to clients. The fact that you have at least two languages you can work with will definitely help you stand out. For example, you could offer a transcription plus translation combo to clients who want to ensure access for the deaf and hard of hearing community with transcriptions of the source audio, as well as for readers of other languages with the respective translation of its content.

You can learn more about this segment through organizations and associations dedicated to bringing professional transcribers together, such as the Transcription Society, the American Association of Electronic Reporters and Transcribers, the International Alliance of Professional Reporters and Transcribers, and the Association of Transcribers and Speech-to-Text Providers. For further information, check out the links in the sidebar.

Transcription Resources and Tools

American Association of Electronic Reporters and Transcribers

Association of Transcribers and Speech-to-Text Providers

Express Scribe Transcription Software

International Alliance of Professional Reporters and Transcribers

KeyBlaze Typing Tutor Software

Review of Transcription Services (PC Magazine)

Three Types of Transcription: Edited, Verbatim, and Intelligent

Timestamps in Transcription

Transcription Services for T&I Professionals (Course)

Transcription Society

Translation Confessional Podcast: Transcription Services

Vienna-Oxford International Corpus of English Mark-Up Conventions

Rafa Lombardino, CT has a BA in social communications and majored in journalism. She started working as a translator in 1997 and is ATA-certified (English<>Portuguese). She has a professional certificate in Spanish>English translation from the University of California, San Diego Extension, where she teaches classes on the role of technology in translation. Currently the president and chief executive officer of Word Awareness, a small network of professional translators established in 2004 and incorporated in 2009, she specializes in technology, communications, subtitling, and book translations.


The ATA Chronicle © 2023 All rights reserved.