Transcription

Transcription involves converting the spoken word into written text. Common reasons for this include the creation of subtitles or the documentation of discussions in meetings or workshops. Transcription can be very time-consuming, but specialised technology and equipment provide considerable help and improve efficiency.

What output is created in the transcription process?

Here is an example of the input to the transcription process (an audio file) and the output from it (a digital text file). The standard deliverable is a Word file containing the text; additional information such as time stamps and the identity of the people speaking can also be included. The example here involves two people, the interviewer (I) and the respondent (R), talking in an interview.

We at Ulingo recorded this interview, with Ulrika Borking asking Nick Butler about life as a translator.

The audio recording should of course be of the highest possible quality. A dictaphone or similar device is recommended. The microphone on a standard mobile telephone can also be acceptable, although poor sound quality can affect the final product.

There are a number of different ways of carrying out the process, depending on what the transcribed text is to be used for. Ulingo transcribes based on your specific needs, asking the questions required to ensure that you will be pleased with the final product. If desired, transcriptions are provided with time stamps at intervals suitable for the intended purpose. We transcribe in Swedish and English. A selection of common transcription requirements and solutions is provided below:

Audio recordings of interviews, meetings or discussions need to be converted into text in order to be analysed and reviewed:

In this case, it is useful to decide whether to transcribe exactly what is said or instead the essence of what is said. Transcribing the essence of what is said involves omitting excess words (such as repeated phrases or noises that do not constitute proper words) and small prompts issued by the interviewer. One also needs to decide how formal or informal the transcribed text should be, and whether it should be 'tidied up', for example by correcting grammatical errors. If, however, the aim is to transcribe exactly what is said, repeated phrases and additional verbal noises are also included, as the way in which things are said can be of importance. The latter may be required if, for example, the text is to be analysed for research purposes.

Audio recordings of various kinds of lectures, meetings or briefings need to be converted into written text in order to be edited or disseminated:

This could involve a service provided to students or staff who, for various reasons, cannot take their own notes during a lecture or presentation. In such cases, the aim is to write down what was said in the clearest and most concise way possible. A similar requirement arises when, for example, a speech needs to be presented or stored in written form.

Video recordings containing spoken language need to be supplemented with written text of what is said in order to create subtitles or allow some other sort of processing:

A common procedure here is to transcribe both what is said and how it is said, but this must be done in an efficient manner. In the case of subtitles, the text should not be too long, as it has to fit into a limited space in order to be synchronised with the words that are spoken. It must also be easy for viewers to read the text quickly, so one should also consider what type of audience adaptation is required, and how much. In some cases, it may also be appropriate to correct grammatical errors, "translate" slang or tone down offensive language, provided of course that it is not an integral part of the final product's key message.