Princess: A lot of people .hhh ((clears throat)) saw the distress that my life was in. (.) And they felt (.) felt it was a supportive thing to help (0.2) in the way that they did.
(from Silverman, 2004, p. 208)
In some cases, for example if you are undertaking a discourse analysis or a conversation analysis, a very detailed transcription is necessary. Not only is natural speech often non-grammatical (at least by written conventions) but it is also full of other phenomena. People hesitate, they stress words and syllables, they overlap their speech with others and they raise and lower both volume and pitch in order to add meaning to what they are saying. If you need to record these features then there are various transcription conventions you can follow. One of the most widely used is the Jefferson system (see Atkinson and Heritage, 1984) and a similar system can be found in Silverman (2004, p. 367; see also Rapley, 2018, and Brinkmann and Kvale, 2018).
Doing the Transcription
The researcher
The choice of who should do the transcription usually comes down to either you, the researcher, or someone else who is paid to do it. Although the activity can be tedious, especially if you are not a good touch typist, there are advantages to doing your own transcription. Most importantly, it gives you a chance to start the data analysis. Careful listening to recordings and reading and checking of the transcript you have produced means that you become very familiar with the content. Inevitably you start to generate new ideas about the data. In practice, however, researchers usually do their own transcription because they have no choice: either they have no funds to employ an audio typist or the content of the recording means that no one else can do it. For instance, the interviews may be about a highly technical subject or, as is often the case with anthropological work, in a language very few others can understand.
Nowadays, most people making recordings of interviews, focus groups, etc., will use digital recorders, either digital audio recorders or digital video cameras. Good-quality digital recorders are not cheap, but you might find that your department has machines that you can borrow. The same is true of video cameras. Most audio recorders and video cameras record onto removable memory cards. The recordings can be played back on the recorder itself or, for much better reproduction, transferred (by card reader or USB cable) to a computer.
However, these days most people have their own digital audio recorder and camera in the form of a smartphone (cell phone). Although the microphones on these are not of the best quality, they will produce very usable recordings of simple interviews carried out in quiet surroundings. The one thing to watch out for with smartphones is that some have limited internal memory and cannot take removable memory cards, so you need to make sure you have enough spare memory to make the recordings you need.
In most cases all these devices will record the digital files in MP3 format. This is a compressed format that produces small files of still very good quality (the format typically used for podcasts). Audio recorders, and smartphones using downloadable apps, can also record to uncompressed WAV files. These are much better quality (similar to CD quality), but tend to be about ten times the size of MP3 files. Because the quality is much better, it might be a good idea to use this format if you are recording in noisy environments or recording focus group discussions (making sure, of course, that you have enough memory left to make the recording). In these more demanding situations it might actually be best to use a dedicated audio recorder, as these have very good built-in microphones, or to think about using an external microphone with your smartphone.
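To check whether a device has enough spare memory, the difference in size between the two formats can be estimated with simple arithmetic. The following is a minimal sketch in Python, not a description of any particular recorder. The settings are assumptions chosen for illustration: CD-style WAV audio (44,100 samples per second, 16 bits per sample, two channels) and an MP3 bitrate of 128 kbit/s. Real devices may use other settings, so check your own recorder or app.

```python
# Rough size estimates for an interview recording.
# All default settings below are illustrative assumptions, not device facts.

def wav_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Approximate size of uncompressed audio (file header ignored)."""
    return seconds * sample_rate * (bit_depth // 8) * channels

def mp3_bytes(seconds, bitrate_kbps=128):
    """Approximate size of a constant-bitrate MP3 file."""
    return seconds * bitrate_kbps * 1000 // 8

def fits(size_in_bytes, free_bytes):
    """True if a recording of this size fits in the free memory."""
    return size_in_bytes <= free_bytes

one_hour = 60 * 60  # length of the interview in seconds
wav = wav_bytes(one_hour)
mp3 = mp3_bytes(one_hour)

print(f"WAV: {wav / 1e6:.0f} MB, MP3: {mp3 / 1e6:.1f} MB, ratio {wav / mp3:.1f}")
# prints: WAV: 635 MB, MP3: 57.6 MB, ratio 11.0
```

Under these assumptions a one-hour interview needs roughly 635 MB as WAV against under 60 MB as MP3, which is in line with the "about ten times" rule of thumb. A lower MP3 bitrate or a mono WAV recording would change the ratio.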
Don’t try to transcribe directly from the recorder. Transfer the files to a computer first (this gives you a second copy and is good for security reasons too). Then use software to play the files. There is some good, cheap or free software designed for transcription work that enables you to control the playback as you type. For example, one program allows you to type into a text box as you hear the recording and then pause and restart the speech using a function key. The advantage of digital recordings is that the pause is instantaneous, so no words are lost when you restart the playback and there is little need for rewinding. However, some programs do allow you to set an automatic rewind. Half a second or a second is usually enough to make sure you don’t miss anything. Another program allows you to split the speech into short phrases that are easier to control while transcribing. Some software will work with a foot pedal (connected via a USB cable) to stop and start the playback. If you are a good audio typist, this is a useful gadget as you can keep your fingers on the keyboard while stopping and starting. Another useful facility for those who are less rapid typists is the ability to replay at a slow speed with the pitch adjusted at the same time, so that the recording sounds normal but is played much more slowly. This can be very useful for making accurate transcriptions. Transcription software is available for both Windows and Mac and includes Express Scribe (free), F4Transkript and F5Transkript. Several of the CAQDAS programs also offer these transcription functions.
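The advice to transfer the files to a computer before transcribing can be sketched as a small Python script that copies the recordings and confirms that each copy matches its original. This is only one possible approach, using the standard library. The function names, the folder names in the usage note and the list of file extensions are hypothetical and would need adapting to your own recorder and computer.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path):
    """Return a checksum of the file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_recordings(source_dir, target_dir, suffixes=(".mp3", ".wav")):
    """Copy audio files from source_dir to target_dir and verify each copy.

    The suffixes are an assumption; add others if your device uses them.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(source_dir.iterdir()):
        if src.is_file() and src.suffix.lower() in suffixes:
            dst = target_dir / src.name
            shutil.copy2(src, dst)  # copies the content and the timestamps
            if sha256_of(src) != sha256_of(dst):
                raise OSError(f"Copy of {src.name} does not match the original")
            copied.append(dst)
    return copied
```

A call such as `copy_recordings("E:/RECORD", "interviews/raw")` (both paths are made up) would leave the originals on the memory card and return the list of verified copies. Keeping the originals until the copies have been checked is what gives you the second copy mentioned above.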
Audio typist
Employing someone else to do the transcription, if you can afford it, is a good option, especially if the recordings are easily understandable or the notes and documents that need transcribing are easy to read. It is best if the typist you are employing knows something about the subject matter and the context of the interviews. Also make sure they know what level of transcription you require. Check their work early on to make sure it is in the format you want. The last thing you want is to pay a lot of money for very detailed transcriptions that you do not need. No matter who you use, you will still need to check the document produced against the recording or original text to eliminate mistakes. However, this is not all lost time as, again, reading the transcript (and listening to the recording) will be an opportunity to begin your analysis.
Don’t forget that the typist will be listening to or reading all your data. As Gregory et al. (1997) remind us, transcribers are ‘vulnerable’ persons. If the content of your data is emotionally loaded and sensitive, you might want to include your transcribers in the scope of your ethical considerations, and you may wish to offer some debriefing to support them.
OCR and speech recognition software
In recent years two new technologies have become available that can help the transcription process. If you have some typed or printed documents that you need to get an electronic copy of, then optical character recognition (OCR) software used with a scanner will help. Provided the original paper copy is good quality and that standard fonts like Courier are used for the typescript, then the software will work well in producing word processor files from the paper copies. Save your text as plain text because the layout, fonts, etc., that formatted text gives you are seldom of much relevance to your analysis.
A more recent technology that is sometimes used by qualitative researchers is speech recognition software. This takes speech spoken into a special, high-quality microphone and converts it into a word-processor file. The software can be used with natural speech and it can also cope with varieties of English, such as UK English, Southeast Asian and Indian English, as well as some non-English languages, such as Spanish. However, it always needs to be trained to recognize the speech of one particular user and needs very good quality sound or sound recordings (ideally .WAV files). For these reasons it cannot be used directly with poor quality recordings of respondents, and especially not with recordings of focus groups. However, what some enterprising researchers have done is to set up an audio player or computer with a pair of headphones through which they can listen to the recording. Then, as the recording plays, they pause after each phrase and dictate it into the speech recognition software, rather in the manner of a simultaneous interpreter. The accuracy can be variable, but it is generally sufficient for a first draft transcription that can then be checked properly against the recording. Speech recognition is a computationally intensive task and all programs need fairly powerful computers. Check before you buy.
Accuracy
No matter how the transcription is produced, whether by OCR, speech recognition or a human typist, it will need checking against the original. Errors arise for a variety of reasons. First, there are simple typing errors, misspellings and so on. Most of these can be picked up using the spelling checker and grammar