Assessment of Vocabulary
JOHN READ
Introduction
Vocabulary is such an integral component of language use that to some degree any form of language assessment is a measure of vocabulary ability, among other things. However, assessing vocabulary is normally understood to be a process of testing learners' knowledge of a sample of content words, not simply to find out whether they know those particular words but to investigate their ability to understand and use vocabulary in a more general sense.
It is useful to explore a little further what is meant by vocabulary. Most commonly, for assessment purposes, vocabulary is defined in terms of individual words, but the concept of a “word” is itself a slippery one, as is illustrated by the following sets of word forms:
(1)  shout    SHOUT     shout       shout
(2)  save     saves     saving      saved
(3)  product  products  production  productive   productivity
(4)  sea      season    seal        search       seamless
In set 1 we have different typographic forms of the same word, with the capital letters and the alternative fonts perhaps indicating added emphasis rather than any difference in meaning. Although the word forms in set 2 are a little more distinct, we can see them as representing different grammatical functions of the same word. In the third set, we can recognize that the words have a common stem form (product) and a shared underlying meaning, but the individual items in the list are more like related but separate words (i.e., members of the same family), as compared to the word forms in the first two sets. On the other hand, the items in set 4 are superficially similar in form, in that they all begin with sea-, but they are obviously different words.
From an assessment perspective, there are two points to make here. First, if a vocabulary test item targets a particular word form and test takers answer correctly, can we also credit them with knowledge of related words? For instance, if learners show that they understand product, is it reasonable to assume that they also know production and productive? The other point is that, even at the level of the individual word, we are not dealing purely with vocabulary; sets 2 and 3 above show how the grammatical functioning of words is an essential part of vocabulary knowledge.
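To make the scoring decision concrete, the sketch below groups word forms into families and credits a test taker with a whole family once any member is answered correctly. It is a minimal illustration only: the family mapping and the responses are invented for this example, and an operational test would rely on a lemmatizer or a published family list rather than a hand-built dictionary.

```python
# Sketch: crediting correct responses at the word-family level.
# The family mapping and responses below are invented for illustration;
# a real test would draw on a lemmatizer or a published word-family list.

WORD_FAMILIES = {
    "product": {"product", "products", "production", "productive", "productivity"},
    "save": {"save", "saves", "saving", "saved"},
}

def family_of(word_form):
    """Return the headword of the family containing this form, or None."""
    for headword, members in WORD_FAMILIES.items():
        if word_form in members:
            return headword
    return None

def credited_families(correct_responses):
    """Families credited to a test taker, given the forms answered correctly."""
    return {family_of(form) for form in correct_responses if family_of(form) is not None}

if __name__ == "__main__":
    # Answering 'product' correctly earns credit for the whole family,
    # including 'production' and 'productive'.
    print(credited_families({"product", "saved"}))  # {'product', 'save'}
```

Whether such family-level credit is defensible depends on evidence that learners who know one member of a family really do know the others, which is precisely the question raised above.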
Moving beyond individual word forms, we increasingly recognize that vocabulary also consists of fixed combinations of words, variously known as compound nouns (ironing board), phrasal verbs (put up with), collocations (heavy rain), idioms (shoot the breeze), formulaic sequences (I declare the meeting closed), lexical phrases (as a matter of fact), and so on. To the extent that these multiword units are included in vocabulary tests, they are often treated as if they were single words, but such items have received little attention in their own right in language assessment.
Selection of Target Items
Once the lexical units that are the focus of the assessment have been defined, the next step is to select a sample of those units, and the basis for that selection depends on the context. For example, in the classroom situation, the vocabulary that the learners need to know is commonly specified in the coursebook, the syllabus, or other components of the language curriculum. For more general purposes, word‐frequency lists are used as a key reference, on the principle that higher‐frequency words are more useful to, and more likely to be known by, learners than words which do not commonly occur in the language. The classic source of word‐frequency information for ESL vocabulary tests has been West's (1953) General Service List.
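As a rough illustration of the sampling step, the sketch below draws the same number of target words from each 1,000-word band of a frequency-ranked list, the kind of stratified selection used in many vocabulary size measures. The toy word list, band size, and items-per-band figures are placeholders chosen for the example, not recommendations, and an operational test would sample from an established list such as the General Service List under a sampling plan justified for its purpose.

```python
# Sketch: stratified sampling of target words from a frequency-ranked list.
# The toy list, band size, and items-per-band values are placeholders only.

import random

def sample_by_frequency_band(ranked_words, band_size=1000, per_band=5, seed=42):
    """Draw the same number of target words from each frequency band."""
    rng = random.Random(seed)
    sample = []
    for start in range(0, len(ranked_words), band_size):
        band = ranked_words[start:start + band_size]
        sample.extend(rng.sample(band, min(per_band, len(band))))
    return sample

# Toy ranked list, most frequent first (stands in for a real word list).
ranked = [f"word{i}" for i in range(1, 3001)]
targets = sample_by_frequency_band(ranked)
print(len(targets), targets[:5])  # 15 items in total, 5 from each 1,000-word band
```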
With advances