Designing a placement test

Item analysis for a selective-deletion (cloze) placement test set on an incline of difficulty.

ITEM ANALYSIS: % difficulty and % discrimination (spread of scores)

It is useful, where possible, to set the tests on an incline of difficulty, since the intention is to separate the students into class groups. The range of the students' experience varies in level as well as in content.

Minimum information

A. How it has spread the students out along the range of scores.

B. How difficult or easy it has been for the students who took it.

C. How consistent it has been in its measurements: i.e. reliability.

D. How accurately it relates to what it is intended to measure: i.e. validity.

1. For placement tests and achievement tests, A WIDE SPREAD OF SCORES is helpful in applying the results (though perhaps not for proficiency tests and certainly not for diagnostic tests).
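The "spread" of scores (point A above) is usually summarised by the range and the standard deviation. A minimal sketch, using invented sample scores for illustration:

```python
# Summarising the spread of test scores with the range and the
# (population) standard deviation. The scores are invented sample data.
from statistics import pstdev

scores = [12, 15, 18, 22, 25, 27, 30, 33, 36, 40]

spread_range = max(scores) - min(scores)  # highest minus lowest score
sd = pstdev(scores)                       # standard deviation of the scores

print(spread_range)   # 28
print(round(sd, 1))   # 8.7
```

A wide range and a large standard deviation indicate that the test is separating students out, which is what a placement test needs.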

2. The difficulty of a test is indicated by the average score: THE MEAN. Add all the scores and divide by the number of students. N.B. Since the test is set on an incline of difficulty, nearness to the MEAN is probably unimportant except for concurrent validity: i.e. whether the test gives similar results to existing tests which have already been validated. If you gave the test to a group consisting entirely of zero beginners, or to one containing only Cambridge First Certificate level students, it would not discriminate: students would get 0 marks or full marks, respectively. This placement test is intended for a normal distribution of levels. If your intake of students had successfully completed few, some or many years of English study, it would tell you whether to create a number of low, intermediate or high level classes rather than an even spread of levels.
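The calculation of the mean described above ("add all the scores and divide by the number of students") can be sketched as follows, with invented sample scores:

```python
# Computing the mean (average) score of a test.
# The scores list is invented sample data for illustration.
scores = [12, 15, 18, 22, 25, 27, 30, 33, 36, 40]

mean = sum(scores) / len(scores)  # add all the scores, divide by the number of students
print(mean)  # 25.8
```

A mean near half the maximum possible score usually suggests the test was pitched at roughly the right difficulty for the group.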

3. Reliability: how far the same results can be arrived at on different occasions with similar groups of students, or from assessment of the same answers by different markers. Consider factors which may make a test less reliable, e.g. subjective marking: do different markers accept "prices" as an alternative to "fares"? Do they accept a misspelt answer, or award half marks? Such decisions belong in the mark scheme. Consider also EQUIVALENCE RELIABILITY: are separate groups of students given the same amount of time? Is one group of students subject to unreasonable background noise during the Listening Test? Does the tone of the teacher giving a dictation vary? Does one use "slow colloquial" while another dictates at "normal speed"? Also consider "affect": do the students feel comfortable in the surroundings of the examination hall and with a particular invigilator? If seen two at a time in an oral test, do they get on with their partner? Does one candidate dominate, not giving the more nervous candidate a chance to show what they know?
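The factors above are practical; reliability can also be estimated statistically. One common quick estimate of internal consistency is Kuder-Richardson formula 21 (KR-21), which needs only the number of items, the mean, and the variance of the total scores. A sketch, using invented sample data:

```python
# KR-21 estimate of test reliability (internal consistency):
#   r = (k / (k - 1)) * (1 - m * (k - m) / (k * variance))
# where k = number of items, m = mean score, variance = population
# variance of total scores. The scores below are invented sample data.
from statistics import mean, pvariance

k = 40  # number of items on the test (assumed for illustration)
scores = [12, 15, 18, 22, 25, 27, 30, 33, 36, 38]

m = mean(scores)
v = pvariance(scores)
kr21 = (k / (k - 1)) * (1 - (m * (k - m)) / (k * v))
print(round(kr21, 2))  # 0.89
```

Values above roughly 0.8 are conventionally taken as acceptable for placement purposes, though a high coefficient says nothing about validity, as the next sections stress.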

4. Content validity: accuracy of specification ensures this. Make sure you are not testing Reading Comprehension by providing a lot of text in the rubric of a test designed to assess Listening Comprehension. This could be most unfair to students from the Middle East who are not used to the Roman alphabet, but who are good aural/oral communicators. In a Listening Comprehension Test, make sure you are not merely testing PHONETICS by providing a multiple-choice gap-fill format in which students fill in the word they hear. The skill of listening operates on several linguistic and paralinguistic levels.

If your CONTENT SPECIFICATION consisted of outmoded tasks e.g. written translation between languages and sentence-level grammatical exercises, your test would not be very valid as a test of aural/oral communication. I took pride in my success in A-level French, until I got to Paris and found that I could not understand French speakers and that they could only understand me if I reverted to English.

Statistical formulae may show your test to be reliable. The facility and discrimination indexes may look wonderful once these mathematical formulae have been applied, but if the CONTENT SPECIFICATION (WHAT IS BEING TESTED) is useless, i.e. based on misconceptions about language or the useless rituals of an outmoded education system, then your test only has validity in terms of these misconceptions and useless rituals. We have placed many students from the Far East in our lower level classes, even though they have been studying English in their own countries for over 12 years. By including listening and reading comprehension in placement tests, as opposed to sentence-level grammar, it is easy to place students who cannot use English for everyday transactions in an English-speaking environment.
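The facility and discrimination indexes mentioned above are commonly calculated per item: facility is the proportion of students answering the item correctly, and discrimination compares the top-scoring and bottom-scoring groups (the upper/lower-third method is one common variant). A sketch, with invented item responses:

```python
# Facility and discrimination indexes for one test item, using the
# upper/lower-third method. Responses are invented sample data
# (1 = correct, 0 = incorrect), with students already sorted from
# highest to lowest total test score.
responses = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]  # one item, 12 students

n = len(responses)
third = n // 3
upper = responses[:third]   # top third of students by total score
lower = responses[-third:]  # bottom third of students by total score

facility = sum(responses) / n                       # proportion correct overall
discrimination = (sum(upper) - sum(lower)) / third  # upper minus lower proportion correct

print(facility)        # 0.5
print(discrimination)  # 0.75
```

An item with facility near 0 or 1, or with low (or negative) discrimination, is doing little to spread students out and is a candidate for revision; but, as the paragraph above argues, good-looking indexes cannot rescue a test built on a poor content specification.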

All the big examination boards (TOEFL, CAMBRIDGE, OXFORD etc.) have had to adapt the CONTENT SPECIFICATIONS of their tests over the years. Question setters have had to adopt more of a task-based approach and to make the tasks which they ask students to perform more realistic in terms of English for communication. Changes in how we live (e.g. commerce over the internet via e-mail instead of by telephone, using IBM's ViaVoice instead of typing at a keyboard) affect the emphasis we should place on written versus aural/oral skills.