Designing a placement test

Getting started & types of test to avoid

ESSAY TITLE: Even though a great many language teaching institutions use placement tests to group students, the classes that result are often sadly heterogeneous. Suggest a reason for this and indicate how you would tackle this problem if expense were no object.


1. Write a content specification for each band on the incline of difficulty.

2. Write a first draft of the test. Try it out on colleagues and then on students at different levels and from different backgrounds. Edit your first draft between trials.

3. Develop a marking scheme. Discuss marker reliability, the scoring system, and acceptable and unacceptable variants with other teachers.

Change items in the test which give rise to contention or too many variants. At the lower end of the incline it is relatively easy to set items which have only one possible answer. Test your level bands by getting students whose levels have already been determined to take your test. Their levels may have been determined by public examinations such as PET or FCE, but if their original placement test has not been validated, it would also be wise to ask the people who have been teaching them whether they are typical level X students.

4. Collect data from a large sample of students (all the normal level bands and various backgrounds) and perform an item analysis to check that you obtain suitable facility and discrimination indices, depending on the position of your items on your incline of difficulty. You can be a little flexible about items which are not successively more difficult, provided each of the level bands is adequately represented.
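The item analysis in step 4 can be sketched in a few lines. This is a minimal classical sketch, assuming dichotomously scored items (1 = correct, 0 = wrong); the function name and the 27% grouping convention are illustrative choices, not prescribed by the text. The facility index is the proportion of all students who answer an item correctly; the discrimination index is the difference in that proportion between the top- and bottom-scoring groups.

```python
# Minimal classical item analysis: facility and discrimination indices.
# `responses` is a list of per-student lists of 0/1 item scores.
# Names here are illustrative, not from any testing library.

def item_analysis(responses, group_fraction=0.27):
    """Return (facility, discrimination) for each item.

    facility       = proportion of all students answering correctly
    discrimination = facility in the top-scoring group minus facility
                     in the bottom-scoring group (by total score)
    """
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, int(len(ranked) * group_fraction))
    top, bottom = ranked[:k], ranked[-k:]

    results = []
    for i in range(n_items):
        facility = sum(s[i] for s in responses) / len(responses)
        disc = (sum(s[i] for s in top) - sum(s[i] for s in bottom)) / k
        results.append((facility, disc))
    return results

# Ten students, three items set on an incline: easy, middling, hard.
scores = [
    [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [1, 0, 0],
    [1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]
for i, (fac, disc) in enumerate(item_analysis(scores), 1):
    print(f"item {i}: facility={fac:.2f} discrimination={disc:.2f}")
```

On an incline of difficulty you would expect facility to fall steadily across the paper (here 0.70, 0.40, 0.20) while discrimination stays healthy at every band.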


Although we often refer loosely to FILL-IN TESTS as CLOZE TESTS, selective deletion tests are not cloze tests. True Cloze Tests, where every Nth word of an authentic text is deleted, are very poor instruments for placing students at different levels.
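The mechanical procedure behind a True Cloze Test is easy to make concrete. A minimal sketch, assuming a plain-text passage (the function name and gap marker are illustrative):

```python
# "True cloze": delete every nth running word of a text, mechanically,
# with no regard to what the deleted words are.

def make_cloze(text, n=7):
    """Blank out every nth word; return the gapped text and the answers."""
    words = text.split()
    gapped, answers = [], []
    for i, word in enumerate(words, start=1):
        if i % n == 0:           # deletion depends only on position
            answers.append(word)
            gapped.append("_____")
        else:
            gapped.append(word)
    return " ".join(gapped), answers

gapped, answers = make_cloze("the quick brown fox jumps over the lazy dog", n=3)
print(gapped)   # the quick _____ fox jumps _____ the lazy _____
print(answers)  # ['brown', 'over', 'dog']
```

The point of the sketch is what the counter does NOT consult: the words removed are whatever happens to fall at positions n, 2n, 3n, so the test writer has no control over what is actually being assessed.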

The reason for setting a placement test on an incline of difficulty is to give students at EVERY level a chance to show how much English they have learnt. Authentic texts (which generally serve higher levels) discriminate very poorly at the lower end of the scale. Everybody below the level of the text can only be shown to have learnt little.

Selecting an authentic text of average difficulty can mean that students who are "just better than average" come out as "advanced". ONE test set on an incline of difficulty is also preferable to TWO tests. Having two different gauges complicates the process of comparing results and creates the unwanted dilemma of which test to offer borderline cases.

Selective deletion of items gives the test writer control over the specification of what is to be assessed. Indeed, the test writer should be sensitive to what students are expected to achieve at the levels available to them. To choose not to represent our programmes of study in the test by randomly selecting the test items is absurd. It is as absurd as expecting a doctor to perform a diagnosis with a screwdriver and an egg-timer as the test instruments.
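By contrast with the positional counter of a true cloze, selective deletion keys each gap to the content specification. A minimal sketch, assuming the specification for a given band can be expressed as a target word list (here a hypothetical set of prepositions; all names are illustrative):

```python
# Selective (rational) deletion: blank only words named in the content
# specification, regardless of where they fall in the text.

# Hypothetical stand-in for one band's content specification.
TARGETS = {"in", "on", "at", "over", "under"}

def selective_cloze(text, targets=TARGETS):
    """Blank only the words the specification says are to be assessed."""
    gapped, answers = [], []
    for word in text.split():
        if word.lower() in targets:
            answers.append(word)
            gapped.append("_____")
        else:
            gapped.append(word)
    return " ".join(gapped), answers

gapped, answers = selective_cloze("the cat sat on the mat in the hall")
print(gapped)   # the cat sat _____ the mat _____ the hall
print(answers)  # ['on', 'in']
```

Every gap now tests something the programme of study actually teaches at that band, which is precisely the control the paragraph above argues for.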