- 24. What is test reliability? What are the aspects of reliability that are most important for the teacher/tester?
The reliability of a measuring device is high when any variations in the readings taken represent true differences between the individuals being tested. Any other variation represents error.
The reliability of a test is its consistency: like a tape measure that stays the same length all the time, as opposed to a piece of elastic. The same results should be obtained whoever uses the tape measure and wherever it is used.
Comparison with different tests: i.e. whether the student takes one version of the test (e.g. the CFE) or another, the result should be the same.
THREE ASPECTS OF RELIABILITY: 1) the circumstances in which the test is taken; 2) the way in which it is marked; 3) the uniformity of the assessment it makes.
EXTRINSIC sources of error: i) examiner variability; ii) variability of testing conditions.
INTRINSIC sources of error: i) lack of stability; ii) lack of equivalence.
- 25. How can the reliability of a test be established? What can be done about the findings?
Examiner variability is virtually eliminated by objective formats.
Variability of testing conditions is reduced by meticulous care in providing instructions to the test administrator & in formulating the explanations to the candidate (if necessary, give some preliminary practice with rubrics etc. so that first-timers are not handicapped).
Stability reliability: a measuring device is stable if it gives the same result when used twice on the same object. If the same (or nearly the same) distances between the individuals in the group are obtained on both occasions, the test has high stability reliability.
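A minimal sketch of how a stability estimate might be computed, assuming the same test has been administered twice to the same group; the scores are invented for the example, and the Pearson coefficient is the usual choice of statistic, though the text does not name one.

```python
from statistics import correlation  # Pearson's r; Python 3.10+

# Hypothetical scores for the same ten students on two sittings of one test.
first_sitting  = [12, 15, 9, 18, 14, 11, 16, 13, 10, 17]
second_sitting = [13, 14, 10, 18, 15, 10, 17, 12, 11, 16]

# A coefficient near 1 means the distances between individuals were much
# the same on both occasions, i.e. high stability reliability.
print(round(correlation(first_sitting, second_sitting), 2))
```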
Equivalence reliability: a measuring device is equivalent to another measuring device if both give the same results when applied to the same object. To obtain estimates of equivalence reliability, construct two tests and do one of the following:
1. Parallel versions: administer both versions to the same group of individuals & correlate the two sets of scores.
2. Split half: if the test has 100 items, scores are calculated separately for two sets of 50 items (e.g. all the odd-numbered items & all the even-numbered items) and the results correlated. (See the sketch below.)
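As an illustration of method 2, a sketch that scores the odd- and even-numbered items separately and correlates the two half-scores; the response matrix is invented for the example.

```python
from statistics import correlation  # Pearson's r; Python 3.10+

# Each row: one candidate's item-by-item results (1 = correct, 0 = wrong).
# A ten-item test for six candidates, invented for the example.
responses = [
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 1],
]

# Score the two halves separately: odd-numbered vs. even-numbered items.
odd_half  = [sum(row[0::2]) for row in responses]  # items 1, 3, 5, ...
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...

half_r = correlation(odd_half, even_half)
# Strictly this estimates the reliability of a half-length test; the usual
# next step (not covered in these notes) is the Spearman-Brown correction:
# full_r = 2 * half_r / (1 + half_r).
print(round(half_r, 2))
```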
Variance estimates: M = the mean score; n = the number of items in the test; s = the standard deviation of the scores; r = the reliability estimate. Kuder-Richardson formula 21: r = 1 - [M(n - M)] / (ns²).
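The formula above translated directly into a small Python function; the candidate totals are invented, and the population form of the standard deviation is assumed.

```python
from statistics import mean, pstdev

def kr21(scores, n_items):
    """Kuder-Richardson formula 21: r = 1 - [M(n - M)] / (n * s^2)."""
    m = mean(scores)    # M: the mean score
    s = pstdev(scores)  # s: standard deviation (population form assumed here)
    return 1 - (m * (n_items - m)) / (n_items * s ** 2)

# Hypothetical totals for twelve candidates on a 40-item test.
scores = [31, 28, 35, 22, 30, 26, 33, 24, 29, 36, 27, 32]
print(round(kr21(scores, 40), 2))
```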
Equivalence reliability for a certified achievement test should reach .7 (a lower figure might be acceptable for a diagnostic test which is to be the basis of class discussion).
If the test is designed so that the spread of scores will not resemble a normal distribution (e.g. a diagnostic test, where nearly all the items should be answered correctly, but it must be clear when they are not), report instead: DISTRIBUTION HISTOGRAM, MEAN, ITEM ANALYSIS showing which items were not answered correctly (see the sketch below).
Inspection of scripts, class discussion of test results.
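A sketch of the kind of distribution histogram, mean and item analysis meant above, using the same sort of 1/0 response matrix as earlier; the data and the 80% review threshold are invented for the example.

```python
from collections import Counter
from statistics import mean

# 1 = correct, 0 = wrong; one row per candidate, one column per item.
responses = [
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
]

totals = [sum(row) for row in responses]
print("mean score:", mean(totals))

# Distribution histogram (text form): one '#' per candidate at each score.
for score, count in sorted(Counter(totals).items()):
    print(f"{score:>2} | " + "#" * count)

# Item analysis: proportion answering each item correctly, flagging the
# items that were not generally answered correctly (threshold invented).
for i, item in enumerate(zip(*responses), start=1):
    facility = sum(item) / len(item)
    flag = "  <- review" if facility < 0.8 else ""
    print(f"item {i}: {facility:.0%} correct{flag}")
```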
- 26. What is test validity? What are the most important aspects of validity for the teacher/tester?
A test is valid when it measures what it is intended to measure and nothing else: validity is the extent to which a test measures what it is intended to measure. The most important kinds of validity for the teacher/tester are CONTENT and FACE VALIDITY.
Content Validity - the test accurately reflects the syllabus on which it is based. Draw up a content specification list, based on the purposes of assessment, to ensure that the test reflects all the areas to be assessed in suitable proportion.
Aim for a balanced sample, without bias towards test items that are easiest to write or towards test material that happens to be available (a mechanical check such as the sketch below can help).
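One way a content specification list might be checked mechanically, comparing the number of items actually written per syllabus area against the intended proportions; the areas, figures and 5% tolerance are all invented for the example.

```python
# Intended share of the test for each area to be assessed (the content
# specification list), against the items actually written.
specification = {"listening": 0.25, "grammar": 0.35, "vocabulary": 0.25, "reading": 0.15}
items_written = {"listening": 8, "grammar": 22, "vocabulary": 7, "reading": 3}

total = sum(items_written.values())
for area, target in specification.items():
    actual = items_written[area] / total
    # Flag areas drifting more than 5 percentage points from the target,
    # e.g. bias towards item types that are easiest to write.
    note = "  <- out of proportion" if abs(actual - target) > 0.05 else ""
    print(f"{area:<10} target {target:.0%}  actual {actual:.0%}{note}")
```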
Face Validity - the test looks like a good one: what teachers and students think of the test. Is it a reasonable way of assessing students? Trivial? Too difficult?
Use can be made of a formal questionnaire and informal discussion involving teachers & students.
Predictive validity - the test accurately predicts performance in some subsequent situation.
Concurrent validity - the test gives similar results to existing tests that have already been validated.
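Both predictive and concurrent validity are typically established by correlating the new test with a criterion measure; a sketch assuming paired scores, with the data invented for the example.

```python
from statistics import correlation  # Pearson's r; Python 3.10+

# Hypothetical paired scores: the new test vs. an external criterion
# (an already-validated test for concurrent validity, or performance
# in some subsequent situation for predictive validity).
new_test  = [55, 62, 48, 71, 66, 59, 44, 68]
criterion = [52, 65, 50, 69, 70, 57, 41, 72]

# The validity coefficient: the closer to 1, the better the new test
# agrees with (or predicts) the criterion measure.
print(round(correlation(new_test, criterion), 2))
```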
Construct validity - the test reflects accurately the principles of a valid theory of foreign language learning!
- 27. How can the validity of a test be established? What can be done about the findings?
See the case studies in the sections on developing a placement test and a multiple choice test. Ultimately, the validity of your test design rests on its relationship with your own goals and objectives, i.e. its success in measuring the behaviours you wish your learners to develop or the skills they need to further their own objectives. There is some consensus in societies about useful skills and socially responsible behaviour, though within most educational systems it is for test designers to define and/or take account of the aims and content of the programme of study which is the subject of the assessment.