Reliability
4/8/2003 PSY 721

What We Will Cover
- What reliability is.
- How a test's reliability is estimated.
- How to interpret and use reliability estimates.
- How to enhance reliability.

What Is It?
- An estimate of the consistency of a test score.
- Permits an estimate of the amount of error in a score.
- The more error, the less stable (reliable) the score is.
Definitions
- True Score: the stable characteristics of the individual being tested, or the attribute being measured.
- Error: features of the individual, test content, and situation that influence a score but have nothing to do with the attribute being measured.

Sources of Variability #1
1. Person.
   - True level of the trait or construct being measured.
   - Variability in the person not connected with the trait (error).
2. Test.
   - Content that is related to the trait being measured.
   - Errors in content sampling (error).
   - Errors in item construction (error).

Sources of Variability #2
3. Test administration.
   - Inconsistent test administration (error).
4. Scorer error.
   - One scorer is inconsistent (error).
   - Two scorers don't give the same assessment of the same behavior (error).
Classical Test Theory
OBSERVED SCORE = TRUE SCORE + ERROR
$X = T + e$

Types of Reliability Estimates

Test-Retest Reliability (Coefficient of Stability)
1. Administer Test A.
2. Wait.
3. Readminister Test A.
Test-Retest Reliability Scatterplot
[Figure: Time 1 score (x-axis) vs. Time 2 score (y-axis)]

What Good Is It?
What sources of error are detected with test-retest reliability?

Issues with $r_{tt}$
- The interval between Time 1 and Time 2 is important.
- Subject reactivity.
- Carry-over effect.
- Time consuming.
- Assumes no change in the individual.
Alternate Forms Reliability
1. Administer Test A.
2. No significant wait.
3. Administer Test B.

What Good Is It?
What sources of error are detected with alternate forms reliability?

Issues
- Practice effect.
- Fatigue effect.
- Time delay:
   - Back-to-back.
   - Interval.
Sweeney's Measure of Verbal Fluency
Use each of the following words correctly in a sentence.
1. Cat
2. House
3. Automobile
4. Phrenologize
5. Coat
6. Marble
7. Dog-flogger
8. Variance
9. Beetle
10. Crayon

3. Internal Consistency
1. Split-half reliability.
2. Kuder-Richardson (KR-20).
3. Coefficient alpha.

Split-Half Reliability
1. Divide the test into two subtests.
2. Correlate the scores on the subtests.
Issues With Split-Half
- Which halves?
- Correcting for length.
- Speeded tests.
- Only a single analysis.

Kuder-Richardson (KR-20)
- Used with test items that can be scored pass-fail.
- Represents the mean of all possible split-half coefficients.
- Expressed in terms of a correlation coefficient.

Coefficient Alpha
- Used when items are not scored pass-fail, e.g., items with multiple response options.
- Represents the mean of all possible split-half coefficients.
- Expressed in terms of a correlation coefficient.
Extroversion Scale (α = .66)

Item          Corrected Item-Total Correlation   Alpha if Deleted
Quiet         .238                               .657
Shy           .508                               .587
Bold          .451                               .603
Energetic     .541                               .580
Bashful       .327                               .644
Withdrawn     .263                               .655
Talkative     .307                               .643
Extraverted   .251                               .664

What You Can Learn
- Errors due to content sampling.
- Errors due to heterogeneity of the content domain.
- Scoring errors.

Interrater (Scorer) Reliability
1. Do different scorers give the same evaluation of the same test?
2. Does the same scorer give the same evaluation of the same test?
How to Interpret and Use Reliability Estimates

Interpretation of $r_{tt}$
- Can be interpreted as the proportion of variance attributable to TRUE SCORE.
- $r_{tt}$ = proportion of TRUE SCORE variability in a score.
- $1 - r_{tt}$ = proportion of ERROR variability in a score.

Mathematically Speaking
$\sigma^2_{\text{Total}} = \sigma^2_{\text{True}} + \sigma^2_{\text{Error}}$
$r_{tt} = \dfrac{\sigma^2_{\text{True}}}{\sigma^2_{\text{True}} + \sigma^2_{\text{Error}}}$
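As a worked check of the two formulas, suppose (hypothetically) that true-score variance is 9 and error variance is 1. Total variance is then 10, $r_{tt}$ = 9/10 = .90, and $1 - r_{tt}$ = .10, so 90% of the score variability is true-score variability and 10% is error. The helper below only encodes that arithmetic; its name is illustrative.

```python
def reliability_from_variances(var_true, var_error):
    """Return (total variance, r_tt, error proportion) under X = T + e."""
    var_total = var_true + var_error  # sigma^2_Total = sigma^2_True + sigma^2_Error
    r_tt = var_true / var_total       # proportion of true-score variance
    return var_total, r_tt, 1 - r_tt  # 1 - r_tt = proportion of error variance


total, r_tt, error_share = reliability_from_variances(var_true=9.0, var_error=1.0)
print(total, round(r_tt, 2), round(error_share, 2))  # 10.0 0.9 0.1
```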
True Score
[Figure: true score (y-axis) vs. obtained score (x-axis); legend: True Score, Error]

Standard Error of Measurement (SEM)
- An index of the amount of error (inconsistency) in an individual's test score.
- An estimate of the standard deviation of the error in a test score.

How to Calculate SEM
$SEM = SD_{\text{test}}\sqrt{1 - r_{tt}}$
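The SEM formula translates directly into code. The sketch below (function name illustrative) also shows the two limiting cases: with $r_{tt}$ = 0 the SEM equals the test's SD, and with $r_{tt}$ = 1 it is 0. With SD = 4 and $r_{tt}$ = .89 it gives about 1.33; the example calculation rounds √.11 to .33 first and reports 1.32.

```python
import math


def sem(sd_test, r_tt):
    """Standard error of measurement: SD_test * sqrt(1 - r_tt)."""
    if not 0.0 <= r_tt <= 1.0:
        raise ValueError("r_tt must be between 0 and 1")
    return sd_test * math.sqrt(1.0 - r_tt)


for r in (0.0, 0.75, 0.89, 1.0):
    print(f"r_tt = {r:.2f}: SEM = {sem(4.0, r):.2f}")
```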
Example Calculation
Mean = 50, SD = 4, $r_{tt}$ = .89
$SEM = 4\sqrt{1 - .89} = 4\sqrt{.11} = 4 \times .33 = 1.32$

Confidence Intervals
How to account for the fact that we never measure something exactly.

[Figure: scores around an obtained score of 50 in steps of 1 SEM (1.32): 46.04, 47.36, 48.68, 50, 51.32, 52.64, 53.96]
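The band in the figure is the obtained score plus or minus whole multiples of the SEM (1.32 from the example calculation). Replacing the multiple with a standard normal z value such as 1.96 (the two-sided 95% value) turns the band into a confidence interval. A minimal sketch; the function name is illustrative, and it assumes, as these intervals usually do, that measurement error is normally distributed with SD equal to the SEM.

```python
def score_band(obtained, sem, z):
    """Return (lower, upper) = obtained -/+ z * SEM."""
    return obtained - z * sem, obtained + z * sem


SEM = 1.32  # from the example: SD = 4, r_tt = .89
for k in (1, 2, 3):
    lower, upper = score_band(50, SEM, k)
    print(f"50 +/- {k} SEM: {lower:.2f} to {upper:.2f}")

lower, upper = score_band(50, SEM, 1.96)
print(f"95% interval: {lower:.2f} to {upper:.2f}")  # 47.41 to 52.59
```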
Confidence Intervals
[Figure: normal curve with the central 90% between z = -1.64 and +1.64 and the central 95% between z = -1.96 and +1.96]

Magic Numbers
Confidence Interval   z-score Limits
90%                   +/- 1.64
95%                   +/- 1.96
99%                   +/- 2.58

Reliability of Difference Scores
Issue
- Both tests have random error.
- A raw difference between the two test scores ignores the SEM of each test.
- The Standard Error of the Difference ($SE_{\text{diff}}$) is the estimate of error in difference scores.
- $SE_{\text{diff}}$ is greater than either SEM.
$SE_{\text{diff}}$ Example
PDQ Test of Conceptual Flexibility: Mean = 40, SD = 8, $r_{tt}$ = .90
$SE_{\text{diff}} = SD\sqrt{2 - r_{11} - r_{22}}$
$SE_{\text{diff}} = 8\sqrt{2 - .90 - .90} = 8\sqrt{.2} = 3.58$
For comparison, $SEM = 8\sqrt{1 - .90} = 2.53$

How to Enhance Reliability
- Increase test length.
- Remove inconsistent items.
- Correct for attenuation.
- Standardize the scoring system.
- Live with it.

THE END
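The $SE_{\text{diff}}$ example and the first enhancement strategy can both be checked numerically. The sketch assumes, as the example does, that both scores are on the same scale with a common SD. For test length it uses the Spearman-Brown prophecy formula, n x r / (1 + (n - 1) x r), which predicts reliability when a test is made n times longer with comparable items; the .80 reliability in the usage line is hypothetical. Function names are illustrative.

```python
import math


def se_diff(sd, r11, r22):
    """Standard error of the difference: SD * sqrt(2 - r11 - r22), common SD assumed."""
    return sd * math.sqrt(2 - r11 - r22)


def spearman_brown(r_tt, length_factor):
    """Predicted reliability when the test is lengthened by length_factor."""
    return length_factor * r_tt / (1 + (length_factor - 1) * r_tt)


print(round(se_diff(8, 0.90, 0.90), 2))   # 3.58
print(round(8 * math.sqrt(1 - 0.90), 2))  # 2.53, the SEM of either test
print(round(spearman_brown(0.80, 2), 3))  # 0.889 when test length is doubled
```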