Validity: the degree to which an inference from a test score is appropriate or meaningful.


4/8/2003 PSY 721 Validity

What Is It?
The degree to which an inference from a test score is appropriate or meaningful. A test may be valid for one application but invalid for another. A test's validity is limited by its reliability.

Types We Will Discuss
1. Face validity
2. Content validity
3. Criterion-related validity
   - Concurrent
   - Predictive
4. Construct validity
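The statement that validity is limited by reliability has a standard quantitative form in classical test theory (added here; the slides do not state it): a test-criterion correlation cannot exceed the square root of the product of the two reliabilities, $r_{xy} \le \sqrt{r_{xx} r_{yy}}$. A minimal Python sketch, using hypothetical reliability values:

```python
import math

def max_validity(r_xx, r_yy=1.0):
    """Upper bound on the validity coefficient, given test reliability r_xx
    and criterion reliability r_yy (classical test theory)."""
    return math.sqrt(r_xx * r_yy)

# Hypothetical reliabilities: test .81, criterion .64.
print(round(max_validity(0.81, 0.64), 2))  # 0.72
```

With a perfectly reliable criterion (r_yy = 1), a test with reliability .81 could correlate at most .90 with any criterion.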

Type 1. Face Validity
The extent to which a test looks like it measures what it says it measures.

Issues
1. Superficial.
2. Because it looks good doesn't mean it is good.
3. Because it looks weird doesn't mean it is weird.

Type 2. Content Validity
Showing that the behaviors sampled by the test are a representative sample of the attribute being measured.

[Figure: Content Domain to be assessed vs. Content Domain of the test. Example domain: basic concepts of reliability as they apply to test evaluation and interpretation of test scores, sampled by individual test items.]

[Figure: Model. Overlapping Domain and Test regions labeled Deficiency, Relevance, and Contamination.] Deficiency is the part of the domain the test fails to sample, contamination is the part of the test that falls outside the domain, and relevance is the overlap between the two.

What Good Is It?
Does the test cover a representative sample of the skills, abilities, knowledge, and/or behaviors relevant to the construct being measured?
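The Domain/Test model can be sketched by treating the content domain and the test as sets of topics. This is only an illustration under that simplifying assumption; the topic labels below are hypothetical, and real content validation rests on expert judgment of the domain rather than on set arithmetic.

```python
# Hypothetical topic labels for a unit on reliability (illustration only).
domain = {"test-retest", "internal consistency", "inter-rater",
          "standard error of measurement"}
test_items = {"test-retest", "internal consistency", "item difficulty"}

relevance = domain & test_items      # domain content the test actually samples
deficiency = domain - test_items     # domain content the test misses
contamination = test_items - domain  # test content outside the domain

# Share of the domain that the test covers.
coverage = len(relevance) / len(domain)
print(sorted(deficiency), sorted(contamination), coverage)
```

Deficiency and contamination answer two separate questions, so a test can score well on one and badly on the other.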

Concerns/Issues
1. Did the test items cover the content domain?
2. Did the test include items that were irrelevant to the content domain?
3. Were important aspects of the content domain missed by test items?
4. How do we determine what counts as "good" coverage?

Types of Prediction
Clinical: expert interpretation based on logical integration and interpretation of the test data.
Actuarial: statistical assessment using some empirically derived mathematical formula.

Type 3. Criterion-Related Validity
Criterion: a standard or measure of the accuracy of a decision or behavioral prediction.
Predictor: an assessment tool used to estimate a person's behavior.
Validity coefficient: the correlation between test scores (predictor) and the criterion.
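Because the validity coefficient is simply the Pearson correlation between predictor and criterion, it can be computed directly. A minimal sketch in plain Python; the score lists are hypothetical.

```python
import math

def validity_coefficient(predictor, criterion):
    """Pearson correlation between predictor (test) scores and criterion scores."""
    n = len(predictor)
    mx = sum(predictor) / n
    my = sum(criterion) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(predictor, criterion))
    sxx = sum((x - mx) ** 2 for x in predictor)
    syy = sum((y - my) ** 2 for y in criterion)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical selection-test scores and later performance ratings.
test_scores = [10, 14, 18, 22, 26, 30]
performance = [35, 50, 45, 70, 80, 90]
print(round(validity_coefficient(test_scores, performance), 2))
```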

[Figure: scatterplot of Performance (criterion) against Selection Test (predictor) scores.]

A. Predictive Validation
1. Test all applicants (predictor).
2. Hire all applicants.
3. Wait.
4. Collect criterion data.
5. Evaluate the relationship between the predictor and the criterion.

B. Concurrent Validation
1. Get a sample of incumbents.
2. Test the sample (predictor).
3. Get performance data on the sample (criterion).
4. Evaluate the relationship between the predictor and the criterion.

Question: Which strategy is better, and why?

Comparison
Predictive                  Concurrent
Uncontaminated sample       Contaminated sample
Positive test attitude      Negative test attitude
Full range of scores        Restricted range of scores
Strong statistics           Weak statistics
Takes time                  Little time
Expensive                   Thrifty

Issues
1. Nature of the sample.
2. Changes over time.
3. Form of the relationship.
4. Is your criterion any good?
5. Standard error of estimate.
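The "restricted range, weak statistics" entry for concurrent designs can be shown with a small simulation. This sketch assumes, for simplicity, that incumbents are the applicants who scored above the median on the predictor itself; in practice the restriction usually comes from earlier selection and turnover. The sample size, seed, and true correlation of .60 are arbitrary choices.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n=5000, true_r=0.6, seed=42):
    """Draw standardized predictor/criterion pairs with correlation true_r."""
    rng = random.Random(seed)
    predictor, criterion = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = true_r * x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        predictor.append(x)
        criterion.append(y)
    return predictor, criterion

predictor, criterion = simulate()
r_full = pearson(predictor, criterion)  # predictive design: everyone hired

# Concurrent design (assumed): only the top half on the predictor remains.
cut = statistics.median(predictor)
kept = [(x, y) for x, y in zip(predictor, criterion) if x > cut]
r_restricted = pearson([x for x, _ in kept], [y for _, y in kept])

print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

The underlying relationship is identical in both samples; only the spread of predictor scores differs, and that alone lowers the observed correlation.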

Standard Error of Estimate
$SE_{est} = SD_y \sqrt{1 - r_{xy}^2}$

Influence of Increasing r on SE_est (SD_y = 10)
r_xy   r_xy^2   SE_est
.90    .81      4.36
.80    .64      6.00
.70    .49      7.14
.60    .36      8.00
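The standard error of estimate, SE_est = SD_y * sqrt(1 - r_xy^2), is simple to compute, and running it over the same four correlations with SD_y = 10 regenerates the table. A minimal Python sketch:

```python
import math

def se_estimate(sd_y, r_xy):
    """Standard error of estimate: SD_y * sqrt(1 - r_xy**2)."""
    return sd_y * math.sqrt(1 - r_xy ** 2)

for r in (0.90, 0.80, 0.70, 0.60):
    print(f"r = {r:.2f}  r^2 = {r * r:.2f}  SE_est = {se_estimate(10, r):.2f}")
```

Even at r = .90, SE_est remains almost half of the criterion's SD, so individual predictions stay fairly imprecise.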

[Figure: scatterplot of Performance against Selection Test scores. A cut score on the test divides cases into Hits, False Positives, False Negatives, and True Negatives.]

[Figure 1. Comparison of predicted graduation rates to actual graduation rates: counts of students at each predicted value, split by whether they graduated (No/Yes).]

Combining Tests
Test battery: a group of tests used to predict a single criterion.
Models:
- Compensatory: strength in one area offsets weakness in another area.
- Multiple cutoff: a minimal level is required in one or more critical areas.
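The cut-score figure sorts every case into one of four decision outcomes. This requires a criterion level that counts as success in addition to the test cut score; the sketch below takes that level, along with the data, as hypothetical inputs.

```python
def classify(predictor, criterion, cut_score, success_level):
    """Count selection decision outcomes.

    Hit: selected (at or above the cut) and successful.
    False positive: selected but unsuccessful.
    False negative: rejected but would have succeeded.
    True negative: rejected and unsuccessful.
    """
    counts = {"hits": 0, "false_positives": 0,
              "false_negatives": 0, "true_negatives": 0}
    for x, y in zip(predictor, criterion):
        selected = x >= cut_score
        successful = y >= success_level
        if selected and successful:
            counts["hits"] += 1
        elif selected:
            counts["false_positives"] += 1
        elif successful:
            counts["false_negatives"] += 1
        else:
            counts["true_negatives"] += 1
    return counts

# Hypothetical test scores, performance ratings, cut score, and success level.
test = [10, 12, 16, 18, 20, 24, 26, 28]
perf = [40, 70, 50, 80, 55, 90, 60, 100]
print(classify(test, perf, cut_score=19, success_level=65))
```

Moving the cut score up trades false positives for false negatives. Where to set it depends on which error costs more.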

Combining Tests, cont.
Multiple regression: the optimal statistical combination of scores to predict a single criterion.

Decision Impact
- Selection
- Placement
- Classification

Type 4. Construct Validity
Demonstration that the test is measuring the hypothetical construct or trait that one claims it is measuring.
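The compensatory and multiple-cutoff models can return different decisions for the same applicant. A minimal sketch using hypothetical weights, threshold, and minimums. A multiple-regression composite would estimate the weights from criterion data instead of setting them by hand.

```python
def compensatory(scores, weights, threshold):
    """Accept if the weighted sum meets the threshold; strengths offset weaknesses."""
    return sum(w * s for w, s in zip(weights, scores)) >= threshold

def multiple_cutoff(scores, minimums):
    """Accept only if every score meets its own minimum; no offsetting."""
    return all(s >= m for s, m in zip(scores, minimums))

applicant = (90, 40)  # hypothetical: strong on test 1, weak on test 2
print(compensatory(applicant, weights=(0.5, 0.5), threshold=60))  # True: 65 >= 60
print(multiple_cutoff(applicant, minimums=(50, 50)))               # False: 40 < 50
```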

Evidence for Construct Validity
1. Homogeneity. Does the test score represent a single construct?
2. Relationships. The test correlates with other tests in a way that is consistent with the predictions of the construct.
3. Age. Scores change with maturation in a way that is consistent with the theory.
4. Intervention. Posttest scores change after an intervention.
5. Groups. Scores differ between groups that are distinctly different on the construct.

[Table: correlations of Decision Style scales (Rational, Intuitive, Dependent, Avoidant) with PI and AVA scales (Assertiveness, Sociability, Calmness, Conformity). Column positions were lost in extraction; surviving values by row: PI Calmness -.118, -.214; AVA Calmness -.367, -.410; PI Conformity .219, -.003, -.269; AVA Conformity .462, .239.]

Convergent vs. Discriminant Validity
Convergent validity: demonstrating that the test is related to other tests measuring the same thing.
Discriminant validity: demonstrating that the test is NOT related to tests with which it should NOT be related.
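Convergent and discriminant evidence come down to an expected pattern of correlations. In the simulation below, two tests measure the same latent trait with noise and a third is unrelated. The trait model, noise level, sample size, and seed are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)
n = 2000
trait = [rng.gauss(0, 1) for _ in range(n)]              # hypothetical latent construct
new_test = [t + rng.gauss(0, 0.5) for t in trait]        # measures the construct
same_construct = [t + rng.gauss(0, 0.5) for t in trait]  # established measure, same construct
unrelated = [rng.gauss(0, 1) for _ in range(n)]          # should NOT be related

convergent = pearson(new_test, same_construct)
discriminant = pearson(new_test, unrelated)
print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```

A high convergent correlation paired with a near-zero discriminant correlation is the pattern the construct predicts.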

Developmental Changes
Some constructs change as a function of age: abilities, intelligence, cognitive skills.
Issues: not all constructs change as a function of age; cultural influences.

Pretest-Posttest Changes
Issues: experimental design; state vs. trait.

Distinct Groups
Can the test differentiate between groups that are distinctly different on the construct?
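One common way to measure how well a test separates distinct groups is the standardized mean difference (Cohen's d). The slides do not specify this statistic; it is brought in here as an example. The group scores below are hypothetical.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a) +
                  (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

high_group = [10, 12, 14]  # hypothetical group expected to be high on the construct
low_group = [4, 6, 8]      # hypothetical group expected to be low
print(cohens_d(high_group, low_group))  # 3.0
```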

Factor Analysis
Statistical techniques for identifying interrelationships between items, with the goal of identifying items that group or cluster together.

[Figure: items A through L arranged in clusters.]

Test Bias
Factors inherent in a test that systematically prevent accurate, impartial measurement of one group.

Bias in Regression
[Figure: slope bias.]

Regression Bias, cont.
[Figure: intercept bias; labels mark where a common regression line underpredicts and overpredicts.]
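Intercept bias can be checked by fitting a separate regression line for each group, then comparing each group's actual scores with those predicted by one common line. In the hypothetical data below, both groups share a slope of 2 but have different intercepts. The common line therefore overpredicts one group and underpredicts the other.

```python
import statistics

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_residual(x, y, slope, intercept):
    """Average actual-minus-predicted criterion; > 0 underpredicts, < 0 overpredicts."""
    return statistics.fmean([b - (slope * a + intercept) for a, b in zip(x, y)])

# Hypothetical predictor/criterion scores for two groups.
xa, ya = [1, 2, 3, 4], [3, 5, 7, 9]    # group A: y = 2x + 1
xb, yb = [1, 2, 3, 4], [5, 7, 9, 11]   # group B: y = 2x + 3

print(fit_line(xa, ya), fit_line(xb, yb))  # same slope, different intercepts
common = fit_line(xa + xb, ya + yb)
print(mean_residual(xa, ya, *common), mean_residual(xb, yb, *common))
```

Slope bias would show up in this same comparison as different per-group slopes. The size of the over- or underprediction would then depend on the predictor score.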