Chapter 5. Relationships Between Quantitative Variables. Copyright 2006 Brooks/Cole, a division of Thomson Learning, Inc.

Three Tools We Will Use
Scatterplot: a two-dimensional graph of data values.
Correlation: a statistic that measures the strength and direction of a linear relationship.
Regression equation: an equation that describes the average relationship between a response variable and an explanatory variable.

5.1 Looking for Patterns with Scatterplots
Questions to ask about a scatterplot:
What is the average pattern? Does it look like a straight line, or is it curved?
What is the direction of the pattern?
How much do individual points vary from the average pattern?
Are there any unusual data points?

Positive/Negative Association
Two variables have a positive association when the values of one variable tend to increase as the values of the other variable increase.
Two variables have a negative association when the values of one variable tend to decrease as the values of the other variable increase.

Example 5.1 Height and Handspan
Data shown are the first 12 observations of a data set that includes the heights (in inches) and fully stretched handspans (in centimeters) of 167 college students.

Height (in.)   Span (cm)
71             23.5
69             22.0
66             18.5
64             20.5
71             21.0
72             24.0
67             19.5
65             20.5
76             24.5
67             20.0
70             23.0
62             17.0
and so on, for n = 167 observations.

Example 5.1 Height and Handspan
Taller people tend to have greater handspan measurements than shorter people do.
When two variables tend to increase together, we say that they have a positive association.
The handspan and height measurements may have a linear relationship.

Example 5.2 Driver Age and Maximum Legibility Distance of Highway Signs
A research firm determined the maximum distance at which each of 30 drivers could read a newly designed sign.
The 30 participants in the study ranged in age from 18 to 82 years old.
We want to examine the relationship between age and the sign legibility distance.

Example 5.2 Driver Age and Maximum Legibility Distance of Highway Signs
We see a negative association with a linear pattern.
We will use a straight-line equation to model this relationship.

Example 5.3 The Development of Musical Preferences
The 108 participants in the study ranged in age from 16 to 86 years old.
We want to examine the relationship between song-specific age (age in the year the song was popular) and musical preference (positive score => above average, negative score => below average).

Example 5.3 The Development of Musical Preferences
Popular music preferences are acquired in late adolescence and early adulthood.
The association is nonlinear.

Groups and Outliers
Use different plotting symbols or colors to represent different subgroups.
Look for outliers: points that have an unusual combination of data values.

5.2 Describing Linear Patterns with a Regression Line
When the best equation for describing the relationship between x and y is a straight line, the equation is called the regression line.
Two purposes of the regression line:
to estimate the average value of y at any specified value of x
to predict the value of y for an individual, given that individual's x value

Example 5.1 Height and Handspan (cont.)
Regression equation: Handspan = -3 + 0.35 Height
Estimate the average handspan for people 60 inches tall: Average handspan = -3 + 0.35(60) = 18 cm.
Predict the handspan for someone who is 60 inches tall: Predicted handspan = -3 + 0.35(60) = 18 cm.

Example 5.1 Height and Handspan (cont.)
Regression equation: Handspan = -3 + 0.35 Height
Slope = 0.35 => Handspan increases by 0.35 cm, on average, for each increase of 1 inch in height.
In a statistical relationship, there is variation from the average pattern.
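
The two uses of the handspan equation can be sketched in a few lines of Python. This is only an illustration: the coefficients -3 and 0.35 come from the slide, while the function and constant names are made up for this example.

```python
# Coefficients from the slide: Handspan = -3 + 0.35 * Height.
INTERCEPT_CM = -3.0        # b0, in centimeters
SLOPE_CM_PER_INCH = 0.35   # b1, centimeters of handspan per inch of height


def predict_handspan(height_in):
    """Estimated average (equivalently, predicted) handspan in cm.

    The function name is hypothetical; the equation is the one on the slide.
    """
    return INTERCEPT_CM + SLOPE_CM_PER_INCH * height_in


# Estimate for people 60 inches tall (about 18 cm).
print(predict_handspan(60))

# Increasing height by 1 inch changes the prediction by the slope (about 0.35 cm).
print(predict_handspan(61) - predict_handspan(60))
```

The same number serves as both the estimated average for a group and the prediction for one individual; what differs is how much individual values vary around it.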

The Equation for the Regression Line
\[ \hat{y} = b_0 + b_1 x \]
ŷ is spoken as "y-hat," and it is also referred to as either predicted y or estimated y.
$b_0$ is the intercept of the straight line. The intercept is the value of y when x = 0.
$b_1$ is the slope of the straight line. The slope tells us how much of an increase (or decrease) there is for the y variable when the x variable increases by one unit.
The sign of the slope tells us whether y increases or decreases when x increases.

Example 5.2 Driver Age and Maximum Legibility Distance of Highway Signs (cont.)
Regression equation: Distance = 577 - 3 Age
The slope of -3 tells us that, on average, the legibility distance decreases 3 feet when age increases by one year.
Estimate the average distance for 20-year-old drivers: Average distance = 577 - 3(20) = 517 ft.
Predict the legibility distance for a 20-year-old driver: Predicted distance = 577 - 3(20) = 517 ft.

Extrapolation
It is usually a bad idea to use a regression equation to predict values far outside the range where the original data fell.
There is no guarantee that the relationship will continue beyond the range for which we have observed data.
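
A small Python sketch can make the danger concrete. It uses the Example 5.2 equation, Distance = 577 - 3 Age, and the 18-to-82 age range reported for the 30 drivers; the function names and the age of 200 are hypothetical, chosen only to show what happens far outside the data.

```python
AGE_MIN, AGE_MAX = 18, 82  # age range of the drivers in Example 5.2


def predict_distance(age):
    """Predicted legibility distance in feet: Distance = 577 - 3 * Age."""
    return 577 - 3 * age


def is_extrapolation(age):
    """True when the age lies outside the range of the observed data."""
    return not (AGE_MIN <= age <= AGE_MAX)


for age in (20, 82, 200):
    note = " (extrapolation: do not trust)" if is_extrapolation(age) else ""
    print(f"age {age}: {predict_distance(age)} ft{note}")
```

At an age of 200 the line returns a negative distance, which is impossible. The equation was only ever checked against drivers aged 18 to 82.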

Prediction Errors and Residuals
Prediction error = difference between the observed value of y and the predicted value ŷ.
Residual = $y - \hat{y}$

Example 5.2 Driver Age and Maximum Legibility Distance of Highway Signs (cont.)
Regression equation: ŷ = 577 - 3x, where x = Age and y = Distance.

x = Age   y = Distance   ŷ = 577 - 3x         Residual = y - ŷ
18        510            577 - 3(18) = 523    510 - 523 = -13
20        590            577 - 3(20) = 517    590 - 517 = 73
22        516            577 - 3(22) = 511    516 - 511 = 5

We can compute the residual for all 30 observations.
Positive residual => observed value higher than predicted.
Negative residual => observed value lower than predicted.
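
The residual arithmetic for the three drivers shown can be reproduced with a short Python sketch. The three (age, distance) pairs are the ones on the slide; the variable and function names are illustrative.

```python
# (age in years, observed legibility distance in feet) for the three drivers shown
observations = [(18, 510), (20, 590), (22, 516)]


def predicted(age):
    """Predicted distance from the regression equation y-hat = 577 - 3x."""
    return 577 - 3 * age


def residual(age, observed):
    """Residual = observed y minus predicted y."""
    return observed - predicted(age)


for age, distance in observations:
    print(age, distance, predicted(age), residual(age, distance))
```

The first driver's residual is negative (observed below the line) and the other two are positive (observed above the line).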

Least Squares Line and Formulas
Least squares regression line: minimizes the sum of squared prediction errors.
SSE = sum of squared prediction errors.
Formulas for slope and intercept:
\[ b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x} \]
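
As a rough sketch, the slope and intercept formulas translate directly into Python. The data below are only the first 12 height and handspan observations listed in Example 5.1, so the fitted line is not expected to match the equation Handspan = -3 + 0.35 Height, which was computed from all 167 students. The function name is an assumption made for this example.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: sum of squared deviations of x from its mean.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# First 12 observations from Example 5.1 (heights in inches, handspans in cm).
heights = [71, 69, 66, 64, 71, 72, 67, 65, 76, 67, 70, 62]
spans = [23.5, 22.0, 18.5, 20.5, 21.0, 24.0, 19.5, 20.5, 24.5, 20.0, 23.0, 17.0]

b0, b1 = least_squares(heights, spans)
print(f"Handspan = {b0:.2f} + {b1:.2f} Height (12 observations only)")
```

One property worth checking by hand: with these formulas the residuals of the fitted line always sum to zero, up to rounding.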

5.3 Measuring Strength and Direction with Correlation
Correlation r indicates the strength and the direction of a straight-line relationship.
The strength of the relationship is determined by the closeness of the points to a straight line.
The direction is determined by whether one variable generally increases or generally decreases when the other variable increases.

Interpretation of r and a Formula
r is always between -1 and +1.
The magnitude indicates the strength; r = -1 or +1 indicates a perfect linear relationship.
The sign indicates the direction.
r = 0 indicates a slope of 0, so knowing x does not change the predicted value of y.
Formula for correlation:
\[ r = \frac{1}{n-1} \sum_i \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \]
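
The correlation formula can be sketched in Python as the average product of standardized values, using n - 1 in both the standard deviations and the final division. The function names are illustrative, and the data are again only the first 12 observations from Example 5.1, so the result will differ from the r = +0.74 reported for all 167 students.

```python
from math import sqrt


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(x, y):
    """Correlation r: sum of products of standardized x and y, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x, s_y = sample_sd(x), sample_sd(y)
    products = (((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return sum(products) / (n - 1)


# First 12 observations from Example 5.1 (heights in inches, handspans in cm).
heights = [71, 69, 66, 64, 71, 72, 67, 65, 76, 67, 70, 62]
spans = [23.5, 22.0, 18.5, 20.5, 21.0, 24.0, 19.5, 20.5, 24.5, 20.0, 23.0, 17.0]

print(round(correlation(heights, spans), 2))
```

Points lying exactly on a rising line give r = +1 and points exactly on a falling line give r = -1, which is a quick way to sanity-check the function.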

Example 5.1 Height and Handspan (cont.)
Regression equation: Handspan = -3 + 0.35 Height
Correlation r = +0.74 => a somewhat strong positive linear relationship.

Example 5.2 Driver Age and Maximum Legibility Distance of Highway Signs (cont.)
Regression equation: Distance = 577 - 3 Age
Correlation r = -0.8 => a somewhat strong negative linear association.

Example 5.6 Left and Right Handspans
If you know the span of a person's right hand, can you accurately predict his/her left handspan?
Correlation r = +0.95 => a very strong positive linear relationship.

Example 5.7 Verbal SAT and GPA
Grade point averages (GPAs) and verbal SAT scores for a sample of 100 university students.
Correlation r = 0.485 => a moderately strong positive linear relationship.

Example 5.8 Age and Hours of TV Viewing
Relationship between age and hours of daily television viewing for 1913 survey respondents.
Correlation r = 0.12 => a weak connection.
Note: a few claimed to watch more than 20 hours/day!

Example 5.9 Hours of Sleep and Hours of Study
Relationship between reported hours of sleep in the previous 24 hours and reported hours of study during the same period for a sample of 116 college students.
Correlation r = -0.36 => a not too strong negative association.

Interpretation of $r^2$ and a Formula
Squared correlation $r^2$ is between 0 and 1 and indicates the proportion of variation in the response explained by x.
SSTO = sum of squares total = sum of squared differences between observed y values and $\bar{y}$.
SSE = sum of squared errors (residuals) = sum of squared differences between observed y values and predicted values based on the least squares line.
\[ r^2 = \frac{\text{SSTO} - \text{SSE}}{\text{SSTO}} \]
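
A minimal Python sketch of the SSTO and SSE calculation follows. It fits the least squares line inside the function so that the example is self-contained. The data are the first 12 observations from Example 5.1, used only for illustration, and the function name is hypothetical.

```python
def r_squared(x, y):
    """Proportion of variation in y explained by x: (SSTO - SSE) / SSTO."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Least squares slope and intercept.
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    # SSTO: squared differences between observed y values and the mean of y.
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    # SSE: squared differences between observed y values and predicted values.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return (ssto - sse) / ssto


# First 12 observations from Example 5.1 (heights in inches, handspans in cm).
heights = [71, 69, 66, 64, 71, 72, 67, 65, 76, 67, 70, 62]
spans = [23.5, 22.0, 18.5, 20.5, 21.0, 24.0, 19.5, 20.5, 24.5, 20.0, 23.0, 17.0]

print(round(r_squared(heights, spans), 2))
```

When every point falls exactly on the line, SSE is 0 and the ratio is 1. When the line predicts no better than the mean of y, SSE equals SSTO and the ratio is 0.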

Interpretation of $r^2$
Example 5.6: Left and Right Handspans. $r^2$ = 0.90 => span of one hand is very predictable from span of other hand.
Example 5.8: TV Viewing and Age. $r^2$ = 0.014 => age explains only about 1.4% of the variation, so knowing a person's age doesn't help much in predicting amount of daily TV viewing.

Example 5.6: Left and Right Handspans

5.4 Why the Answers May Not Make Sense
Allowing outliers to overly influence the results
Combining groups inappropriately
Using correlation and a straight-line equation to describe curvilinear data

Example 5.4 Height and Foot Length (cont.)
Three outliers were data entry errors.

                   Regression equation     Correlation
Uncorrected data   15.4 + 0.13 height      r = 0.28
Corrected data     -3.2 + 0.42 height      r = 0.69

Example 5.10 Earthquakes in US
San Francisco earthquake of 1906.
Correlation, all data: r = 0.73
Correlation, without San Francisco: r = 0.96

Example 5.11 Height and Lead Feet
Scatterplot of all data: college student heights and responses to the question "What is the fastest you have ever driven a car?"
Scatterplot by gender: combining two groups led to illegitimate correlation.

Example 5.12 Don't Predict without a Plot
Population of US (in millions) for each census year between 1790 and 1990.
Correlation: r = 0.96
Regression line: population = -2218 + 1.218(Year)
Poor prediction for year 2030 = -2218 + 1.218(2030), or about 255 million, only 6 million more than 1990.
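
The arithmetic behind the poor prediction can be checked with a short Python sketch. It takes the intercept as -2218, the value consistent with the slide's stated result of about 255 million; the function name is made up for this example.

```python
def predict_population(year):
    """Straight-line prediction of US population in millions.

    Uses population = -2218 + 1.218 * Year, with the intercept taken as -2218.
    """
    return -2218 + 1.218 * year


# Prediction for 2030: about 255 million.
print(round(predict_population(2030), 1))

# Value of the same line at 1990, the last census year in the data.
print(round(predict_population(1990), 1))
```

The line's own value for 1990 comes out near 206 million, far below the roughly 249 million implied by the slide's "only 6 million more than 1990." The gap is a sign that a straight line is missing the curved growth pattern, which a scatterplot would have revealed despite the high correlation.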

5.5 Correlation Does Not Prove Causation
Interpretations of an observed association:
1. Causation
2. Confounding factors present
3. Explanatory and response are both affected by other variables
4. Response variable is causing a change in the explanatory variable

Case Study 5.1 A Weighty Issue
Relationship between actual and ideal weight (panels: Females, Males)