More About Regression


Chapter 14: More About Regression

Copyright 2006 Brooks/Cole, a division of Thomson Learning, Inc.

Regression Line for the Sample

$\hat{y} = b_0 + b_1 x$

$\hat{y}$ is spoken as "y-hat," and it is also referred to as either predicted y or estimated y.
$b_0$ is the intercept of the straight line. The intercept is the value of $\hat{y}$ when x = 0.
$b_1$ is the slope of the straight line. The slope tells us how much of an increase (or decrease) there is for the y variable when the x variable increases by one unit. The sign of the slope tells us whether y increases or decreases when x increases.

Deviations from the Regression Line in the Sample

For an observation $y_i$ in the sample, the residual is $e_i = y_i - \hat{y}_i$, where
$y_i$ = value of the response variable for the i-th observation, and
$\hat{y}_i = b_0 + b_1 x_i$, where $x_i$ is the value of the explanatory variable for the i-th observation.

Example 14.1 Height and Handspan

Data: Heights (in inches) and handspans (in centimeters) of 167 college students.
Regression equation: Handspan = -3 + 0.35 Height
Slope = 0.35 => Handspan increases by 0.35 cm, on average, for each increase of 1 inch in height.

Example 14.1 Height and Handspan (cont.)

Consider a person 70 inches tall whose handspan is 23 centimeters. The sample regression line is $\hat{y} = -3 + 0.35x$, so $\hat{y} = -3 + 0.35(70) = 21.5$ cm for this person.
The residual = observed y - predicted y = 23 - 21.5 = 1.5 cm.

Proportion of Variation Explained

Squared correlation $r^2$ is between 0 and 1 and indicates the proportion of variation in the response explained by x:

$r^2 = \frac{SSTO - SSE}{SSTO}$

SSTO = sum of squares total = sum of squared differences between the observed y values and the sample mean $\bar{y}$.
SSE = sum of squared errors (residuals) = sum of squared differences between the observed y values and the predicted values based on the least squares line.
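The prediction and residual in Example 14.1 can be reproduced in a few lines of Python. This is only a minimal sketch: the coefficients (-3 and 0.35) and the person's measurements (70 inches, 23 cm) come from the example, while the names `predict_handspan` and `residual` are illustrative and do not appear in the text.

```python
# Coefficients from Example 14.1: Handspan = -3 + 0.35 * Height
B0 = -3.0   # intercept, in cm
B1 = 0.35   # slope, in cm per inch of height

def predict_handspan(height_in):
    """Predicted handspan (y-hat) in cm for a height given in inches."""
    return B0 + B1 * height_in

def residual(observed_y, predicted_y):
    """Residual = observed y minus predicted y."""
    return observed_y - predicted_y

y_hat = predict_handspan(70)   # person who is 70 inches tall
e = residual(23, y_hat)        # observed handspan is 23 cm
print(round(y_hat, 2), round(e, 2))
```

A positive residual, as here, means the observed handspan lies above the regression line.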

Making Inferences

1. Does the observed relationship also occur in the population?
2. For a linear relationship, what is the slope of the regression line in the population?
3. What is the mean value of the response variable (y) for individuals with a specific value of the explanatory variable (x)?

14.1 Sample and Population Regression Models

If the sample represents a larger population, we need to distinguish between the regression line for the sample and the regression line for the population. The observed data can be used to determine the regression line for the sample, but the regression line for the population can only be imagined.

Regression Line for the Population

$E(Y) = \beta_0 + \beta_1 x$

This is the average response for individuals in the population who all have the same x.
$\beta_0$ is the intercept of the straight line in the population.
$\beta_1$ is the slope of the straight line in the population. Note that if the population slope were 0, there would be no linear relationship in the population.
These population parameters are estimated using the corresponding sample statistics, $b_0$ and $b_1$.

Example 14.2 Height and Weight (cont.)

Data: x = height (inches), y = weight (pounds) of n = 43 male students.
R-Sq = 32.3% => The variable height explains 32.3% of the variation in the weights of college men.

Example 14.3 Driver Age and Maximum Legibility Distance of Highway Signs

Study to examine the relationship between age and the maximum distance at which drivers can read a newly designed sign.

Example 14.3 Age and Distance (cont.)

Regression equation: Average Distance = 577 - 3.01 Age
SSE = 69334, SSTO = 193667
s = 49.76 and R-sq = 64.2% => The average distance of the observations from the regression line is about 50 feet, and 64.2% of the variation in sign-reading distances is explained by age.
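The R-sq value reported for Example 14.3 can be checked from the two sums of squares, using the relation r² = (SSTO - SSE) / SSTO. The sketch below assumes only the SSE and SSTO values quoted in the example; the function name `r_squared` is illustrative.

```python
def r_squared(ssto, sse):
    """Proportion of variation in y explained by x: (SSTO - SSE) / SSTO."""
    return (ssto - sse) / ssto

# Sums of squares quoted in Example 14.3 (age and sign-reading distance)
r2 = r_squared(ssto=193667, sse=69334)
print(f"{r2:.1%}")  # 64.2%
```

The result agrees with the R-sq of 64.2% given for the age and distance data.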

14.3 Inference About Linear Regression Relationship

The statistical significance of a linear relationship can be evaluated by testing whether or not the slope is 0.

Test for Zero Slope

$H_0: \beta_1 = 0$ (the population slope is 0, so y and x are not linearly related.)
$H_a: \beta_1 \neq 0$ (the population slope is not 0, so y and x are linearly related.)
The alternative may be one-sided or two-sided.

Test statistic: $t = \frac{b_1 - 0}{\text{s.e.}(b_1)}$

Example 14.3 Age and Distance (cont.)

$H_0: \beta_1 = 0$ (y and x are not linearly related.)
$H_a: \beta_1 \neq 0$ (y and x are linearly related.)
The reported p-value is 0.000. The probability is virtually 0 that the observed slope could be as far from 0 or farther if there is no linear relationship in the population => It appears that the relationship in the sample represents a real relationship in the population.

Another example

p-value = 0.292 for testing that the slope is 0.

Example 14.2 Height and Weight (cont.)

Data: x = height (inches), y = weight (pounds) of n = 43 male students.
R-Sq = 32.3% => The variable height explains 32.3% of the variation in the weights of college men.

Effect of Sample Size on Significance

With very large sample sizes, weak relationships with low correlation values can be statistically significant.
Moral: With a large sample size, saying two variables are significantly related may only mean the correlation is not precisely 0. We should carefully examine the observed strength of the relationship, the value of r.
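To make the test for zero slope concrete, the sketch below computes the least squares slope, its standard error, and the t statistic using only the Python standard library. The data are hypothetical (they are not from any example in the text), the function name `slope_t_statistic` is illustrative, and the standard-error formula s / sqrt(sum of (x - x-bar)^2), with s = sqrt(SSE / (n - 2)), is the usual one for simple linear regression rather than something stated on the slides.

```python
import math

def slope_t_statistic(x, y):
    """Return (b1, se_b1, t) for testing H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least squares slope
    b0 = y_bar - b1 * x_bar             # least squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))        # standard deviation of the residuals
    se_b1 = s / math.sqrt(sxx)          # standard error of the slope
    t = (b1 - 0) / se_b1                # t = (b1 - 0) / s.e.(b1)
    return b1, se_b1, t

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1, t = slope_t_statistic(x, y)
print(round(b1, 3), round(se_b1, 3), round(t, 3))
```

The p-value is then found by comparing t with a t distribution with n - 2 degrees of freedom, using a table or statistical software; that step is left out here to keep the sketch dependency-free.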

Other alternative hypotheses:

$H_a: \beta_1 \neq 0$ (the population slope is not 0, so y and x are linearly related.)
$H_a: \beta_1 > 0$ (y and x are positively linearly related.)
$H_a: \beta_1 < 0$ (y and x are negatively linearly related.)

The p-value for a one-sided alternative is
1. (reported p)/2 if $b_1$ and $H_a$ match in sign
2. 1 - (reported p)/2 if $b_1$ and $H_a$ don't match in sign
The form of the hypothesis comes from the research question!

Another example

p-value = 0.292 for testing that the slope is 0.

14.6 Checking Conditions for Regression Inference

Checking Conditions with Plots

Conditions are checked using two plots:
Scatterplot of y versus x for the sample
Scatterplot of the residuals versus x for the sample

1. a) The plot of y versus x should show points randomly scattered around an imaginary straight line.
   b) The plot of residuals versus x should show points randomly scattered around a horizontal line at residual 0.
2. Extreme outliers should not be evident in either plot.
3. Neither plot should show increasing or decreasing spread in the points as x increases.

Example 14.2 Height and Weight

Scatterplot: a straight-line model seems reasonable.
Residual plot: a somewhat random-looking blob of points => linear model is OK.
Both plots: no extreme outliers and approximately the same variance across the range of heights.
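The rule given earlier in this section for converting a reported two-sided p-value into a one-sided p-value can be written as a small Python function. This is a sketch under stated assumptions: the function name `one_sided_p` and the labels "greater" and "less" are illustrative, and the positive slope used in the usage lines is hypothetical, since the text reports only the p-value of 0.292 and not the sign of the slope.

```python
def one_sided_p(reported_p, b1, alternative):
    """Convert a reported two-sided p-value for the slope to a one-sided p-value.

    alternative: "greater" for Ha: beta1 > 0, "less" for Ha: beta1 < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Does the sign of the sample slope agree with the direction of Ha?
    signs_match = (b1 > 0 and alternative == "greater") or (
        b1 < 0 and alternative == "less"
    )
    half = reported_p / 2
    return half if signs_match else 1 - half

# Reported p = 0.292; the positive slope value here is hypothetical
p_match = one_sided_p(0.292, b1=1.0, alternative="greater")
p_mismatch = one_sided_p(0.292, b1=1.0, alternative="less")
print(round(p_match, 3), round(p_mismatch, 3))
```

When the sample slope points in the direction opposite to the alternative, the one-sided p-value is always above 0.5, so the data give no support to that alternative.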

When Conditions Are Not Met

Condition 1 not met: use a more complicated model.
Based on this residual plot, a curvilinear model, such as the quadratic model, may be more appropriate.

When Conditions Are Not Met

Condition 2 not met: if there are outlier(s), the correction depends on the reason for the outlier(s).
Outlier is legitimate. Relationship appears to change for body weights over 210 pounds. Could remove the outlier and use the linear regression relationship only for body weights under about 210 pounds.