Bootstrap Methods in Regression


ICPSR Blalock Lectures, 2003
Bootstrap Resampling
Robert Stine
Lecture 3: Bootstrap Methods in Regression

Questions
Have you had a chance to try any of this? Any of the review questions?

Getting class notes from the web
Go to my web page: www-stat.wharton.upenn.edu/~stine/mich
Lecture notes are PDF files (Adobe Acrobat). They are updated daily (usually sometime after class) and will remain on the web for some time.

Software
Script files for R commands. Try the software while you are here.

Yet more Summer Program t-shirts

Overview
Calibration
Powerful idea of using the bootstrap to check itself.
Resampling a correlation
Correlation requires special methods: its sampling distribution depends on the unknown population correlation. The bootstrap does as well as the special methods.
Simple regression
- Model and assumptions
- Leverage, influence, diagnostics
- Animated simple regression
- Smoothing
Resampling in regression
Two methods of resampling:
- Residual resampling (fixed X)
- Observation resampling (random X)
Picking a method of resampling

Inference for a Correlation
Classic bootstrap illustration: Efron's law school data, the LSAT and GPA values for 15 law schools.
[Figure: scatterplot of the law school data, LSAT on the horizontal axis]
How do we make an inference for the correlation?
- What is the confidence interval?
- What is the population, anyhow?
The sample correlation is r = 0.776.
A new type of complexity: the SE of the average does not depend on μ, but the SE of the sample correlation depends on ρ.

Classical Inference for the Correlation
Fisher's z transform
The sample correlation is not normally distributed, but Fisher's z transform gives a statistic that is close to normal:
$z = f(r) = \tfrac{1}{2}\log\frac{1 + r}{1 - r}$
This statistic is roughly normal with mean $f(\rho)$ and $SD = 1/\sqrt{n - 3}$.
Example with the law school data
Fisher's z transformation gives the 90% confidence interval [0.507, 0.907] = [.776 - .269, .776 + .131].
Fisher's interval is not of the usual form [estimate ± 2 SE of estimate] but is instead very asymmetric. Why should the interval be asymmetric?
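A minimal sketch of Fisher's interval using only the Python standard library. The function name `fisher_z_interval` is hypothetical; the inputs are the slide's r = 0.776 and n = 15, and the 90% critical value comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def fisher_z_interval(r, n, level=0.90):
    """Confidence interval for a population correlation via Fisher's z transform."""
    z = math.atanh(r)                              # f(r) = 0.5 * log((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)                    # approximate SD of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645 for a 90% interval
    # Symmetric on the z scale, then mapped back to the correlation scale with tanh
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lo, hi = fisher_z_interval(0.776, 15)
print(f"[{lo:.3f}, {hi:.3f}]")   # prints [0.508, 0.907]
```

The lower endpoint differs from the slide's 0.507 only in the third decimal, presumably from rounding. Because tanh is nonlinear, an interval that is symmetric on the z scale becomes asymmetric around r, which is one answer to the slide's closing question.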

Bootstrapping the Correlation
How to resample?
Keep the data paired: resample observations. (What happens if you do not keep the pairing?)
Same basic resampling iteration
- Collect B bootstrap replications.
- Repeatedly calculate the correlation for a large number of bootstrap samples.
Raw calculations, one last time.
Explore the choice of the number of bootstrap replications
Procedure
- Start with 50.
- Add further bootstrap replications.
- Compare the results as they accumulate.
Observe that
- the SE settles down quickly, but
- the lower limit of the CI is not stable until we have a large number of replications.
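The resampling loop can be sketched in plain Python as follows. The helpers `pearson_r` and `bootstrap_correlations` are hypothetical names, and the 15 data pairs are simulated with a positive association; they are not the law school data. The loop prints the SE and a rough lower 5% point as replications accumulate, and the `keep_pairs=False` option answers the slide's question about breaking the pairing.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlations(x, y, B, rng, keep_pairs=True):
    """Return B bootstrap correlations.

    keep_pairs=True resamples (x, y) pairs; keep_pairs=False resamples
    x and y separately, which breaks the pairing.
    """
    n = len(x)
    reps = []
    while len(reps) < B:
        ix = [rng.randrange(n) for _ in range(n)]
        iy = ix if keep_pairs else [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in ix], [y[i] for i in iy]
        if len(set(xs)) > 1 and len(set(ys)) > 1:   # skip zero-variance resamples
            reps.append(pearson_r(xs, ys))
    return reps


rng = random.Random(2003)
# Hypothetical data: 15 positively associated pairs (NOT the law school data)
x = [rng.gauss(600, 40) for _ in range(15)]
y = [3.0 + 0.004 * (a - 600) + rng.gauss(0, 0.12) for a in x]
print("r =", round(pearson_r(x, y), 3))

reps = bootstrap_correlations(x, y, 3000, rng)
for B in (50, 200, 1000, 3000):           # replications accumulate
    first_B = sorted(reps[:B])
    lower = first_B[int(0.05 * B)]         # rough 5th percentile
    print(f"B={B:5d}  SE*={statistics.stdev(first_B):.3f}  lower={lower:.3f}")

unpaired = bootstrap_correlations(x, y, 3000, rng, keep_pairs=False)
print("mean r* with pairs broken:", round(statistics.fmean(unpaired), 3))
```

Drawing one set of indices for both coordinates is what keeps each x with its own y; drawing them separately makes X and Y independent by construction, so those correlations center near zero.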

Correlation Results
Plot the bootstrap distribution
[Figure: bootstrap distribution of the correlation, horizontal axis from 0 to 1]
The bootstrap distribution is skewed:
- clearly not normal
- has a hard upper limit at 1
- so it is foolish to use an interval like r ± 2 SE(r)
Note: Fisher's transformation accommodates this special kind of asymmetry; the range of Fisher's z transform is not bounded.
Comparison of intervals (with 3000 replications)
90% bootstrap interval: [0.520, 0.943] = [.776 - .256, .776 + .167]
Fisher's interval: [0.507, 0.907] = [.776 - .269, .776 + .131]
Both are skewed and lie within the [-1, 1] limits. The bootstrap works without knowing Fisher's special transformation or assuming normality.
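A sketch of the percentile interval, assuming the 90% bootstrap interval above is the percentile type (the 5th and 95th percentiles of the replicates). The function `percentile_interval_90` is a hypothetical name and the data are simulated, so the printed numbers will not match the slide; the point is the comparison between the percentile interval and the naive r ± 2 SE interval.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def percentile_interval_90(reps):
    """90% percentile interval: the 5th and 95th percentiles of the replicates."""
    cuts = statistics.quantiles(reps, n=20)   # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


rng = random.Random(27)
# Hypothetical, strongly correlated data (n = 15)
x = [rng.gauss(0, 1) for _ in range(15)]
y = [a + rng.gauss(0, 0.6) for a in x]
r_hat = pearson_r(x, y)

reps = []
while len(reps) < 3000:
    idx = [rng.randrange(15) for _ in range(15)]   # resample pairs
    xs, ys = [x[i] for i in idx], [y[i] for i in idx]
    if len(set(xs)) > 1 and len(set(ys)) > 1:
        reps.append(pearson_r(xs, ys))

lo, hi = percentile_interval_90(reps)
se = statistics.stdev(reps)
print(f"r = {r_hat:.3f}")
print(f"percentile: [{lo:.3f}, {hi:.3f}]  r - lo = {r_hat - lo:.3f}, hi - r = {hi - r_hat:.3f}")
print(f"r +/- 2 SE: [{r_hat - 2 * se:.3f}, {r_hat + 2 * se:.3f}]")
```

The percentile interval is built from observed replicates, so it can never extend past the hard limit at 1, while r ± 2 SE can.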

Exploring the Bootstrap Distribution
Resampled correlations are not normal.
Kernel density estimates
These alternatives to histograms avoid binning the data, but require you to choose how much to smooth the data. You can explore these options using a slider.
Quantile plots
These show how close the distribution is to normal, focusing on the extremes rather than the center of the data.
[Figure: normal quantile plot of the bootstrap correlations (CORR_B)]
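A normal quantile plot pairs each ordered value with the matching standard normal quantile; normal data fall near a straight line, and bends at the ends flag non-normal tails. The sketch below computes those pairs without plotting. The function name is hypothetical, the plotting positions (i - 0.5)/n are one common convention among several, and the "replicates" are simulated values bounded above by 1.

```python
import random
from statistics import NormalDist


def normal_quantile_pairs(values):
    """(normal quantile, ordered value) pairs for a normal quantile plot.

    Uses plotting positions (i - 0.5) / n, one common convention among several.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), v) for i, v in enumerate(ordered, start=1)]


rng = random.Random(5)
# Hypothetical replicates bounded above by 1 with a long left tail, like resampled correlations
reps = [1 - rng.expovariate(8) for _ in range(500)]
pairs = normal_quantile_pairs(reps)
for q, v in pairs[:3] + pairs[-3:]:     # look at the extremes, where departures show up
    print(f"{q:6.2f}  {v:6.3f}")
```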

Simple Regression Model
Assumptions for the one-predictor model $Y = \beta_0 + \beta_1 X + \epsilon$:
1. Independent observations
2. Equal variance: $E(\epsilon) = 0$, $Var(\epsilon) = \sigma^2$
3. Normally distributed error terms
+ X is fixed OR perfectly measured, and independent of $\epsilon$
Data generating process
The hot dog model
Least squares
Pick the line with the smallest sum of squared vertical deviations (residuals). The least squares estimator (OLS) is "best": what does "best" mean in this context?
Issues
- Linear? Is a line a good summary? We really want ave(Y | X).
- Outliers? What effects can these have?
- Inference for the slope?
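The least squares line and the classical SE of the slope follow from a few sums. This is a minimal sketch; `ols_simple` is a hypothetical helper and the five data points are made up for illustration.

```python
import math
import statistics


def ols_simple(x, y):
    """Least squares fit of y = b0 + b1*x.

    Returns (b0, b1, se_b1, residuals) using the classical formulas.
    """
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx                      # minimizes the sum of squared vertical deviations
    b0 = my - b1 * mx                   # fitted line passes through (x-bar, y-bar)
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)   # estimate of sigma^2 (two fitted parameters)
    se_b1 = math.sqrt(s2 / sxx)
    return b0, b1, se_b1, resid


# Small made-up example
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, se_b1, resid = ols_simple(x, y)
print(round(b0, 3), round(b1, 3), round(se_b1, 3))   # prints 2.2 0.6 0.283
```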

Diagnostics for Simple Regression
Examples
Typical analysis: law school data
- Small sample size (n = 15)
- Let X = LSAT predict Y = GPA
Unusual analysis: voting in Florida
- Moderate sample size (n = 67)
- Large outlier
Exploring a scatterplot
Animated sensitivity: add the OLS line to a scatterplot, then change the mouse mode so that you can interactively drag points and watch the line shift.
Leave-one-out diagnostics
Fit the regression using the regression command to learn more about this important collection of regression diagnostics:
- Leverage (potential effect)
- Influence (change in the fit if the observation is removed)
- Standardized residuals
Linked diagnostic plots.
Reference: Fox, Regression Diagnostics (Sage green monograph series).
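For a single predictor, the leave-one-out diagnostics have closed forms. The sketch below computes leverage, internally studentized residuals, and Cook's distance (used here as the measure of influence, a standard choice but not one named on the slide). The function name and the five data points are hypothetical.

```python
import math
import statistics


def simple_regression_diagnostics(x, y):
    """Leverage h_i, internally studentized residuals r_i, and Cook's distance D_i
    for the least squares fit of y = b0 + b1*x."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    lev = [1 / n + (a - mx) ** 2 / sxx for a in x]        # potential effect: distance from x-bar
    stud = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, lev)]
    p = 2                                                  # parameters: intercept and slope
    cook = [r * r / p * h / (1 - h) for r, h in zip(stud, lev)]   # change in fit if removed
    return lev, stud, cook


# Made-up data with one point far out in X
x = [1, 2, 3, 4, 10]
y = [1.1, 1.9, 3.2, 3.9, 20.0]
lev, stud, cook = simple_regression_diagnostics(x, y)
for xi, h, r, d in zip(x, lev, stud, cook):
    print(f"x={xi:3d}  leverage={h:.3f}  stud. resid={r:6.2f}  Cook's D={d:.3f}")
```

The leverages always sum to 2 here (the number of fitted parameters), which is a quick check on the arithmetic.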

Smoothing
Further example of smoothing: skeletal age as a measure of physical maturity.
[Figure: scatterplot against AGE with a loess smooth]
Is the bend real? This plot shows a loess smooth of the data.
Diagnostic procedure
A smooth curve based on local robust averaging should track the fitted model. Use smoothing to detect curvature in the residuals.
Bootstrapping a smoother
Resample observations and visually inspect the fitted curves.
Want to know more? Take a modern regression course.
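A simplified sketch of bootstrapping a smoother. `local_linear_smooth` is a loess-style local linear fit with tricube weights but without loess's robustness iterations, so it is a stand-in for, not a copy of, the loess smooth on the slide. Both function names are hypothetical and the data are simulated with a bend; they are not the skeletal-age data. The spread of the resampled curves at each grid point gives a rough sense of whether the bend is real.

```python
import random


def local_linear_smooth(x, y, x0, span=0.5):
    """Loess-style local linear fit at x0 with tricube weights.

    Simplified: no robustness iterations, unlike full loess.
    """
    n = len(x)
    k = min(n, max(3, int(round(span * n))))            # neighbours in the window
    # Window half-width, slightly enlarged so the k-th neighbour keeps a small weight
    h = max(sorted(abs(a - x0) for a in x)[k - 1], 1e-12) * 1.001
    w = [(1 - (abs(a - x0) / h) ** 3) ** 3 if abs(a - x0) < h else 0.0 for a in x]
    sw = sum(w)
    mx = sum(wi * a for wi, a in zip(w, x)) / sw
    my = sum(wi * b for wi, b in zip(w, y)) / sw
    sxx = sum(wi * (a - mx) ** 2 for wi, a in zip(w, x))
    sxy = sum(wi * (a - mx) * (b - my) for wi, a, b in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def bootstrap_smooths(x, y, grid, B, rng, span=0.5):
    """Resample observations B times; return one smoothed curve (over grid) per resample."""
    n = len(x)
    curves = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        curves.append([local_linear_smooth(xs, ys, g, span) for g in grid])
    return curves


rng = random.Random(11)
# Hypothetical data with a bend (not the skeletal-age data)
x = [rng.uniform(10, 16) for _ in range(80)]
y = [0.5 * (a - 13) ** 2 + rng.gauss(0, 0.8) for a in x]
grid = [10 + 0.5 * j for j in range(13)]               # 10, 10.5, ..., 16
fit = [local_linear_smooth(x, y, g) for g in grid]
curves = bootstrap_smooths(x, y, grid, 200, rng)
for j, g in enumerate(grid):
    at_g = sorted(c[j] for c in curves)
    print(f"x={g:4.1f}  smooth={fit[j]:6.2f}  bootstrap 5-95%: [{at_g[10]:6.2f}, {at_g[189]:6.2f}]")
```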

Resampling in Regression
Two approaches
Generalize the approaches to the two-sample test. A two-sample test is a simple regression with a categorical (dummy variable) predictor.
Random X (observation resampling)
Resample observations as in the correlation example, or as in one approach to the t-test.
Fixed X (experimental, residual resampling)
Resample residuals as follows:
- Fit a model and compute the residuals.
- Generate BS data by Y* = (Fit) + (BS sample of OLS residuals).
Comparison

Resample                      Observations   Residuals
Model-dependent               No             Yes
Fixed design X                No             Yes
Maintains (X,Y) association   Yes            No

Differences are most apparent when something is peculiar about the regression model or data, e.g., a severe outlier.
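Both schemes can be sketched for the slope of a simple regression in a few lines of Python. `fit_line`, `observation_bootstrap`, and `residual_bootstrap` are hypothetical helper names, and the data are simulated (error spread that grows with x, plus one large positive outlier); they are not the Florida counts.

```python
import random
import statistics


def fit_line(x, y):
    """OLS intercept and slope."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1


def observation_bootstrap(x, y, B, rng):
    """Random-X resampling: draw (x, y) pairs with replacement and refit."""
    n, slopes = len(x), []
    while len(slopes) < B:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:          # a slope needs at least two distinct x values
            continue
        slopes.append(fit_line(xs, [y[i] for i in idx])[1])
    return slopes


def residual_bootstrap(x, y, B, rng):
    """Fixed-X resampling: Y* = fitted value + resampled OLS residual, same x every time."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    slopes = []
    for _ in range(B):
        ystar = [f + rng.choice(resid) for f in fitted]
        slopes.append(fit_line(x, ystar)[1])
    return slopes


rng = random.Random(7)
# Hypothetical data: error spread grows with x, plus one large positive outlier
x = [float(i) for i in range(1, 31)]
y = [2.0 + 0.5 * a + rng.gauss(0, 0.1 * a) for a in x]
y[20] += 15.0
obs = observation_bootstrap(x, y, 2000, rng)
res = residual_bootstrap(x, y, 2000, rng)
print("SE* (observations):", round(statistics.stdev(obs), 3))
print("SE* (residuals):   ", round(statistics.stdev(res), 3))
```

The residual version reuses the same x values and treats the fitted line as correct; the observation version keeps each residual attached to its own x, so unequal spread and outliers at particular x locations carry through to the bootstrap distribution.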

Observation vs. Residual Resampling
Florida 2000 US presidential election results
The data give, for each county, the number of voters registered with the Reform Party and the number of votes received by Buchanan.
Slope estimate b = 3.7, SE(b) = 0.41 (t ≈ 9)
Palm Beach is not especially leveraged, but it is influential.

Observation resampling
Sample counties as observations.
[Figure: bootstrap replicates of the slope (COEF-REG_REFORM_B)]
The replicates are reminiscent of collinearity: the slope and intercept are negatively correlated in a regression when X̄ > 0.
SE* = 1.15, much larger than OLS claims.
Residual resampling
Sample the residuals of the fitted model.
SE* = 0.37, about the same as OLS claims.
Why the different SE estimates?
Random SE > fixed SE. Is X fixed or is X not fixed? Fixed X usually gives a smaller estimate: Var(b | X) < Var(b).
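The negative correlation between slope and intercept can be checked against a formula. Conditional on X, least squares gives Corr(b0, b1) = -x̄ / sqrt(mean(x²)), which is negative whenever x̄ > 0. The sketch below uses residual (fixed-X) resampling rather than the observation resampling on the slide, because the formula is exactly the conditional one; the data are simulated and the helper names are hypothetical.

```python
import math
import random
import statistics


def fit_line(x, y):
    """OLS intercept and slope."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1


def corr(u, v):
    """Sample correlation of two sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


rng = random.Random(3)
# Hypothetical data with every x > 0, so x-bar > 0
x = [float(i) for i in range(1, 31)]
y = [5.0 + 0.8 * a + rng.gauss(0, 2) for a in x]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * a for a in x]
resid = [b - f for b, f in zip(y, fitted)]
pairs = [fit_line(x, [f + rng.choice(resid) for f in fitted]) for _ in range(2000)]
boot_corr = corr([p[0] for p in pairs], [p[1] for p in pairs])

# Conditional on X: Corr(b0, b1) = -x_bar / sqrt(mean(x^2))
theory = -statistics.fmean(x) / math.sqrt(statistics.fmean([a * a for a in x]))
print(f"bootstrap corr(b0*, b1*) = {boot_corr:.3f}, formula = {theory:.3f}")
```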

Resampling with Influential Values
Comparison of resampling methods
Observation resampling keeps the Palm Beach residual at a leveraged location, leading to a bimodal distribution.
[Figure: density of COEF-REG_REFORM_B under observation resampling]
Residual resampling smears the Palm Beach residual around, giving a normal BS distribution.
[Figure: density of COEF-REG_REFORM_B under residual resampling]
The two methods give extremely different impressions of the accuracy of the fitted model. Which is right?

Which Method is Right?
Observation resampling
+ Does not assume so much of the fitted model (example with unequal variance; example with nonlinearity).
± Estimates the unconditional variation of the slope rather than the conditional variation.
± Does not always agree with the classical SE.
- Not appropriate in ANOVA designs or with patterned X's such as time trends (at least not without special care!).
- Slower to compute (less important now).
What would happen for another sample? Would Palm Beach again be an outlier? Would it again have a positive residual? It seems that we might expect Palm Beach to be an outlier, and the direction of the residual also seems plausible.

Asymptotics (i.e., really big samples)
Asymptotic results describe what happens as the sample size gets larger and larger.
As the sample size grows (with other conditions), random-X and fixed-X resampling become similar, assuming the model is correctly specified.
Relation to classical results
For residual resampling, the bootstrap SE* ≈ the usual OLS formula as the number of BS replications B → ∞.
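The limit can be made exact for the slope. Under residual resampling, b1* - b1 = Σ(x_i - x̄)e*_i / Sxx, and the resampled residuals have variance RSS/n, so as B → ∞ the bootstrap SE is sqrt(RSS/n / Sxx). The OLS formula divides by n - 2 instead, so the two differ by the factor sqrt((n - 2)/n), which goes to 1 as n grows; one common adjustment is to inflate the residuals by sqrt(n/(n - 2)) before resampling. The sketch checks this on simulated data; the helper name is hypothetical.

```python
import math
import random
import statistics


def slope_and_residuals(x, y):
    """OLS intercept, slope, Sxx, and residuals."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    return b0, b1, sxx, [b - (b0 + b1 * a) for a, b in zip(x, y)]


rng = random.Random(16)
x = [rng.uniform(0, 10) for _ in range(20)]          # hypothetical design, n = 20
y = [1.0 + 2.0 * a + rng.gauss(0, 1.5) for a in x]
n = len(x)
b0, b1, sxx, resid = slope_and_residuals(x, y)
rss = sum(e * e for e in resid)

se_ols = math.sqrt(rss / (n - 2) / sxx)   # classical formula
se_ideal = math.sqrt(rss / n / sxx)       # residual-bootstrap SE as B -> infinity

fitted = [b0 + b1 * a for a in x]
slopes = [slope_and_residuals(x, [f + rng.choice(resid) for f in fitted])[1] for _ in range(5000)]
se_mc = statistics.stdev(slopes)
print(f"OLS {se_ols:.4f}  ideal bootstrap {se_ideal:.4f}  B=5000 bootstrap {se_mc:.4f}")
```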

Robust Regression
Automatically adjusts for outliers.
Comparison to OLS

OLS fit
Variable      Estimate   Std Err   t-ratio   p-value
Constant        1.5325     46.61     0.033      0.97
REG_REFORM      3.6867      0.41     9.019      0.00
R squared 0.56, sigma hat 301.9

Robust fit (Huber, c = 1.345)
Variable      Estimate   Std Err   t-ratio   p-value
Constant         45.52      34.9     1.302      0.20
REG_REFORM        2.44       0.3     7.948      0.00
R squared 0.86, sigma hat 82.53
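One common way to compute a Huber fit is iteratively reweighted least squares (IRLS). The sketch below uses c = 1.345, as on the slide, and assumes a scale estimate of median(|residual|)/0.6745 recomputed at each step; the slide does not say which scale estimate or algorithm its software used, so numbers from this sketch need not match the table. It computes coefficients only, not standard errors. Function names and data are hypothetical.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least squares intercept and slope."""
    sw = sum(w)
    mx = sum(wi * a for wi, a in zip(w, x)) / sw
    my = sum(wi * b for wi, b in zip(w, y)) / sw
    sxx = sum(wi * (a - mx) ** 2 for wi, a in zip(w, x))
    b1 = sum(wi * (a - mx) * (b - my) for wi, a, b in zip(w, x, y)) / sxx
    return my - b1 * mx, b1


def huber_line(x, y, c=1.345, iters=100, tol=1e-10):
    """Huber M-estimate of y = b0 + b1*x by iteratively reweighted least squares.

    Scale is re-estimated each step as median(|residual|) / 0.6745 (an assumption).
    """
    b0, b1 = weighted_line(x, y, [1.0] * len(x))        # start from OLS
    for _ in range(iters):
        resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
        scale = statistics.median([abs(e) for e in resid]) / 0.6745
        if scale == 0:
            break
        # Full weight inside c * scale, downweight beyond it
        w = [1.0 if abs(e) <= c * scale else c * scale / abs(e) for e in resid]
        new_b0, new_b1 = weighted_line(x, y, w)
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Made-up data: an exact line plus small alternating noise and one gross outlier
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * a + 0.1 * (-1) ** i for i, a in enumerate(x)]
y[10] += 100.0
ols = weighted_line(x, y, [1.0] * len(x))
hub = huber_line(x, y)
print(f"OLS slope {ols[1]:.3f}, Huber slope {hub[1]:.3f}")
```

The outlier's residual is capped at c times the scale, so it can pull the robust line only a little, which mirrors how the robust fit above lands near the OLS fit without Palm Beach.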

Size of the Outlier
[Figure: OLS fit and robust fit, each plotted against REG_REFORM]
The outlier is larger, and more apparent, relative to the scale of the robust fit:
- OLS residual SD ≈ 300
- Robust residual SD ≈ 80

OLS without Palm Beach
Very similar to the robust regression. The plot looks very different with Palm Beach removed from the data set.
[Figure: OLS regression (n = 66) against reg_reform]
Least squares estimates for BUCHANAN (n = 66)
Variable      Estimate   Std Err   t-ratio   p-value
Constant         50.28     12.98     3.873      0.00
REG_REFORM        2.44      0.12    20.180      0.00
R squared 0.86, sigma hat 83.3

Reasons to Bootstrap in Regression
Confidence intervals and SEs
Unless you are doing something special (or the data are unusual), the bootstrap typically gives SEs and confidence intervals very similar to the classical ones. So why bootstrap?
- You learn more about regression. Looking at the BS distributions helps you understand what's going on in the regression.
- You can use methods other than least squares, methods that are less affected by outliers (for example, inference for a robust regression).
- You can ask more interesting questions. The SE is seldom the only thing of interest, and simple questions can be hard to answer: Which X's belong in the equation? Where is the maximum of this fitted curve?
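As an illustration of the last question, the sketch bootstraps the location of the maximum of a fitted quadratic, -b/(2c) for y = a + b·x + c·x², using observation resampling and a 90% percentile interval. The quadratic model, the helper names, and the simulated data are assumptions for illustration; the normal equations are solved with Cramer's rule to stay within the standard library.

```python
import random
import statistics


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(x, y):
    """Least squares fit of y = a + b*x + c*x^2 via the normal equations (Cramer's rule)."""
    s = [sum(v ** k for v in x) for k in range(5)]              # power sums of x
    t = [sum((v ** k) * w for v, w in zip(x, y)) for k in range(3)]
    A = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(A)
    coefs = []
    for j in range(3):                                          # replace column j by t
        Aj = [[t[i] if col == j else A[i][col] for col in range(3)] for i in range(3)]
        coefs.append(det3(Aj) / d)
    return coefs                                                # [a, b, c]


def bootstrap_peak(x, y, B, rng):
    """Observation resampling of the peak location -b / (2c); may return fewer than B values."""
    n, peaks = len(x), []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 3:          # need three distinct x values to fit a quadratic
            continue
        a, b, c = fit_quadratic(xs, [y[i] for i in idx])
        if c < 0:                     # only a downward-opening parabola has a maximum
            peaks.append(-b / (2 * c))
    return peaks


rng = random.Random(8)
# Hypothetical noisy data whose true maximum is at x = 3
x = [rng.uniform(0, 6) for _ in range(40)]
y = [5 - (v - 3) ** 2 + rng.gauss(0, 1) for v in x]
a, b, c = fit_quadratic(x, y)
peaks = bootstrap_peak(x, y, 1000, rng)
cuts = statistics.quantiles(peaks, n=20)
print(f"fitted peak {-b / (2 * c):.2f}, bootstrap 90% interval [{cuts[0]:.2f}, {cuts[-1]:.2f}]")
```

No simple classical SE formula exists for this ratio of coefficients, which is exactly the kind of question where resampling helps.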

Things to Take Away
Bootstrap resampling in regression can be done in two ways, depending on the problem at hand:
- residual resampling (fixed X)
- observation resampling (random X)
Properties of the bootstrap are related to leave-one-out diagnostics (leverage, influence).
NEXT TIME...
Special applications in regression
- Resampling in multiple regression
- Other issues in multiple regression: missing data (just a little to say) and measurement error (a little more)

Review Questions
What assumption is hardest to check, yet perhaps most important in regression?
The assumption that the observations are independent of one another. Unless you have time series data, there are few graphical ways to spot the problem; you've got to know from the substance of the problem.
Do leverage and influence mean the same thing?
No, but they are related. An observation that is unusual in X space is leveraged; in simple regression, leveraged observations lie at the extreme left and right edges of the plot. In contrast, influence refers to how the regression fit changes when an observation is removed from the fit. Heuristically,
Influence ≈ Leverage × (Studentized residual)
That is, to be influential requires both leverage and a substantial residual.
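One standard way to make this heuristic precise is Cook's distance, shown here for simple regression as a supplement (the slides do not name it). Here h_i is the leverage, r_i the internally studentized residual, and p = 2 the number of fitted coefficients; D_i grows with leverage through h_i/(1 - h_i) and with the residual through r_i².

```latex
h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2},
\qquad
D_i = \frac{r_i^2}{p}\,\frac{h_i}{1 - h_i}
```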

How should you use the various regression diagnostic plots?
- Residuals on fitted values: lack of constant variance
- Studentized residuals on leverage: source of influence
- Residual density: normality
What would happen if we sampled X and Y separately when bootstrapping the correlation?
The population correlation underlying the BS samples would be zero. Since we would be independently pairing values of X with values of Y, X and Y would by construction be independent, and the BS correlations would scatter around zero.
How does residual resampling (fixed X) differ from observation resampling (random X)?
Residual resampling requires a model, assumed to be correct, in order to obtain the residuals that are resampled; observation (random) resampling does not. Residual resampling keeps the same X's in every bootstrap sample.
Which is larger?
Random resampling usually leads to a larger estimate of the standard error (with enough bootstrap replications), since it allows for more sources of variation (from randomness in the X's).
How does the bootstrap indicate bias?
The average of the BS replicates will differ from the observed value in the sample. For example, suppose the average of the bootstrap replicates is less than the original statistic. Since the original statistic plays the role of the population value in the bootstrap world, this suggests that the statistic tends to fall below the real population value, that is, it is biased downward.
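The bias estimate can be sketched directly: bias* = mean of the replicates minus the observed statistic, and a bias-corrected estimate is the observed statistic minus bias*. The example uses the plug-in variance (divisor n), which is known to be biased downward by the factor (n - 1)/n, so bias* should come out near -θ̂/n. Function names and the simulated sample are hypothetical.

```python
import random
import statistics


def plugin_variance(v):
    """Variance with divisor n (biased downward by the factor (n - 1) / n)."""
    m = statistics.fmean(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def bootstrap_bias(data, stat, B, rng):
    """Bootstrap bias estimate: mean of the replicates minus the observed statistic."""
    n = len(data)
    theta_hat = stat(data)
    reps = [stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(B)]
    return statistics.fmean(reps) - theta_hat, theta_hat


rng = random.Random(24)
data = [rng.gauss(10, 2) for _ in range(20)]          # hypothetical sample, n = 20
bias, theta_hat = bootstrap_bias(data, plugin_variance, 4000, rng)
print(f"theta_hat = {theta_hat:.3f}, bias* = {bias:.3f}, corrected = {theta_hat - bias:.3f}")
```

A negative bias* here matches the direction described above: the replicates average below the observed statistic, signalling that the statistic itself tends to fall below its target. Bias correction can add variance, so it is worth applying only when the estimated bias is large relative to the SE.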