Sociology 7704: Regression Models for Categorical Data Instructor: Natasha Sarkisian


OLS Regression Assumptions

A1. All independent variables are quantitative or dichotomous, and the dependent variable is quantitative, continuous, and unbounded. All variables are measured without error.
A2. All independent variables have some variation in value (non-zero variance).
A3. There is no exact linear relationship between two or more independent variables (no perfect multicollinearity).
A4. At each set of values of the independent variables, the mean of the error term is zero.
A5. Each independent variable is uncorrelated with the error term.
A6. At each set of values of the independent variables, the variance of the error term is the same (homoscedasticity).
A7. For any two observations, their error terms are not correlated (lack of autocorrelation).
A8. At each set of values of the independent variables, the error term is normally distributed.
A9. The change in the expected value of the dependent variable associated with a unit increase in an independent variable is the same regardless of the specific values of other independent variables (additivity assumption).
A10. The change in the expected value of the dependent variable associated with a unit increase in an independent variable is the same regardless of the specific values of this independent variable (linearity assumption).

A1-A7 are the Gauss-Markov assumptions: if they hold, the resulting regression estimates are BLUE (Best Linear Unbiased Estimators).
Unbiased: if we were to calculate the estimate over many samples, the mean of these estimates would equal the true population value (i.e., on average we are on target).
Best (also known as efficient): among all linear unbiased estimators, this one has the smallest possible standard deviation (i.e., not only are we on target on average, but we also don't deviate too far from it).
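The "unbiased" part of BLUE is a statement about repeated sampling, so a small simulation can make it concrete. The sketch below is in Python (standard library only) rather than Stata, and every number in it (true intercept 2, true slope 0.5, error SD 3, 50 observations, 2,000 samples) is a hypothetical choice, not something taken from the course data. It draws repeated samples from a known linear model whose errors have mean zero, constant variance, and no correlation, then averages the OLS slope estimates.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Closed-form OLS slope for a single predictor: Sxy / Sxx."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(true_b0=2.0, true_b1=0.5, n=50, reps=2000, seed=1):
    """Draw many samples from y = b0 + b1*x + e and return the slope estimates."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 20) for _ in range(n)]  # fixed design across samples
    slopes = []
    for _ in range(reps):
        # errors: mean zero, constant variance, independent (in the spirit of A4, A6, A7)
        ys = [true_b0 + true_b1 * x + rng.gauss(0, 3) for x in xs]
        slopes.append(ols_slope(xs, ys))
    return slopes


slopes = simulate_slopes()
print(f"mean of estimates: {statistics.fmean(slopes):.3f} (true slope 0.5)")
print(f"SD of estimates:   {statistics.stdev(slopes):.3f}")
```

The mean of the estimates should sit very close to 0.5, which is what unbiasedness promises; the standard deviation of the estimates is the sampling variability that "best" says is as small as possible among linear unbiased estimators.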
If A8-A10 also hold, the results can be used appropriately for statistical inference (i.e., significance tests, confidence intervals).

OLS Regression diagnostics and remedies

1. Multivariate Normality

OLS is not very sensitive to non-normally distributed errors, but the efficiency of the estimators decreases as the distribution departs substantially from normal (especially if it has heavy tails). Further, heavily skewed distributions are problematic because they call into question the validity of the mean as a measure of central tendency, and OLS relies on means. Therefore, we usually test the residuals for non-normality and, if it is found, attempt to use transformations to remedy the problem. To test the normality of the error term distribution, we first generate a variable containing the residuals:

    . reg agekdbrn educ born sex mapres80 age

          Source |       SS       df       MS              Number of obs =    1089
    -------------+------------------------------           F(  5,  1083) =   49.10
           Model |  5760.17098     5  1152.0342            Prob > F      =  0.0000
        Residual |   25412.492  1083  23.4649049           R-squared     =  0.1848
    -------------+------------------------------           Adj R-squared =  0.1810
           Total |   31172.663  1088  28.6513447           Root MSE      =  4.8441

    ------------------------------------------------------------------------------
        agekdbrn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            educ |   .6158833   .0561099    10.98   0.000     .5057869    .7259797
            born |   1.679078   .5757599     2.92   0.004     .5493468    2.808809
             sex |  -2.217823   .3043625    -7.29   0.000     -2.81503   -1.620616
        mapres80 |   .0331945   .0118728     2.80   0.005     .0098982    .0564909
             age |   .0582643   .0099202     5.87   0.000     .0387993    .0777293
           _cons |   13.27142   1.252294    10.60   0.000     10.81422    15.72861
    ------------------------------------------------------------------------------

    . predict resid1, resid
    (1676 missing values generated)

Next, we can use any of the tools we used earlier to evaluate the normality of this variable's distribution. For example, we can construct a qnorm plot:

    . qnorm resid1

[Figure: qnorm plot of resid1, Residuals against Inverse Normal]

In this case, the residuals deviate from normality quite substantially. We could check whether transforming the dependent variable using the transformation we identified earlier would help:

    . quietly reg agekdbrnrr educ born sex mapres80 age
    . predict resid2, resid
    (1676 missing values generated)
    . qnorm resid2

[Figure: qnorm plot of resid2, Residuals against Inverse Normal]
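A qnorm plot pairs each sorted residual with the matching quantile of a normal distribution; points that bend away from the straight line signal non-normality. The Python sketch below (standard library; simulated, hypothetical residuals rather than resid1 or resid2) computes those pairs. It assumes plotting positions i/(n+1) and a reference normal with the sample mean and SD, so treat it as an illustration of the idea rather than a replica of Stata's qnorm.

```python
import random
from statistics import NormalDist


def qnorm_points(values):
    """Pair each sorted value with a normal quantile, as in a normal Q-Q plot.

    Assumption: plotting positions i/(n+1); the reference normal uses the
    sample mean and standard deviation of `values`.
    """
    data = sorted(values)
    n = len(data)
    ref = NormalDist.from_samples(data)  # mean and SD estimated from the data
    return [(ref.inv_cdf(i / (n + 1)), v) for i, v in enumerate(data, start=1)]


def max_qq_gap(values):
    """Largest absolute distance between a sorted value and its normal quantile."""
    return max(abs(q - v) for q, v in qnorm_points(values))


rng = random.Random(7)
normal_resid = [rng.gauss(0, 1) for _ in range(500)]
skewed_resid = [rng.expovariate(1.0) - 1.0 for _ in range(500)]  # right-skewed, mean 0
print(round(max_qq_gap(normal_resid), 2), round(max_qq_gap(skewed_resid), 2))
```

Plotting the pairs from `qnorm_points` would give the familiar picture; the single summary number is only a crude way to show that the skewed residuals stray much further from the normal reference line, mostly in the upper tail.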

With the transformed dependent variable, the qnorm plot looks much better: the residuals are essentially normally distributed, although there appear to be a few outliers in the tails. We could examine the outliers and influential observations further; we'll discuss that later.

2. Linearity

We looked at bivariate plots to assess linearity during the screening phase, but bivariate plots do not tell the whole story: we are interested in partial relationships, controlling for all other regressors. We can plot such relationships using the mrunning command. Let's download it first:

    . search mrunning

    Keyword search
        Keywords:  mrunning
        Search:    (1) Official help files, FAQs, Examples, SJs, and STBs

    Search of official help files, FAQs, Examples, SJs, and STBs

    SJ-5-3  gr0017 . . . . . . . . . . A multivariable scatterplot smoother
            (help mrunning, running if installed) . . . P. Royston and N. J. Cox
            Q3/05   SJ 5(3):405--412
            presents an extension to running for use in a multivariable context

Click on gr0017 to install the program. Now we can use it:

    . mrunning agekdbrn educ born sex mapres80 age
    1089 observations, R-sq = 0.2768

[Figure: mrunning smoothed partial plots of agekdbrn against highest year of school completed (educ), was r born in this country (born), respondent's sex (sex), mother's occupational prestige score (1980) (mapres80), and age of respondent (age)]

We can clearly see substantial nonlinearity for educ and age; mapres80 doesn't look quite linear either.

We can also run our regression model and examine the residuals. One way to do so is to plot the residuals against each continuous independent variable:

    . lowess resid1 age

[Figure: Lowess smoother of Residuals (resid1) against age of respondent, bandwidth = .8]

We can detect some nonlinearity in this graph. A more effective tool for detecting nonlinearity in such a multivariate context is the so-called augmented component-plus-residual plot, usually with a lowess curve:

    . acprplot age, lowess mcolor(yellow)

[Figure: augmented component-plus-residual plot for age of respondent]

In addition to these graphical tools, there are also a few tests we can run. One way to diagnose nonlinearities is the so-called omitted variables test. It searches for a pattern in the residuals that could suggest that a power transformation of one of the variables in the model has been omitted. To find such patterns, it uses either the powers of the fitted values (which means, in essence, powers of the linear combination of all regressors) or the powers of individual regressors in predicting Y. If it finds a significant relationship, this suggests that we probably overlooked some nonlinear relationship.

    . ovtest

    Ramsey RESET test using powers of the fitted values of agekdbrn
           Ho:  model has no omitted variables
                     F(3, 1080) =      2.74
                      Prob > F =      0.0423
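The fitted-value version of the RESET test can be sketched as a comparison of two nested regressions: the original model, and the same model plus the 2nd, 3rd, and 4th powers of its fitted values, with an F test on the added terms. That reading is consistent with the F(3, 1080) above (1089 observations, 6 parameters, 3 added terms), but the Python below (standard library, simulated data, hypothetical function names) is our own sketch, not Stata's implementation of ovtest.

```python
import random
import statistics


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """OLS via the normal equations; `columns` lists the regressors (include a constant).
    Returns (coefficients, fitted values, residual sum of squares)."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][i] for j in range(k)) for i in range(len(y))]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, fitted, rss


def reset_test(columns, y, max_power=4):
    """RESET-style F statistic using powers 2..max_power of the fitted values."""
    n, k = len(y), len(columns)
    _, fitted, rss_r = ols(columns, y)
    # Standardize the fitted values before taking powers: the added columns span
    # the same space (so the F statistic is unchanged) but stay well scaled.
    mu, sd = statistics.fmean(fitted), statistics.pstdev(fitted)
    z = [(f - mu) / sd for f in fitted]
    extra = [[zi ** p for zi in z] for p in range(2, max_power + 1)]
    _, _, rss_u = ols(columns + extra, y)
    q = len(extra)
    df2 = n - k - q
    f_stat = ((rss_r - rss_u) / q) / (rss_u / df2)
    return f_stat, q, df2


rng = random.Random(3)
x = [rng.uniform(1, 10) for _ in range(200)]
ones = [1.0] * len(x)
y_linear = [2 + 0.8 * xi + rng.gauss(0, 1) for xi in x]
y_curved = [2 + 0.3 * xi ** 2 + rng.gauss(0, 1) for xi in x]
print(reset_test([ones, x], y_linear))  # (F, numerator df, denominator df)
print(reset_test([ones, x], y_curved))
```

Fitting a straight line to the curved data leaves a systematic pattern that the powers of the fitted values pick up, so its F statistic is far larger than for the truly linear data.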

    . ovtest, rhs
    (note: born dropped due to collinearity)
    (note: sex dropped due to collinearity)
    (note: born^3 dropped due to collinearity)
    (note: born^4 dropped due to collinearity)
    (note: sex^3 dropped due to collinearity)
    (note: sex^4 dropped due to collinearity)

    Ramsey RESET test using powers of the independent variables
           Ho:  model has no omitted variables
                    F(11, 1074) =     14.84
                      Prob > F =      0.0000

Looks like we might be missing some nonlinear relationships. We will, however, also check linearity explicitly for each independent variable, using the Box-Tidwell test. First, we need to download it:

    . net search boxtid
    (contacting http://www.stata.com)

    3 packages found (Stata Journal and STB listed first)
    -----------------------------------------------------

    sg112_1 from http://www.stata.com/stb/stb50
        STB-50 sg112_1.  Nonlin. reg. models with power or exp. func. of covar.
        / STB insert by / Patrick Royston, Imperial College School of Medicine, UK;
        / Gareth Ambler, Imperial College School of Medicine, UK.
        / Support:  proyston@rpms.ac.uk and gambler@rpms.ac.uk / After installation, see

We select the first one, sg112_1, and install it. Now we can use it:

    . boxtid reg agekdbrn educ born sex mapres80 age
    Iteration 0:  Deviance = 6483.522
    Iteration 1:  Deviance = 6470.107 (change = -13.41466)
    Iteration 2:  Deviance = 6469.55 (change = -.5577601)
    Iteration 3:  Deviance = 6468.783 (change = -.7663782)
    Iteration 4:  Deviance = 6468.6 (change = -.1832873)
    Iteration 5:  Deviance = 6468.496 (change = -.103788)
    Iteration 6:  Deviance = 6468.456 (change = -.0399491)
    Iteration 7:  Deviance = 6468.438 (change = -.0177698)
    Iteration 8:  Deviance = 6468.43 (change = -.0082658)
    Iteration 9:  Deviance = 6468.427 (change = -.0035944)
    Iteration 10: Deviance = 6468.425 (change = -.0018104)
    Iteration 11: Deviance = 6468.424 (change = -.0008303)
    -> gen double Ieduc__1 = X^2.6408-2.579607814 if e(sample)
    -> gen double Ieduc__2 = X^2.6408*ln(X)-.9256893949 if e(sample)
       (where: X = (educ+1)/10)
    -> gen double Imapr__1 = X^0.4799-1.931881531 if e(sample)
    -> gen double Imapr__2 = X^0.4799*ln(X)-2.650956804 if e(sample)
       (where: X = mapres80/10)
    -> gen double Iage__1 = X^-3.2902-.0065387933 if e(sample)
    -> gen double Iage__2 = X^-3.2902*ln(X)-.009996425 if e(sample)
       (where: X = age/10)
    -> gen double Iborn__1 = born-1 if e(sample)
    -> gen double Isex__1 = sex-1 if e(sample)
    [Total iterations: 33]

    Box-Tidwell regression model

          Source |       SS       df       MS              Number of obs =    1089
    -------------+------------------------------           F(  8,  1080) =   38.76
           Model |  6953.00253     8  869.125317           Prob > F      =  0.0000
        Residual |  24219.6605  1080  22.4256115           R-squared     =  0.2230
    -------------+------------------------------           Adj R-squared =  0.2173
           Total |   31172.663  1088  28.6513447           Root MSE      =  4.7356

    ------------------------------------------------------------------------------
        agekdbrn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        Ieduc__1 |   1.215639   .7083273     1.72   0.086     -.174215    2.605492
        Ieduc_p1 |     .00374   .8606987     0.00   0.997    -1.685091    1.692571
        Imapr__1 |   1.153845    9.01628     0.13   0.898    -16.53757    18.84525
        Imapr_p1 |   .0927861   2.600166     0.04   0.972    -5.009163    5.194736
         Iage__1 |  -67.26803   42.28364    -1.59   0.112    -150.2354    15.69937
         Iage_p1 |  -.4932163   53.49507    -0.01   0.993    -105.4593    104.4728
        Iborn__1 |   1.380925   .5659349     2.44   0.015     .2704681    2.491381
         Isex__1 |  -2.017794    .298963    -6.75   0.000    -2.604408    -1.43118
           _cons |   25.14711   .2955639    85.08   0.000     24.56717    25.72706
    ------------------------------------------------------------------------------

            educ |   .5613397     .05549   10.116   Nonlin. dev. 11.972 (P = 0.001)
              p1 |    2.64077   .7027411    3.758
        mapres80 |   .0337813   .0115436    2.926   Nonlin. dev. 0.126  (P = 0.724)
              p1 |   .4798773    1.28955    0.372
             age |   .0534185   .0098828    5.405   Nonlin. dev. 39.646 (P = 0.000)
              p1 |  -3.290191   .8046904   -4.089
    Deviance: 6468.424.

Here, we interpret the last three portions of the output, specifically the P values there. P = 0.001 for educ and P = 0.000 for age suggest that there is some nonlinearity with regard to these two variables. Mapres80 appears to be fine. With regard to remedies, the process is the same as we discussed earlier when talking about bivariate linearity. Once remedies are applied, it is a good idea to retest using these multivariate screening tools.

3. Outliers, Leverage Points, and Influential Observations

A single observation that is substantially different from the other observations can make a large difference in the results of a regression analysis. For this reason, unusual observations (or small groups of unusual observations) should be identified and examined. There are three ways that an observation can be unusual:

Outliers: In a univariate context, people often refer to observations with extreme values (unusually high or low) as outliers.
But in regression models, an outlier is an observation that has an unusual value of the dependent variable given its values of the independent variables; that is, the relationship between the dependent variable and the independent ones is different for an outlier than for the other data points. Graphically, an outlier is far from the pattern defined by the other data points. Typically, in a regression model, an outlier has a large residual.

Leverage points: An observation with an extreme value (either very high or very low) on a single predictor variable or on a combination of predictors is called a point with high leverage. With a single predictor, leverage measures how far a value of the independent variable deviates from the mean of that variable; in the multivariate context, leverage is a measure of each observation's distance from the multidimensional centroid in the space formed by all the predictors. High-leverage points can affect the estimates of the regression coefficients.

Influential observations: A combination of the previous two characteristics produces influential observations. An observation is considered influential if removing it substantially changes the estimates of the coefficients. Observations that have just one of these two characteristics (either an outlier or a high leverage point, but not both) do not tend to be influential.

Thus, we want to identify outliers and leverage points, and especially those observations that are both, to assess and possibly minimize their impact on our regression model. Furthermore, outliers, even when they are not influential in terms of coefficient estimates, can unduly inflate the error variance. Their presence may also signal that our model failed to capture some important factors (i.e., indicate a potential model specification problem).

In the multivariate context, to identify outliers we look for observations with large residuals, and to identify observations with high leverage we can use the so-called hat values, which measure each observation's distance from the multidimensional centroid in the space formed by all the regressors. We can also use various influence statistics that help us identify influential observations by combining information on outlierness and leverage. To obtain these statistics in Stata, we use the predict command.
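With an intercept in the model, each hat value equals 1/n plus the observation's squared Mahalanobis distance from the centroid of the predictors divided by n - 1, which is one way to make "distance from the multidimensional centroid" precise. The Python sketch below (standard library; two hypothetical, simulated predictors; the function name is ours) uses that identity. The planted point lies within about two standard deviations of the mean on each predictor but breaks their strong correlation, so its leverage comes from the combination of predictors rather than from either one alone.

```python
import random


def hat_values_two_predictors(x1, x2):
    """Hat values for y ~ 1 + x1 + x2, computed as
    h_i = 1/n + d_i^2 / (n - 1), where d_i^2 is the squared Mahalanobis
    distance of (x1_i, x2_i) from the centroid (sample covariance, divisor n - 1)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    s11 = sum(a * a for a in d1) / (n - 1)
    s22 = sum(b * b for b in d2) / (n - 1)
    s12 = sum(a * b for a, b in zip(d1, d2)) / (n - 1)
    det = s11 * s22 - s12 * s12  # zero only under perfect collinearity (violates A3)
    i11, i22, i12 = s22 / det, s11 / det, -s12 / det  # inverse covariance matrix
    return [1 / n + (i11 * a * a + 2 * i12 * a * b + i22 * b * b) / (n - 1)
            for a, b in zip(d1, d2)]


rng = random.Random(2)
x1 = [rng.gauss(0, 1) for _ in range(100)]
x2 = [v + rng.gauss(0, 0.3) for v in x1]  # x1 and x2 strongly correlated
x1.append(2.0)    # planted point: moderate on each axis ...
x2.append(-2.0)   # ... but far from the joint pattern
h = hat_values_two_predictors(x1, x2)
n = len(h)
print(f"planted point: {h[-1]:.3f}; average hat value: {sum(h) / n:.3f}")
```

The hat values always sum to the number of parameters (here 3: the constant, x1, and x2), so the average hat value is the number of parameters divided by n, a handy reference point when judging which leverage values are large.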
Here are some values we can obtain using predict, with the rule-of-thumb cutoff values for the statistics used in outlier diagnostics (n = sample size, k = parameters):

    Predict option    Result                                                Cutoff value
    xb                fitted values (linear prediction); the default
    stdp              standard error of the linear prediction
    residuals         residuals
    stdr              standard error of the residual
    rstandard         standardized residuals (residuals divided by their
                      standard error)
    rstudent          studentized (jackknifed) residuals, recommended for   |rstudent| > 2
                      outlier diagnostics (for each observation, the
                      residual is divided by the standard error obtained
                      from a model that includes a dummy variable for
                      that specific observation)
    lev (hat)         hat values, measures of leverage (diagonal            hat > (2k+2)/n
                      elements of the hat matrix)
    *dfits            DFITS, an influence statistic based on studentized    |DFITS| > 2*sqrt(k/n)
                      residuals and hat values
    *welsch           Welsch distance, a variation on DFITS                 |WelschD| > 3*sqrt(k)
    cooksd            Cook's distance, an influence statistic based on      CooksD > 4/n
                      DFITS and indicating the distance between
                      coefficient vectors when the jth observation is
                      omitted
    *covratio         COVRATIO, a measure of the influence of the jth       |COVRATIO - 1| > 3k/n
                      observation on the variance-covariance matrix of
                      the estimates
    *dfbeta(varname)  DFBETA, a measure of the influence of the jth         |DFBETA| > 2/sqrt(n)
                      observation on each coefficient (the difference
                      between the regression coefficient when the jth
                      observation is included and when it is excluded,
                      divided by the estimated standard error of the
                      coefficient)

*Note: Starred statistics are only available for the estimation sample; unstarred statistics are available both in and out of sample; type predict ... if e(sample) ... if you want them only for the estimation sample.

So we could obtain and individually examine various outlier and leverage statistics, e.g.:

    . predict hats, lev
    . predict resid, resid
    . predict rstudent, rstudent

For instance, we can then find the observations with the highest leverage values:

    . sum hats if e(sample), det

                                Leverage
    -------------------------------------------------------------
          Percentiles      Smallest
     1%       .00176       .0015777
     5%      .0021025      .0016196
    10%      .0023401        .00162       Obs                1089
    25%      .0030041      .0016511       Sum of Wgt.        1089

    50%      .0041908                     Mean           .0055096
                            Largest       Std. Dev.       .004043
    75%       .006332      .0236406
    90%       .010143      .0258473       Variance       .0000163
    95%      .0155289      .0302377       Skewness       2.466179
    99%      .0198167       .038942       Kurtosis       11.40481

    . list id hats if hats>.023 & hats~=. & e(sample)

          +-----------------+
          |   id       hats |
          |-----------------|
       3. | 1934   .0302377 |
      10. |  112    .038942 |
      17. | 1230   .0236406 |
    2447. | 1747   .0258473 |
          +-----------------+

But the best way to examine leverage values and residuals graphically at the same time is the leverage-versus-squared-residuals plot (L-R plot); you can replicate it by creating a scatterplot of hat values against normalized squared residuals:

    . lvr2plot, mlabel(id)

[Figure: L-R plot of Leverage against Normalized residual squared, with points labeled by id]

There are many observations with high leverage and residuals; we would be especially concerned about 112, 1934, 2460, 1452, etc.
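Most of the statistics in the predict table can be reproduced from a few closed-form expressions. The Python sketch below (standard library only; simulated data with one planted unusual point; the function names are hypothetical) does this for a one-predictor model, so k = 2 parameters. It assumes the textbook formulas for standardized and studentized residuals, Cook's distance, and DFITS, and it takes lvr2plot's "normalized residual squared" to be each squared residual divided by the residual sum of squares; it is not Stata's code, and Stata's numbers could differ if its definitions differ. The cutoffs follow the table literally, reading k as the number of parameters.

```python
import math
import random


def influence_stats(x, y):
    """Outlier, leverage, and influence statistics for y = b0 + b1*x (k = 2 parameters)."""
    n, k = len(x), 2
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    b0 = yb - b1 * xb
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rss = sum(ei * ei for ei in e)
    s2 = rss / (n - k)  # residual variance
    out = []
    for xi, ei in zip(x, e):
        h = 1 / n + (xi - xb) ** 2 / sxx                   # hat value (leverage)
        r = ei / math.sqrt(s2 * (1 - h))                    # standardized residual
        t = r * math.sqrt((n - k - 1) / (n - k - r * r))    # studentized (jackknifed) residual
        out.append({"hat": h, "rstandard": r, "rstudent": t,
                    "cooksd": r * r * h / (k * (1 - h)),    # Cook's distance
                    "dfits": t * math.sqrt(h / (1 - h)),
                    "nres2": ei * ei / rss})                 # normalized residual squared
    return out


def cutoffs(n, k):
    """Rule-of-thumb cutoffs as listed in the predict table (k = number of parameters)."""
    return {"rstudent": 2.0,                 # applies to |rstudent|
            "hat": (2 * k + 2) / n,
            "dfits": 2 * math.sqrt(k / n),   # applies to |DFITS|
            "welsch": 3 * math.sqrt(k),
            "cooksd": 4 / n,
            "covratio": 3 * k / n,           # applies to |COVRATIO - 1|
            "dfbeta": 2 / math.sqrt(n)}      # applies to |DFBETA|


rng = random.Random(11)
x = [rng.uniform(0, 10) for _ in range(60)]
y = [1 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
x.append(18.0)  # planted observation: extreme x (high leverage) ...
y.append(2.0)   # ... and far below the line (outlier)
diag = influence_stats(x, y)
n = len(x)
cut = cutoffs(n, k=2)
for i, d in enumerate(diag):
    if abs(d["rstudent"]) > cut["rstudent"] and d["cooksd"] > cut["cooksd"]:
        print(i, round(d["hat"], 3), round(d["rstudent"], 2), round(d["cooksd"], 2))
print(cutoffs(1089, 6))  # cutoffs for the 1089-case, 6-parameter model above
```

The planted point combines high leverage with a large residual, which is exactly the combination that makes an observation influential, so it should carry by far the largest Cook's distance.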

Added-variable plots (avplots) are another tool we can use to identify outliers and leverage points; in this case, we can see them in relation to the slopes. Note that you can also obtain these plots one by one using the avplot command, e.g., avplot educ, mlabel(id).

    . avplots, mlabel(id)

[Figure: added-variable plots with points labeled by id; first panel: e( agekdbrn | X ) against e( educ | X ), coef = .6158833, se = .05610987, t = 10.98]
83 936 669 556 281 297 1700 100 2149 2437 1528 2718 2602 462 2462 2159 1451 2067 1527 1056 2441 1382 569 1638 2703 923 2738 724 2227 1875 1832 2572 2599 204 1358 284 1948 607 1789 1130 2613 2164 890 2539 2551 2388 2158 1895 2353 2461 2151 67 1310 2339 2112 777 1824 935 269 2070 575 665 2387 653 929 2484 2194 1798 1110 2737 2545 987 1715 2187 914 1431 789 2265 622 276 248 1218 2386 2610 2650 64 978 683 49 2415 1305 1174 2345 1506 2729 1020 758 6 1142 2512 2262 773 2241 96 2309 154 960 882 2434 500 1596 1132 2451 2190 1191 1806 103 1025 2015 711 1796 1022 2424 1917 2171 1051 1725 1694 1126 1870 1845 938 104 1659 2125 2728 1585 2634 2263 2475 102 2196 43 963 2720 40 1208 1278 1101 1974 1983 1230 1683 511 1846 1696 738 1262 1076 2460 2054 408 729 891 2341 1872 753 1912 522 2147 788 1761 1680 1656 634 2684 1448 2175 1369 2039 2118 973 594 1385 2568 2607 928 2152 1400 638 1565 861 1134 1043 1359 2088 824 249 1478 1100 1128 636 1077 1884 2660 839 2027 2129 228 2276 2734 2052 455 2238 120 1551 274 1161 2119 457 1145 510 1953 117 2427 1050 1877 1955 1488 682 1751 1719 2663 2007 296 1650 915 1673 399 275 113 2508 2224 1356 1847 2008 979 1367 2490 2432 208 491 1535 1096 2435 2213 2497 1710 63 845 307 2337 1997 1146 2032 1630 2013 2250 2702 1961 2351 1469 822 585 1088 591 140 2579 2022 2322 2485 1753 655 1094 2333 1804 279 1209 2051 1930 2752 2009 1086 894 1764 1838 991 1192 1981 2753 1057 2136 1337 2625 495 143 194 1189 2377 659 1366 731 803 836 59 1403 1220 2274 1475 126 2036 231 304 2605 784 1887 2073 1772 280 1019 1037 1497 1049 2533 1457 958 980 1390 2069 567 2481 1626 380 188 2040 1078 2100 2361 2296 1205 770 572 539 2411 51 1954 1284 1299 2404 1720 1336 624 666 2618 369 1169 2316 1546 1539 371 245 270 2426 1200 481 740 2235 785 612 1193 2488 2531 1133 796 1445 786 633 118 2148 660 1909 2003 2357 1482 1165 2410 1120 916 88 2006 755 570 2362 294 1811 1711 1564 1928 2348 794 2673 68 1163 2169 1229 1999 1569 2764 1181 1985 1622 1471 924 751 1159 805 91 1496 1201 797 1102 
1420 229 196 1666 1164 1972 1018 2763 2062 122 1982 383 610 459 1268 460 815 2638 1934 1245 1518 1747 22 411 1616 1549 1059 443 2557 323 2436 2457 1911 571 354 2019 1575 969 844 541 1370 1233 508 576 430 2556 42 1394 506 16 709 2611 503 1534 425 205 1066 2458 527 509 393 1172 112 1530 514 442 1039 1952 1742 99 1614 1906 1186 320 372 1452 1699 2225 538 1188 13 356 352 1194 1618 2604 1175 586 468 543 2336 227 377 447 1923 435-10 0 10 20 e( agekdbrn X ) 0.5 1 e( born X ) coef = 1.6790781, se =.57575994, t = 2.92 710 1117 22 1461 411 619 1577 2750 657 2327 1760 1153 46 2292 2699 1432 2092 2334 2329 2641 2220 830 2498 2338 1911 2335 246 1556 1627 506 524 406 627 131 1353 2350 1512 2034 2124 913 1686 1522 1662 1492 2059 587 810 225 1063 1355 843 559 2096 544 2212 205 1354 2089 807 906 976 244 287 880 2726 714 527 393 1807 2649 2025 2550 1588 116 1426 550 1536 842 1072 1526 1977 1136 2409 2491 835 535 2382 498 909 1521 617 2704 2140 198 512 306 2000 323 2268 2757 1054 1747 1575 749 501 1906 400 1803 1566 889 1791 1494 783 742 84 576 1213 1176 13 888 211 1841 1971 128 1957 1832 1068 827 2703 1667 224 2225 806 436 462 2354 2526 2623 1215 2110 240 2232 1452 2026 1240 1458 2572 998 855 1157 2604 970 1182 1885 930 1389 695 2553 2578 1427 159 685 886 923 55 284 2700 1402 2413 2156 2275 893 1823 532 2131 955 936 1897 1542 2241 2653 1035 2262 1034 51374 2345 2551 2718 1725 2516 366 2194 2440 538 665 262 1892 1511 1824 320 1020 1218 1262 1382 967 987 1870 2387 2264 599 1798 2112 2285 575 1771 265 2263 6 2613 2492 2603 511 2451 929 103 1056 960 1565 876 2125 2512 1506 654 500 978 1656 1789 1694 1917 56 2758 1278 1751 9 2545 1076 377 227 2388 468 510 2276 2336 1872 2765 607 1715 249 112 399 1329 2164 154 2309 891 2424 2265 1974 1022 1191 2152 2663 1838 882 1953 1369 758 1488 729 2490 1585 274 675 208 2171 522 824 2250 1700 556 2118 2147 2634 120 788 279 2227 1875 724 1189 1961 2702 2007 2009 1997 1146 1305 2337 1175 1094 543 2129 2196 784 1110 40 1134 2032 296 2051 2434 1923 1659 
1077 1887 822 2752 2497 1475 1366 2274 1220 1535 2119 958 594 2013 2605 915 1088 304 1630 572 457 280 1037 2460 2040 1337 435 2533 1161 1208 2481 567 1086 1230 1497 1057 2136 371 380 2175 740 2426 1165 2003 1299 1133 2235 1564 1482 980 1457 2357 2411 755 786 194 1546 369 294 1928 1159 2764 1420 91 2036 2673 2410 1471 1666 1496 633 1985 1164 805 196 1711 1018 88 2348 1622 2062 1811 1982 2638 1934 1767 2719 2689 1428 2438 376 1001 1616 1932 2174 1250 2665 1121 690 1122 2236 739 904 571 2198 844 771 1233 1430 1576 1733 1243 1724 1682 2557 2019 2591 1349 1517 2624 1344 443 42 2083 1303 1681 1370 2385 1749 1672 704 282 2394 1004 1044 969 358 260 1966 2560 1848 2328 1395 583 1687 823 268 427 1258 1394 1328 62 975 1104 2395 50 2611 1298 2584 856 791 1059 415 1515 2467 2722 2144 1518 60 1840 972 2071 1245 1568 1455 1562 407 816 1140 2633 1896 1292 2436 1523 968 877 2402 1460 1486 1080 2200 425 1485 2142 1678 747 1713 2449 595 263 2556 541 1969 1154 2680 2444 503 1716 1762 44 2710 2644 430 2197 596 2397 2442 1810 1260 2439 1583 1476 1271 649 261 1071 2600 271 1685 1559 2727 1664 2115 1976 1540 2661 1483 1770 322 1316 2563 1266 404 1774 630 813 1365 2667 1099 2637 940 588 1491 499 258 421 878 1653 719 1501 167 769 984 645 2321 16 709 1172 2239 939 1242 339 1217 2480 1106 2086 640 21 602 1171 480 1679 2367 687 2368 1586 2185 1549 1231 230 1534 1631 950 92 1952 2457 2477 1212 2205 1039 840 108 2472 712 632 1436 1558 1684 1939 82 2209 1900 439 790 1103 1615 179 2566 1470 416 1567 1239 2505 2046 362 173 426 516 1677 1005 1604 1347 1582 1829 1625 354 1717 2725 2522 2157 148 133 609 2458 286 1306 562 1563 994 1787 2537 1530 1066 2542 676 2707 402 1608 364 618 1676 707 509 1709 945 2270 937 862 857 1259 2741 2655 1967 2288 2414 2520 1788 2281 1407 58 1223 1408 2061 2437 1819 508 252 1276 1882 1186 1793 515 1618 1613 934 2366 2058 2446 1699 514 288 892 1591 1779 1649 2602 100 2347 281 613 2453 1552 372 2043 2320 1386 2060 1758 557 2561 669 935 2502 1895 533 1282 883 1274 1663 442 
2159 341 2511 492 2233 298 1965 1409 1113 715 902 1425 1431 686 773 1196 1188 1256 2067 896 212 1174 2133 2149 2669 1873 297 2065 269 311 1956 1990 809 2539 1449 1358 204 2728 1152 1693 2738 2650 356 1668 2737 914 2755 2657 1742 1996 1946 1310 1599 309 2188 2353 24 373 1468 531 96 753 49 1614 1834 569 1304 136 64 248 1696 1654 1082 1796 1246 2166 1752 653 2190 2540 1357 2720 1527 1114 2448 195 1385 197 83 178 367 2580 2339 1827 1761 1680 565 890 938 2165 1828 681 1126 2155 1846 1547 2015 168 2035 1500 2441 2070 2151 2536 2167 1130 850 776 104 352 228 789 1845 408 1638 1025 1528 1043 276 1101 1879 99 1795 2475 67 2568 963 1116 1400 1948 1964 875 114 2027 586 928 778 1806 2484 634 2684 2158 1912 777 2341 326 2734 2052 2610 2386 1683 2462 2114 2224 1096 45 683 1673 1089 2039 2427 973 2415 1930 1142 1145 845 2238 1847 1884 235 63 2008 738 1194 113 2508 1983 1100 2088 140 447 1650 1132 102 1877 307 1359 43 2213 1551 839 1356 1753 1804 636 991 2351 1955 2607 1448 638 2435 1209 2599 2432 2054 1367 2660 1451 455 1469 591 861 117 1719 126 2187 1019 2579 1128 143 979 1051 894 1764 2461 1050 231 2729 2377 622 2322 585 711 1192 1981 2753 2069 1596 803 836 1390 51 1710 539 491 612 655 1772 2100 2361 275 2618 2485 1403 1336 731 2333 2073 1478 1539 1626 188 1193 2404 659 245 2316 796 1445 495 1078 624 682 1720 2296 2148 118 1205 1120 916 2022 481 1954 785 2488 666 59 660 2625 1181 1999 270 1049 2531 1169 1909 2006 1569 68 1163 924 794 770 1201 751 229 2169 1284 570 1972 1200 122 797 2763 610 1229 1102 383 459 2362 460 1268 815-20 -10 0 10 20 e( agekdbrn X ) -1 -.5 0.5 e( sex X ) coef = -2.2178232, se =.30436248, t = -7.29 1932 1483 322 1838 1136 1832 753 813 2329 2437 42 862 1262 1233 1563 1751 1494 1385 2703 2728 60 2689 2719 1930 612 1952 1725 1470 935 2220 462 1436 2604 1618 773 1344 364 1420 2655 1696 2572 1039 1189 1767 2707 1121 13 1476 2241 2350 1895 1565 1159 510 358 1733 1680 1761 1428 225 1788 1684 2262 618 2602 377 227 2611 1072 2644 228 1174 784 173 844 407 399 1431 
1678 100 2600 2345 282 1791 1266 306 878 1677 1394 2477 2438 1292 687 287 2720 425 1787 945 1615 1181 511 281 1370 1460 1001 2263 91 1564 1096 806 1870 1656 2276 2394 1846 1906 572 2268 2159 2009 771 1882 1043 1666 51 1999 2618 1568 279 1887 1969 1445 796 923 958 284 991 140 1019 669 2250 1303 1779 1709 1967 539 649 1165 2444 842 1540 2046 2224 2209 690 1193 1653 1521 2239 2142 2650 82 845 269 571 2764 742 916 1120 830 2710 1939 710 2027 2110 2490 2019 249 1796 856 1020 1258 2502 2750 810 126 2663 1354 208 408 2003 1336 1716 128 1492 2568 1804 1071 96 2061 1673 1977 1526 1770 2194 2069 2148 118 1539 2480 1400 84 950 2551 1681 1076 807 790 1753 240 2232 63 883 2190 855 148 914 2096 122 1223 1282 938 1562 610 2361 719 739 271 2225 2737 2059 665 376 1005 2125 1164 1218 1616 1475 695 2100 1409 1848 1172 2442 2067 1243 2578 2539 2397 969 1278 2327 1953 49 843 1209 2025 1390 2665 1126 503 740 1488 1682 747 1182 1872 2040 857 2060 245 994 1672 1566 1847 1099 6 715 2702 1961 2491 1186 1699 1094 1220 2274 2734 2052 1824 1117 2427 1928 159 2008 231 840 934 64 2026 1366 307 371 2467 2605 987 1840 2402 1685 1054 103 1552 1101 2757 755 204 1358 2335 583 928 1452 393 2451 2149 248 1482 294 143 1154 940 2508 506 2213 2015 113 645 2351 936 2316 1044 557 1664 902 1153 909 447 749 2288 2738 1145 297 2357 2051 274 2156 2718 1569 1310 268 2174 587 972 460 998 1035 1798 1496 2426 823 2336 2661 1018 2387 924 2667 1608 402 1591 1146 1037 939 104 2653 892 1972 1271 62 372 1997 280 2062 524 1201 2353 1694 492 707 304 229 2115 205 1328 634 2684 960 44 1471 2557 1063 2763 2522 2157 2475 1133 2337 1034 1530 2700 1402 2520 2404 891 963 1845 1774 575 527 2152 1256 1650 1917 1103 1122 2752 426 2481 2112 1990 2377 1912 1250 1966 816 2092 2512 133 2556 167 383 500 443 439 2238 1157 2413 591 1749 196 653 2409 1469 21 515 1356 751 2385 2550 567 894 1764 1877 2281 815 2341 1506 1772 535 1523 188 1369 893 1923 1626 596 2673 704 2553 1884 970 459 624 68 2275 131 468 2435 1025 2000 2007 1188 1382 785 
2032 1679 824 2638 430 544 108 436 2198 827 2320 803 836 973 2058 2039 481 2533 252 2083 786 632 1163 1613 1260 356 550 660 1667 2235 569 2339 120 1268 2613 2488 929 2368 805 1100 978 822 427 1551 1720 1683 1974 2432 2140 714 2579 729 1276 2338 541 2753 2727 380 1981 1192 261 1491 499 435 617 2088 522 1955 1078 260 1367 1152 890 2070 1793 1686 913 2516 2296 1316 1527 286 1576 1534 512 421 2236 2118 1819 1713 198 2073 1205 789 1985 1403 839 613 2649 531 1982 258 1662 930 1627 1896 480 2334 1359 1807 685 886 797 1468 1449 1497 2545 1803 1299 212 1304 2699 1432 1604 1213 276 2147 1810 731 586 2034 498 559 619 2424 1515 769 296 514 968 709 16 2497 2151 5 179 538 1426 1829 1582 1337 1663 1365 2537 2144 22 2200 2322 636 2367 411 788 1056 1080 738 712 2472 1458 2657 1386 1486 1954 55 2566 1240 955 1806 2264 341 83 2013 937 2043 1873 2641 1517 2624 2006 532 2131 1789 1693 1088 1102 509 1909 676 794 1511 1717 154 2309 1004 1059 1719 1022 1946 2441 1687 246 288 585 501 2669 1585 1349 2680 2561 1461 1546 1535 2212 1395 1536 1577 1983 2542 1239 1485 367 2580 2129 1911 2321 1130 2436 1542 2065 1934 1231 2637 1389 1556 117 1425 2526 1567 1347 1758 1715 416 1353 2414 455 1558 2458 2726 666 979 2725 880 2388 1191 1599 588 211 362 2607 2498 2660 1242 1956 311 369 1630 638 2169 791 67 2505 1512 1841 1455 876 1246 1522 2531 1676 1957 2089 2563 1306 320 783 442 1762 2411 659 565 1430 1066 1448 2171 655 2197 2492 1106 2188 809 92 1104 1357 415 1828 2633 1140 1171 1724 609 2165 2540 1638 1501 882 1408 2634 404 2205 2347 116 1086 533 1134 2395 1771 1113 1212 2603 896 46 1068 654 102 889 2485 1427 270 607 2448 1114 2155 1625 43 904 1050 1528 2265 835 178 2410 2722 1948 1259 2511 2386 2610 1077 2484 2755 1559 1760 758 263 197 683 915 861 352 2185 2119 1622 406 1976 244 1710 1827 1057 2136 1996 1500 1142 2333 495 980 1614 2164 2623 230 2054 1128 599 2354 2415 2071 1742 777 2584 491 543 686 2704 1169 2158 58 1457 850 2449 1132 298 2086 2233 40 1711 1668 262 2196 1116 1823 457 1892 562 1229 
2328 570 594 275 630 1583 339 1588 2348 2462 1215 1518 1649 633 2292 1407 1374 2560 967 657 1245 59 1305 1659 1161 1879 2439 50 888 2114 1586 2591 195 595 1176 2382 1329 2457 2741 1897 194 576 1811 309 627 2434 2167 1547 1654 1082 1298 984 778 2022 1175 366 776 1700 1875 2227 724 1631 1049 1089 88 508 1051 1194 2460 2440 675 2285 556 1110 1217 99 373 24 975 1752 1549 602 1575 354 1478 326 1965 2625 400 235 640 516 2758 56 681 1355 906 1274 265 9 682 136 2270 877 2536 976 2453 1964 2133 2765 2124 2599 1885 770 2446 711 875 2187 45 224 2035 168 2729 1795 114 2036 323 1971 1451 1208 1284 1900 1596 1200 2166 1230 2362 622 2175 1196 2461 112 1834 1747 2366-10 0 10 20 e( agekdbrn X ) -20 0 20 40 e( mapres80 X ) coef =.03319454, se =.01187284, t = 2.8 1117 2719 710 2689 2438 1122 1243 46 1349 1461 2665 1577 904 1344 1681 619 1395 1153 2236 690 1767 2174 1121 1001 823 2699 1432 246 739 1966 2750 1430 1627 2385 376 830 1004 1672 268 1682 1258 1250 2335 2124 1760 2328 2467 358 2327 2498 1848 1576 1687 2220 583 50 1517 2624 1104 244 1353 1328 1512 2402 1556 2641 2329 2034 2584 2591 1724 704 406 415 747 657 1562 856 1522 2560 1515 1686 913 2394 1044 2395 260 2198 843 877 771 972 975 2667 1749 1140 2633 1355 791 1678 587 968 1523 1063 2334 22 630 62 810 2092 1292 1662 1298 2156 2071 1932 2212 595 906 2350 1455 2292 976 1616 2089 1428 2727 263 1260 1896 816 1733 1354 2115 1486 407 2722 1271 559 225 1762 1969 116 1716 1303 596 627 2491 60 427 550 2025 1080 1154 42 2144 1476 1071 2444 880 544 835 271 1583 2059 783 2600 2338 1426 1685 258 1664 2680 404 2200 807 1770 1365 2382 211 1460 1266 2726 1492 44 1526 1977 2409 2637 1840 1347 1072 2557 1615 1807 1136 2397 443 840 984 909 2368 844 878 649 2083 2000 1059 1559 2449 2086 306 1106 2757 524 2142 2268 2019 2442 749 2480 2367 2563 1588 571 1536 930 2661 282 1679 1316 261 1540 198 1521 640 842 2439 506 1566 2197 92 439 1242 1684 1233 714 2710 1212 1810 813 224 588 1713 1054 950 2239 2649 1568 712 2472 5 1900 2704 108 421 969 1501 21 
2477 84 2550 2185 1558 131 512 719 695 1485 2321 2611 2096 1841 970 790 645 400 535 1068 939 2556 940 2644 2281 742 2026 1213 827 617 411 230 1231 1217 1370 322 1939 498 2205 1976 2232 240 2725 1240 499 1491 416 609 562 2505 2522 2157 2354 167 1803 769 998 1276 1885 148 2572 2718 2553 2516 1157 1436 2542 82 393 2436 1103 287 855 2110 1099 1586 1494 402 1608 1458 1171 1176 1394 516 1567 2707 1631 632 618 1667 937 1832 133 2526 889 1427 2275 687 945 426 602 173 1911 1677 1408 515 2270 501 1676 2520 923 480 2700 1402 2387 252 1971 2437 2131 532 1774 1196 1604 1542 2446 2209 462 1005 936 159 1717 179 2653 1563 1906 1758 2241 1245 128 1613 1223 364 1215 362 2125 2140 503 2623 2413 2345 2453 1152 893 892 1470 2347 1483 425 1407 1625 2703 2578 262 1582 1829 994 323 1957 1967 2194 2655 1575 281 1034 1552 888 665 707 1788 1239 1511 2188 1259 1653 709 16 533 339 955 2046 2537 2414 1374 676 1725 1518 100 1218 1035 1382 2159 2233 1182 599 2566 1699 1389 2741 492 2058 557 1113 1262 2262 934 1431 2366 850 205 1409 1771 1591 2112 806 857 1663 715 1882 58 862 1791 1952 2602 2288 569 6 2043 2133 2613 2458 2263 298 896 1834 1274 1649 1823 1897 1306 2067 2603 286 2669 436 2424 1172 2149 1549 1425 1793 929 112 541 2451 1787 876 1039 960 2539 654 2225 987 288 2492 2512 265 2551 2440 1506 430 1668 575 978 1779 594 1282 297 809 1965 2755 902 1892 1386 1990 2320 1819 2561 1256 212 2061 1599 284 366 1278 576 1824 2388 2502 2604 1751 2758 56 1747 1956 311 883 341 2070 511 1020 49 103 269 2657 64 1996 2285 2264 2457 354 1066 1709 2540 1565 1870 2276 1310 1114 2448 527 204 1358 1798 653 197 2475 1468 2065 2738 886 685 509 2511 613 2765 367 2580 9 686 2166 55 514 2164 1694 1385 565 753 2490 1946 2060 1547 168 2035 195 2155 136 2415 1043 1186 1530 1696 1873 891 669 1126 967 2190 83 1846 729 2720 2650 1974 2497 1917 248 1534 2165 2196 1230 1246 442 1693 2152 1357 824 508 67 938 1796 2353 1789 531 1452 2728 1895 1683 249 789 2015 2250 681 1795 1752 1528 399 2171 2013 510 228 2737 2167 373 24 1025 
[Panel e( age | X ): coef = .05826431, se = .00992019, t = 5.87]

Observation #2460 is the first one that looks especially suspicious: it is an outlier, that is, a high-residual observation. The same goes for #1305. These appear to be people who had their first child very late in life. As for high-leverage observations, not many stand out in these graphs, although #112 might be one: it looks like a foreign-born individual with very little education who had their first child relatively late in life. To supplement these graphs, we can use a number of influence statistics that combine information on outlier status and leverage: DFITS, Welsch's D, Cook's D, COVRATIO, and DFBETAs. It is usually a good idea to obtain several of them to decide which cases are really problematic. It also makes sense to list the values of your dependent and independent variables for the observations whose values on these measures exceed the suggested cutoffs. For example, we obtain Cook's D (based on hat values and standardized residuals):
. predict cooksd if e(sample), cooksd
Don't forget to specify if e(sample) here: Cook's D is available out of sample as well!
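Going back to the added-variable plots: the coef printed in each panel is the slope from regressing e( agekdbrn | X ) on that predictor's own residual e( x | X ). By the Frisch-Waugh-Lovell theorem, this slope equals the predictor's coefficient in the full multiple regression. The sketch below checks this numerically in plain Python rather than Stata. It uses made-up toy data with two hypothetical predictors, x1 (the focal variable) and x2 (standing in for "X"). The helper functions are written only for this illustration and do not come from any package.

```python
# Frisch-Waugh-Lovell check: the slope in an added-variable plot equals the
# coefficient from the full multiple regression. Toy data; helper names are
# hypothetical and written for this illustration only.

def simple_ols(x, y):
    """Return (intercept, slope) for y regressed on x with a constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """Residuals of y after regressing it on x (with a constant)."""
    b0, b1 = simple_ols(x, y)
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]

def solve(a, b):
    """Solve a square linear system a @ beta = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta

def full_ols(x1, x2, y):
    """Coefficients (b0, b1, b2) of y on a constant, x1 and x2 (normal equations)."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

# Toy data: y stands in for agekdbrn, x1 for the focal predictor, x2 for "X".
x1 = [12, 16, 10, 14, 18, 12, 9, 20]
x2 = [30, 45, 25, 40, 60, 35, 20, 55]
y = [22, 30, 19, 25, 33, 24, 18, 35]

b_full = full_ols(x1, x2, y)[1]      # coefficient on x1 in the full model

# Added-variable plot for x1: residualize y and x1 on the other predictor,
# then fit e(y | X) on e(x1 | X); this slope is the panel's "coef".
e_y = residuals(x2, y)
e_x1 = residuals(x2, x1)
b_av = simple_ols(e_x1, e_y)[1]

print(round(b_full, 6), round(b_av, 6))
```

With more than one other predictor, both residuals would come from regressions on all remaining predictors. The two-predictor case only keeps the sketch short. The se and t shown in the panels are not reproduced here.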

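Cook's D combines the two ingredients named above. Take observation i with hat value h_i and standardized residual r_i, in a model with p estimated parameters (including the intercept). Then D_i = (r_i^2 / p) * h_i / (1 - h_i). Equivalently, D_i is the total squared shift in all fitted values when case i is deleted, divided by p times the residual variance s^2. The plain-Python sketch below computes D both ways and flags cases at or above the 4/n rule of thumb used in this handout. It assumes a single-predictor model, toy data (not the handout's data), and hypothetical helper names.

```python
# Cook's D for a one-predictor OLS model, computed two ways. Toy data only;
# p counts estimated parameters (intercept + slope = 2).

def ols(x, y):
    """Return (intercept, slope) of y regressed on x with a constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def cooks_d(x, y):
    """Cook's D from hat values and standardized (internally studentized) residuals."""
    n, p = len(x), 2
    b0, b1 = ols(x, y)
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x)
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)        # residual variance
    hat = [1 / n + (a - mx) ** 2 / sxx for a in x]    # leverage h_i
    d = []
    for e, h in zip(resid, hat):
        r = e / (s2 * (1 - h)) ** 0.5                 # standardized residual
        d.append(r ** 2 / p * h / (1 - h))
    return d

def cooks_d_by_deletion(x, y):
    """Same statistic from its definition: shift in all fitted values when case i is dropped."""
    n, p = len(x), 2
    b0, b1 = ols(x, y)
    fitted = [b0 + b1 * a for a in x]
    s2 = sum((b - f) ** 2 for b, f in zip(y, fitted)) / (n - p)
    d = []
    for i in range(n):
        c0, c1 = ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        shift = sum((f - (c0 + c1 * a)) ** 2 for f, a in zip(fitted, x))
        d.append(shift / (p * s2))
    return d

x = [10, 12, 12, 14, 16, 16, 18, 6]    # toy "years of education"
y = [21, 23, 25, 24, 27, 26, 29, 30]   # toy "age at first birth"; last case is unusual
d = cooks_d(x, y)
cutoff = 4 / len(x)                     # the 4/n rule of thumb
flagged = [i for i, di in enumerate(d) if di >= cutoff]
print([round(v, 4) for v in d])
print("cutoff:", cutoff, "flagged cases (0-based):", flagged)
```

The last toy case combines low x (high leverage) with a large positive residual, which is the kind of observation the 4/n screen is meant to catch.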
NOTE: if you already generated a variable with this name (e.g., cooksd) but want to reuse the name, just drop it first, e.g.:
. drop cooksd

Now we list the observations with high Cook's distance. The cutoff is 4/n, so in this case it is 4/1089 = .00367309.
. sort cooksd
. list id agekdbrn educ born sex mapres80 age cooksd if cooksd>=4/1089 & cooksd~=.

      +--------------------------------------------------------------------+
      |   id   agekdbrn   educ   born      sex   mapres80   age     cooksd |
      |--------------------------------------------------------------------|
1031. | 1394         30     15     no   female         28    33   .0036766 |
1032. |   63         19     19    yes   female         34    64    .003683 |
1033. | 2484         37     17    yes   female         52    56   .0037003 |
1034. | 1906         29     10     no     male         23    39   .0037224 |
1035. |  994         38     15    yes   female         33    41    .003788 |
      |--------------------------------------------------------------------|
1036. |   22         19     12     no     male         44    23   .0038182 |
1037. | 1402         37     12    yes     male         33    42   .0038667 |
1038. |  742         36     13    yes     male         28    39   .0038726 |
1039. |  366         37     17    yes     male         66    44   .0041899 |
1040. | 2265         39     17    yes     male         52    55    .004212 |
      |--------------------------------------------------------------------|
1041. | 2703         16     16    yes     male         23    45    .004219 |
1042. | 1284         17     12    yes   female         64    76   .0043403 |
1043. | 2764         35     12    yes     male         23    75   .0044005 |
1044. | 1114         39     12    yes   female         46    46   .0044603 |
1045. | 2653         38     12    yes     male         32    43   .0044713 |
      |--------------------------------------------------------------------|
1046. |  322         13     16    yes   female         20    38   .0044766 |
1047. |  352         16      9     no   female         44    49   .0045471 |
1048. | 1382         39     12    yes     male         35    45   .0045595 |
1049. | 1990         42     13    yes   female         34    46   .0046982 |
1050. |  514         16     11     no   female         40    42   .0047655 |
      |--------------------------------------------------------------------|
1051. | 1186         30     12     no   female         30    44   .0049131 |
1052. |  669         37     18    yes   female         32    49    .005042 |
1053. | 1428         17     20    yes   female         32    28   .0052439 |
1054. |  753         35     13    yes   female         17    51   .0053052 |
1055. |  797         34     12    yes   female         35    83   .0054951 |
      |--------------------------------------------------------------------|
1056. |  126         38     15    yes   female         28    65   .0056446 |
1057. | 1824         41     16    yes     male         34    49   .0058367 |
1058. |    6         40     12    yes     male         29    47   .0059349 |
1059. |  447         26      6     no   female         23    55   .0060603 |
1060. | 1549         32     14     no   female         66    34   .0061423 |
      |--------------------------------------------------------------------|
1061. | 1066         32     13     no   female         47    40   .0062896 |
1062. |  612         36     18    yes   female         23    73   .0063017 |
1063. |  508         18     14     no   female         64    40   .0064009 |
1064. | 1747         24     17     no     male         86    36   .0065845 |
1065. | 1189         39     16    yes     male         23    62   .0066001 |
      |--------------------------------------------------------------------|
1066. |  773         37     20    yes   female         28    54   .0070942 |
1067. | 2545         42     18    yes     male         46    54   .0072636 |
1068. | 1709         38     20    yes   female         35    47   .0073801 |
1069. |  541         35     18     no   female         46    37   .0075467 |
1070. |  524         16     19    yes     male         42    34   .0075767 |
      |--------------------------------------------------------------------|
1071. |  430         35     18     no   female         44    38   .0075794 |
1072. | 1194         21     17     no   female         66    60   .0079331 |
10