CHAPTER 12 Sample Surveys

In 2007, Pew Research conducted a survey to assess Americans' knowledge of current events. They asked a random sample of 1,502 U.S. adults 23 factual questions about topics currently in the news. [1] Pew also asked respondents where they got their news. Those who frequented major newspaper Web sites or who are regular viewers of the Daily Show or Colbert Report scored best on knowledge of current events. [2] Even among those viewers, only 54% responded correctly to 15 or more of the questions. Pew claimed that this was close to the true percentage responding correctly that they would have found if they had asked all U.S. adults who got their news from those sources. That step from a small sample to the entire population is impossible without understanding Statistics. To make business decisions, to do science, to choose wise investments, or to understand what voters think they'll do in the next election, we need to stretch beyond the data at hand to the world at large.

To make that stretch, we need three ideas. You'll find the first one natural. The second may be more surprising. The third is one of the strange but true facts that often confuse those who don't know Statistics.

Idea 1: Examine a Part of the Whole

Activity: Populations and Samples. Explore the differences between populations and samples.

The first idea is to draw a sample. We'd like to know about an entire population of individuals, but examining all of them is usually impractical, if not impossible. So we settle for examining a smaller group of individuals, a sample, selected from the population. You do this every day. For example, suppose you wonder how the vegetable soup you're cooking for dinner tonight is going to go over with your friends. To decide whether it meets your standards, you only need to try a small amount. You might taste just a spoonful or two. You certainly don't have to consume the whole pot. You trust that the taste will represent the flavor of the entire pot. The idea behind your tasting is that a small sample, if selected properly, can represent the entire population.

It's hard to go a day without hearing about the latest opinion poll. These polls are examples of sample surveys, designed to ask questions of a small group of people in the hope of learning something about the entire population. Most likely, you've never been selected to be part of one of these national opinion polls. That's true of most people. So how can the pollsters claim that a sample is representative of the entire population? The answer is that professional pollsters work quite hard to ensure that the "taste" (the sample that they take) represents the population. If not, the sample can give misleading information about the population.

The W's and Sampling: The population we are interested in is usually determined by the Why of our study. The sample we draw will be the Who. When and How we draw the sample may depend on what is practical.

[1] For example, two of the questions were "Who is the vice-president of the United States?" and "What party controls Congress?"
[2] The lowest scores came from those whose main source of news was network morning shows or Fox News.

Bias

In 1936, a young pollster named George Gallup used a subsample of only 3000 of the 2.4 million responses that the Literary Digest received to reproduce the wrong prediction of Landon's victory over Roosevelt. He then used an entirely different sample of 50,000 and predicted that Roosevelt would get 56% of the vote to Landon's 44%. His sample was apparently much more representative of the actual voting populace. The Gallup Organization went on to become one of the leading polling companies.

Video: The Literary Digest Poll and the Election of 1936. Hear the story of one of the most famous polling failures in history.

Selecting a sample to represent the population fairly is more difficult than it sounds. Polls or surveys most often fail because they use a sampling method that tends to over- or underrepresent parts of the population.
The method may overlook subgroups that are harder to find (such as the homeless or those who use only cell phones) or favor others (such as Internet users who like to respond to online surveys). Sampling methods that, by their nature, tend to over- or underemphasize some characteristics of the population are said to be biased. Bias is the bane of sampling, the one thing above all to avoid. Conclusions based on samples drawn with biased methods are inherently flawed. There is usually no way to fix bias after the sample is drawn and no way to salvage useful information from it.

Here's a famous example of a really dismal failure. By the beginning of the 20th century, it was common for newspapers to ask readers to return "straw" ballots on a variety of topics. (Today's Internet surveys are the same idea, gone electronic.) The earliest known example of such a straw vote in the United States dates back to 1824. During the period from 1916 to 1936, the magazine Literary Digest regularly surveyed public opinion and forecast election results correctly. During the 1936 presidential campaign between Alf Landon and Franklin Delano Roosevelt, it mailed more than 10 million ballots and got back an astonishing 2.4 million. (Polls were still a relatively novel idea, and many people thought it was important to send back their opinions.) The results were clear: Alf Landon would be the next president by a landslide, 57% to 43%. You remember President Landon? No? In fact, Landon carried only two states. Roosevelt won, 62% to 37%, and, perhaps coincidentally, the Digest went bankrupt soon afterward.

What went wrong? One problem was that the Digest's sample wasn't representative. Where would you find 10 million names and addresses to sample? The Digest used the phone book, as many surveys do. [3] But in 1936, at the height of the Great Depression, telephones were a real luxury, so they sampled more rich than poor voters. The campaign of 1936 focused on the economy, and those who were less well off were more likely to vote for the Democrat. So the Digest's sample was hopelessly biased.

[3] Today phone numbers are computer-generated to make sure that unlisted numbers are included. But even now, cell phones and VOIP Internet phones are often not included.

How do modern polls get their samples to represent the entire population? You might think that they'd handpick individuals to sample with care and precision.

But in fact, they do something quite different: They select individuals to sample at random. The importance of deliberately using randomness is one of the great insights of Statistics.

Idea 2: Randomize

Think back to the soup sample. Suppose you add some salt to the pot. If you sample it from the top before stirring, you'll get the misleading idea that the whole pot is salty. If you sample from the bottom, you'll get an equally misleading idea that the whole pot is bland. By stirring, you randomize the amount of salt throughout the pot, making each taste more typical of the whole pot.

Not only does randomization protect you against factors that you know are in the data, it can also help protect against factors that you didn't even know were there. Suppose, while you weren't looking, a friend added a handful of peas to the soup. If they're down at the bottom of the pot, and you don't randomize the soup by stirring, your test spoonful won't have any peas. By stirring in the salt, you also randomize the peas throughout the pot, making your sample taste more typical of the overall pot even though you didn't know the peas were there. So randomizing protects us even in this case.

How do we "stir" people in a survey? We select them at random. Randomizing protects us from the influences of all the features of our population by making sure that, on average, the sample looks like the rest of the population.

Activity: Sampling from Some Real Populations. Draw random samples to see how closely they resemble each other and the population.

Why not match the sample to the population? Rather than randomizing, we could try to design our sample so that the people we choose are typical in terms of every characteristic we can think of. We might want the income levels of those we sample to match the population. How about age? Political affiliation? Marital status? Having children? Living in the suburbs? We can't possibly think of all the things that might be important.
Even if we could, we wouldn't be able to match our sample to the population for all these characteristics.

FOR EXAMPLE: Is a random sample representative?

Here are summary statistics comparing two samples of 8000 drawn at random from a company's database of 3.5 million customers:

            Age (yr)   White (%)   Female (%)   # of Children   Income Bracket (1-7)   Wealth Bracket (1-9)   Homeowner? (% Yes)
Sample 1    61.4       85.12       56.2         1.54            3.91                   5.29                   71.36
Sample 2    61.2       84.44       56.4         1.51            3.88                   5.33                   72.30

Question: Do you think these samples are representative of the population? Explain.

The two samples look very similar with respect to these seven variables. It appears that randomizing has automatically matched them pretty closely. We can reasonably assume that since the two samples don't differ too much from each other, they don't differ much from the rest of the population either.

Idea 3: It's the Sample Size

Activity: Does the Population Size Matter? Here's the narrated version of this important idea about sampling.

How large a random sample do we need for the sample to be reasonably representative of the population? Most people think that we need a large percentage, or fraction, of the population, but it turns out that what matters is the number of individuals in the sample, not the size of the population. A random sample of 100 students in a college represents the student body just about as well as a random sample of 100 voters represents the entire electorate of the United States. This is the third idea and probably the most surprising one in designing surveys.

How can it be that only the size of the sample, and not the population, matters? Well, let's return one last time to that pot of soup. If you're cooking for a banquet rather than just for a few people, your pot will be bigger, but do you need a bigger spoon to decide how the soup tastes? Of course not. The same-size spoonful is probably enough to make a decision about the entire pot, no matter how large the pot. The fraction of the population that you've sampled doesn't matter. [4] It's the sample size itself that's important.

How big a sample do you need? That depends on what you're estimating. To get an idea of what's really in the soup, you'll need a large enough taste to get a representative sample from the pot. For a survey that tries to find the proportion of the population falling into a category, you'll usually need several hundred respondents to say anything precise enough to be useful. [5]

A friend who knows that you are taking Statistics asks your advice on her study. What can you possibly say that will be helpful? Just say, "If you could just get a larger sample, it would probably improve your study." Even though a larger sample might not be worth the cost, it will almost always make the results more precise.

Activity: Populations and Samples. How well can a sample reveal the population's shape, center, and spread? Explore what happens as you change the sample size.

What do the pollsters do? How do professional polling agencies do their work? The most common polling method today is to contact respondents by telephone.
Computers generate random telephone numbers, so pollsters can even call some people with unlisted phone numbers. The person who answers the phone is invited to respond to the survey if that person qualifies. (For example, only if it's an adult who lives at that address.) If the person answering doesn't qualify, the caller will ask for an appropriate alternative. In phrasing questions, pollsters often list alternative responses (such as candidates' names) in different orders to avoid biases that might favor the first name on the list.

Do these methods work? The Pew Research Center for the People and the Press, reporting on one survey, says that "Across five days of interviewing, surveys today are able to make some kind of contact with the vast majority of households (76%), and there is no decline in this contact rate over the past seven years. But because of busy schedules, skepticism and outright refusals, interviews were completed in just 38% of households that were reached using standard polling procedures." Nevertheless, studies indicate that those actually sampled can give a good snapshot of larger populations from which the surveyed households were drawn.

Does a Census Make Sense?

Video: Frito-Lay Sampling for Quality. How does a potato chip manufacturer make sure to cook only the best potatoes?

Why bother determining the right sample size? Wouldn't it be better to just include everyone and "sample" the entire population? Such a special sample is called a census. Although a census would appear to provide the best possible information about the population, there are a number of reasons why it might not.

First, it can be difficult to complete a census. Some individuals in the population will be hard (and expensive) to locate. Or a census might just be impractical. If you were a taste tester for the Hostess™ Company, you probably wouldn't want to census all the Twinkies on the production line. Not only might this be life-endangering, but you wouldn't have any left to sell.

[4] Well, that's not exactly true. If the population is small enough and the sample is more than 10% of the whole population, it can matter. It doesn't matter whenever, as usual, our sample is a very small fraction of the population.
[5] Chapter 19 gives the details behind this statement and shows how to decide on a sample size for a survey.
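The soup-pot claim under Idea 3, that the number of individuals sampled matters while the size of the population does not, can be checked with a small simulation. The sketch below is an illustration only and is not part of the original chapter: the function name `spread_of_sample_proportions`, the two population sizes, the 54% "true" proportion (borrowed from the Pew figure purely as a convenient value), the number of repetitions, and the seed are all assumptions.

```python
import random
import statistics

def spread_of_sample_proportions(pop_size, n, true_p=0.54, reps=2000, seed=0):
    """Standard deviation of the sample proportion over many SRSs of size n.

    Hypothetical helper: the population has pop_size individuals, of whom
    round(pop_size * true_p) would answer "yes".
    """
    rng = random.Random(seed)
    yes_count = round(pop_size * true_p)
    proportions = []
    for _ in range(reps):
        # Draw n distinct individuals; labels below yes_count form the "yes" group.
        drawn = rng.sample(range(pop_size), n)
        proportions.append(sum(1 for label in drawn if label < yes_count) / n)
    return statistics.pstdev(proportions)

# Same sample size, very different population sizes.
small_pop = spread_of_sample_proportions(pop_size=10_000, n=100)
large_pop = spread_of_sample_proportions(pop_size=1_000_000, n=100)
# Same large population, four times the sample size.
bigger_sample = spread_of_sample_proportions(pop_size=1_000_000, n=400)

print(f"n=100 from a population of 10,000:    spread {small_pop:.3f}")
print(f"n=100 from a population of 1,000,000: spread {large_pop:.3f}")
print(f"n=400 from a population of 1,000,000: spread {bigger_sample:.3f}")
```

Under these assumptions the first two spreads should come out nearly equal (roughly 0.05), even though one population is 100 times larger than the other, while quadrupling the sample size should cut the spread roughly in half.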

Second, populations rarely stand still. In populations of people, babies are born and folks die or leave the country. In opinion surveys, events may cause a shift in opinion during the survey. A census takes longer to complete and the population changes while you work. A sample surveyed in just a few days may give more accurate information.

Third, taking a census can be more complex than sampling. For example, the U.S. Census records too many college students. Many are counted once with their families and are then counted a second time in a report filed by their schools.

The undercount. It's particularly difficult to compile a complete census of a population as large, complex, and spread out as the U.S. population. The U.S. Census is known to miss some residents. On occasion, the undercount has been striking. For example, there have been blocks in inner cities in which the number of residents recorded by the Census was smaller than the number of electric meters for which bills were being paid. What makes the problem particularly important is that some groups have a higher probability of being missed than others: undocumented immigrants, the homeless, the poor. The Census Bureau proposed the use of random sampling to estimate the number of residents missed by the ordinary census. Unfortunately, the resulting debate has become more political than statistical.

Populations and Parameters

Activity: Statistics and Parameters. Explore the difference between statistics and parameters.

Any quantity that we calculate from data could be called a "statistic." But in practice, we usually use a statistic to estimate a population parameter.

Remember: Population model parameters are not just unknown; usually they are unknowable. We have to settle for sample statistics.

A study found that teens were less likely to buckle up. The National Center for Chronic Disease Prevention and Health Promotion reports that 21.7% of U.S. teens never or rarely wear seatbelts.
We're sure they didn't take a census, so what does the 21.7% mean? We can't know what percentage of teenagers wear seatbelts. Reality is just too complex. But we can simplify the question by building a model.

Models use mathematics to represent reality. Parameters are the key numbers in those models. A parameter used in a model for a population is sometimes called (redundantly) a population parameter.

But let's not forget about the data. We use summaries of the data to estimate the population parameters. As we know, any summary found from the data is a statistic. Sometimes you'll see the (also redundant) term sample statistic. [6]

We've already met two parameters in Chapter 6: the mean, μ, and the standard deviation, σ. We'll try to keep denoting population model parameters with Greek letters and the corresponding statistics with Latin letters. Usually, but not always, the letter used for the statistic and the parameter correspond in a natural way. So the standard deviation of the data is s, and the corresponding parameter is σ (Greek for s). In Chapter 7, we used r to denote the sample correlation. The corresponding correlation in a model for the population would be called ρ (rho). In Chapter 8, b₁ represented the slope of a linear regression estimated from the data. But when we think about a (linear) model for the population, we denote the slope parameter β₁ (beta).

Get the pattern? Good. Now it breaks down. We denote the mean of a population model with μ (because μ is the Greek letter for m). It might make sense to denote the sample mean with m, but long-standing convention is to put a bar over anything when we average it, so we write ȳ. What about proportions? Suppose we want to talk about the proportion of teens who don't wear seatbelts. If we use p to denote the proportion from the data, what is the corresponding model parameter? By all rights it should be π. But statements like π = 0.25 might be confusing because π has been equal to 3.1415926... for so long, and it's worked so well. So, once again we violate the rule. We'll use p for the population model parameter and p̂ for the proportion from the data (since, like ŷ in regression, it's an estimated value).

[6] Where else besides a sample could a statistic come from?

Here's a table summarizing the notation:

NOTATION ALERT: This entire table is a notation alert.

Name                     Statistic   Parameter
Mean                     ȳ           μ (mu, pronounced "meeoo," not "moo")
Standard deviation       s           σ (sigma)
Correlation              r           ρ (rho)
Regression coefficient   b           β (beta, pronounced "baytah" [7])
Proportion               p̂           p (pronounced "pee" [8])

We draw samples because we can't work with the entire population, but we want the statistics we compute from a sample to reflect the corresponding parameters accurately. A sample that does this is said to be representative. A biased sampling methodology tends to over- or underestimate the parameter of interest.

JUST CHECKING

1. Various claims are often made for surveys. Why is each of the following claims not correct?
a) It is always better to take a census than to draw a sample.
b) Stopping students on their way out of the cafeteria is a good way to sample if we want to know about the quality of the food there.
c) We drew a sample of 100 from the 3000 students in a school. To get the same level of precision for a town of 30,000 residents, we'll need a sample of 1000.
d) A poll taken at a statistics support Web site garnered 12,357 responses. The majority said they enjoy doing statistics homework. With a sample size that large, we can be pretty sure that most Statistics students feel this way, too.
e) The true percentage of all Statistics students who enjoy the homework is called a "population statistic."

Simple Random Samples

How would you select a representative sample? Most people would say that every individual in the population should have an equal chance to be selected, and certainly that seems fair. But it's not sufficient. There are many ways to give everyone an equal chance that still wouldn't give a representative sample. Consider, for example, a school that has equal numbers of males and females. We could sample like this: Flip a coin.
If it comes up heads, select 100 female students at random. If it comes up tails, select 100 males at random. Everyone has an equal chance of selection, but every sample is of only a single sex, hardly representative.

We need to do better. Suppose we insist that every possible sample of the size we plan to draw has an equal chance to be selected. This ensures that situations like the one just described are not likely to occur and still guarantees that each person has an equal chance of being selected. What's different is that with this method, each combination of people has an equal chance of being selected as well. A sample drawn in this way is called a Simple Random Sample, usually abbreviated SRS. An SRS is the standard against which we measure other sampling methods, and the sampling method on which the theory of working with sampled data is based.

[7] If you're from the United States. If you're British or Canadian, it's "beetah."
[8] Just in case you weren't sure.

To select a sample at random, we first need to define where the sample will come from. The sampling frame is a list of individuals from which the sample is drawn.

For example, to draw a random sample of students at a college, we might obtain a list of all registered full-time students and sample from that list. In defining the sampling frame, we must deal with the details of defining the population. Are part-time students included? How about those who are attending school elsewhere and transferring credits back to the college?

Once we have a sampling frame, the easiest way to choose an SRS is to assign a random number to each individual in the sampling frame. We then select only those whose random numbers satisfy some rule. [9] Let's look at some ways to do this.

FOR EXAMPLE: Using random numbers to get an SRS

There are 80 students enrolled in an introductory Statistics class; you are to select a sample of 5.

Question: How can you select an SRS of 5 students using these random digits found on the Internet: 05166 29305 77482?

First I'll number the students from 00 to 79. Taking the random numbers two digits at a time gives me 05, 16, 62, 93, 05, 77, and 48. I'll ignore 93 because the students were numbered only up to 79. And, so as not to pick the same person twice, I'll skip the repeated number 05. My simple random sample consists of students with the numbers 05, 16, 62, 77, and 48.

Error Okay, Bias Bad! Sampling variability is sometimes referred to as sampling error, making it sound like it's some kind of mistake. It's not. We understand that samples will vary, so "sampling error" is to be expected. It's bias we must strive to avoid. Bias means our sampling method distorts our view of the population, and that will surely lead to mistakes.

We can be more efficient when we're choosing a larger sample from a sampling frame stored in a data file. First we assign a random number with several digits (say, from 0 to 10,000) to each individual. Then we arrange the random numbers in numerical order, keeping each name with its number. Choosing the first n names from this re-arranged list will give us a random sample of that size.
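Both procedures just described can be sketched in a few lines of Python. This is a minimal illustration rather than the chapter's own way of doing the computation: the function names, the made-up list of student names, and the seed are assumptions, and the second function uses random floating-point keys in place of integers from 0 to 10,000 so that tied keys are very unlikely.

```python
import random

def srs_from_digits(digits, population_size, sample_size, width=2):
    """Read a string of random digits `width` at a time and keep usable labels.

    Labels at or above population_size are ignored, and repeated labels are
    skipped so that nobody is chosen twice.
    """
    stream = digits.replace(" ", "")
    chosen = []
    for start in range(0, len(stream) - width + 1, width):
        label = int(stream[start:start + width])
        if label < population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

def srs_by_sorting(frame, n, rng):
    """Give each individual a random key, sort by key, and keep the first n."""
    keyed = [(rng.random(), name) for name in frame]
    keyed.sort()
    return [name for _, name in keyed[:n]]

# The worked example: 80 students numbered 00 to 79, sample of 5.
picked = srs_from_digits("05166 29305 77482", population_size=80, sample_size=5)
print(picked)

# The sort-by-random-number method on a hypothetical sampling frame.
frame = [f"Student {i:02d}" for i in range(80)]
print(srs_by_sorting(frame, 5, random.Random(1)))
```

The first call reads the same digits as the worked example, so it should select students 05, 16, 62, 77, and 48. The second call returns five distinct names whose identity depends on the seed.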
Often the sampling frame is so large that it would be too tedious to number everyone consecutively. If our intended sample size is approximately 10% of the sampling frame, we can assign each individual a single random digit 0 to 9. Then we select only those with a specific random digit, say, 5.

Samples drawn at random generally differ one from another. Each draw of random numbers selects different people for our sample. These differences lead to different values for the variables we measure. We call these sample-to-sample differences sampling variability. Surprisingly, sampling variability isn't a problem; it's an opportunity. In future chapters we'll investigate what the variation in a sample can tell us about its population.

⁹ Chapter 11 presented ways of finding and working with random numbers.

Stratified Sampling

Simple random sampling is not the only fair way to sample. More complicated designs may save time or money or help avoid sampling problems. All statistical sampling designs have in common the idea that chance, rather than human choice, is used to select the sample.

Designs that are used to sample from large populations, especially populations residing across large areas, are often more complicated than simple random samples. Sometimes the population is first sliced into homogeneous groups, called strata, before the sample is selected. Then simple random sampling is used within each stratum before the results are combined. This common sampling design is called stratified random sampling.

Why would we want to complicate things? Here's an example. Suppose we want to learn how students feel about funding for the football team at a large university. The campus is 60% men and 40% women, and we suspect that men and women have different views on the funding. If we use simple random sampling to select 100 people for the survey, we could end up with 70 men and 30 women or 35 men and 65 women. Our resulting estimates of the level of support for the football funding could vary widely. To help reduce this sampling variability, we can decide to force a representative balance, selecting 60 men at random and 40 women at random. This would guarantee that the proportions of men and women within our sample match the proportions in the population, and that should make such samples more accurate in representing population opinion.

You can imagine the importance of stratifying by race, income, age, and other characteristics, depending on the questions in the survey. Samples taken within a stratum vary less, so our estimates can be more precise. This reduced sampling variability is the most important benefit of stratifying.

Stratified sampling can also help us notice important differences among groups. As we saw in Chapter 3, if we unthinkingly combine group data, we risk reaching the wrong conclusion, becoming victims of Simpson's paradox.

FOR EXAMPLE Stratifying the sample

Recap: You're trying to find out what freshmen think of the food served on campus. Food Services believes that men and women typically have different opinions about the importance of the salad bar.

Question: How should you adjust your sampling strategy to allow for this difference?

I will stratify my sample by drawing an SRS of men and a separate SRS of women, assuming that the data from the registrar include information about each person's sex.

Cluster and Multistage Sampling

Suppose we wanted to assess the reading level of this textbook based on the length of the sentences. Simple random sampling could be awkward; we'd have to number each sentence, then find, for example, the 576th sentence or the 2482nd sentence, and so on. Doesn't sound like much fun, does it?
It would be much easier to pick a few pages at random and count the lengths of the sentences on those pages. That works if we believe that each page is representative of the entire book in terms of reading level. Splitting the population into representative clusters can make sampling more practical. Then we could simply select one or a few clusters at random and perform a census within each of them. This sampling design is called cluster sampling. If each cluster represents the full population fairly, cluster sampling will be unbiased.

FOR EXAMPLE Cluster sampling

Recap: In trying to find out what freshmen think about the food served on campus, you've considered both an SRS and a stratified sample. Now you have run into a problem: It's simply too difficult and time consuming to track down the individuals whose names were chosen for your sample. Fortunately, freshmen at your school are all housed in 10 freshman dorms.

Questions: How could you use this fact to draw a cluster sample? How might that alleviate the problem? What concerns do you have?

To draw a cluster sample, I would select one or two dorms at random and then try to contact everyone in each selected dorm. I could save time by simply knocking on doors on a given evening and interviewing people. I'd have to assume that freshmen were assigned to dorms pretty much at random and that the people I'm able to contact are representative of everyone in the dorm.
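The stratified and cluster designs described so far can be put side by side in a short Python sketch. Everything here is illustrative: the campus of 1000 students (600 men, 400 women, spread evenly over 10 dorms) is made up, and the helper names `stratified_sample` and `cluster_sample` are hypothetical, not taken from any library.

```python
import random

def stratified_sample(frame, stratum_of, sizes, rng):
    """Draw a separate SRS of sizes[s] individuals from each stratum s."""
    sample = []
    for stratum, size in sizes.items():
        members = [person for person in frame if stratum_of(person) == stratum]
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(frame, cluster_of, n_clusters, rng):
    """Choose whole clusters at random, then take a census within each."""
    clusters = sorted({cluster_of(person) for person in frame})
    chosen = set(rng.sample(clusters, n_clusters))
    return [person for person in frame if cluster_of(person) in chosen]

# Hypothetical campus: 60% men, 40% women, 10 dorms of 100 students each.
rng = random.Random(12)
campus = [{"id": i, "sex": "M" if i < 600 else "F", "dorm": i % 10}
          for i in range(1000)]

# Stratify by sex: force the 60/40 balance instead of leaving it to chance.
strat = stratified_sample(campus, lambda p: p["sex"], {"M": 60, "F": 40}, rng)
print(sum(p["sex"] == "M" for p in strat), "men out of", len(strat))

# Cluster by dorm: pick 2 dorms and interview everyone who lives there.
clust = cluster_sample(campus, lambda p: p["dorm"], 2, rng)
print(sorted({p["dorm"] for p in clust}), len(clust))
```

The stratified sample always contains exactly 60 men and 40 women, while the cluster sample always consists of two complete dorms. Which dorms are chosen is the only thing left to chance.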

What's the difference between cluster sampling and stratified sampling? We stratify to ensure that our sample represents different groups in the population, and we sample randomly within each stratum. Strata are internally homogeneous, but differ from one another. By contrast, clusters are internally heterogeneous, each resembling the overall population. We select clusters to make sampling more practical or affordable.

Stratified vs. cluster sampling. Boston cream pie consists of a layer of yellow cake, a layer of pastry creme, another cake layer, and then a chocolate frosting. Suppose you are a professional taster (yes, there really are such people) whose job is to check your company's pies for quality. You'd need to eat small samples of randomly selected pies, tasting all three components: the cake, the creme, and the frosting.

One approach is to cut a thin vertical slice out of the pie. Such a slice will be a lot like the entire pie, so by eating that slice, you'll learn about the whole pie. This vertical slice containing all the different ingredients in the pie would be a cluster sample.

Another approach is to sample in strata: Select some tastes of the cake at random, some tastes of creme at random, and some bits of frosting at random. You'll end up with a reliable judgment of the pie's quality.

Many populations you might want to learn about are like this Boston cream pie. You can think of the subpopulations of interest as horizontal strata, like the layers of pie. Cluster samples slice vertically across the layers to obtain clusters, each of which is representative of the entire population. Stratified samples represent the population by drawing some from each layer, reducing variability in the results that could arise because of the differences among the layers.

Strata or Clusters? We may split a population into strata or clusters. What's the difference?
We create strata by dividing the population into groups of similar individuals so that each stratum is different from the others. By contrast, since clusters each represent the entire population, they all look pretty much alike.

Sometimes we use a variety of sampling methods together. In trying to assess the reading level of this book, we might worry that it starts out easy and then gets harder as the concepts become more difficult. If so, we'd want to avoid samples that selected heavily from early or from late chapters. To guarantee a fair mix of chapters, we could randomly choose one chapter from each of the seven parts of the book and then randomly select a few pages from each of those chapters. If, altogether, that made too many sentences, we might select a few sentences at random from each of the chosen pages.

So, what is our sampling strategy? First we stratify by the part of the book and randomly choose a chapter to represent each stratum. Within each selected chapter, we choose pages as clusters. Finally, we consider an SRS of sentences within each cluster. Sampling schemes that combine several methods are called multistage samples. Most surveys conducted by professional polling organizations use some combination of stratified and cluster sampling as well as simple random samples.

FOR EXAMPLE Multistage sampling

Recap: Having learned that freshmen are housed in separate dorms allowed you to sample their attitudes about the campus food by going to dorms chosen at random, but you're still concerned about possible differences in opinions between men and women. It turns out that these freshmen dorms house the sexes on alternate floors.

Question: How can you design a sampling plan that uses this fact to your advantage?

Now I can stratify my sample by sex. I would first choose one or two dorms at random and then select some dorm floors at random from among those that house men and, separately, from among those that house women.
I could then treat each floor as a cluster and interview everyone on that floor.
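The multistage plan for the reading-level study (one random chapter from each part, a few random pages from that chapter, a few random sentences from each page) can be sketched as nested random draws. The book below is invented for illustration: 7 parts of 4 chapters, 20 pages per chapter, 25 sentences per page, with made-up sentence lengths. The function name `multistage_sample` is hypothetical.

```python
import random

def multistage_sample(book, pages_per_chapter, sentences_per_page, rng):
    """book is a list of parts; each part is a list of chapters;
    each chapter is a list of pages; each page is a list of sentence lengths."""
    sample = []
    for part in book:                      # stage 1: every part is a stratum
        chapter = rng.choice(part)         # one random chapter per part
        pages = rng.sample(chapter, min(pages_per_chapter, len(chapter)))
        for page in pages:                 # stage 2: pages act as clusters
            # stage 3: an SRS of sentences within each chosen page
            sample.extend(rng.sample(page, min(sentences_per_page, len(page))))
    return sample

rng = random.Random(7)
# Hypothetical book with random sentence lengths between 5 and 40 words.
book = [[[[rng.randint(5, 40) for _ in range(25)]   # 25 sentences per page
          for _ in range(20)]                        # 20 pages per chapter
         for _ in range(4)]                          # 4 chapters per part
        for _ in range(7)]                           # 7 parts

lengths = multistage_sample(book, pages_per_chapter=3, sentences_per_page=5, rng=rng)
print(len(lengths))  # 105, i.e. 7 parts x 3 pages x 5 sentences
```

Because every part contributes the same number of sentences, the sample cannot pile up in the early or late chapters, which is the concern that motivated the design.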

Systematic Samples

Some samples select individuals systematically. For example, you might survey every 10th person on an alphabetical list of students. To make it random, you still must start the systematic selection from a randomly selected individual. When the order of the list is not associated in any way with the responses sought, systematic sampling can give a representative sample. Systematic sampling can be much less expensive than true random sampling. When you use a systematic sample, you should justify the assumption that the systematic method is not associated with any of the measured variables.

Think about the reading-level sampling example again. Suppose we have chosen a chapter of the book at random, then three pages at random from that chapter, and now we want to select a sample of 10 sentences from the 73 sentences found on those pages. Instead of numbering each sentence so we can pick a simple random sample, it would be easier to sample systematically. A quick calculation shows 73/10 = 7.3, so we can get our sample by just picking every seventh sentence on the page. But where should you start? At random, of course. We've accounted for 10 × 7 = 70 of the sentences, so we'll throw the extra 3 into the starting group and choose a sentence at random from the first 10. Then we pick every seventh sentence after that and record its length.

JUST CHECKING
2. We need to survey a random sample of the 300 passengers on a flight from San Francisco to Tokyo. Name each sampling method described below.
a) Pick every 10th passenger as people board the plane.
b) From the boarding list, randomly choose 5 people flying first class and 25 of the other passengers.
c) Randomly generate 30 seat numbers and survey the passengers who sit there.
d) Randomly select a seat position (right window, right center, right aisle, etc.) and survey all the passengers sitting in those seats.
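The systematic recipe from the reading-level example (10 sentences out of 73, a step of 7, and a random start among the first 10) can be written as a short Python sketch. The function name `systematic_sample` is hypothetical, and the sentences are represented simply by their position numbers 1 to 73.

```python
import random

def systematic_sample(items, n, rng):
    """Pick n items at a fixed step, starting from a random position.
    The leftover items are folded into the starting group, as in the text."""
    step = len(items) // n                  # 73 // 10 = 7
    leftover = len(items) - step * n        # 73 - 70 = 3
    start = rng.randrange(step + leftover)  # random start among the first 10
    return [items[start + i * step] for i in range(n)]

sentences = list(range(1, 74))              # sentence numbers 1..73
picked = systematic_sample(sentences, 10, random.Random(3))
print(picked)
```

Whatever the starting point, the last selected position is at most 10 + 9 × 7 = 73, so the selection never runs off the end of the list.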
STEP-BY-STEP EXAMPLE Sampling

The assignment says, "Conduct your own sample survey to find out how many hours per week students at your school spend watching TV during the school year." Let's see how we might do this step by step. (Remember, though: actually collecting the data from your sample can be difficult and time consuming.)

Question: How would you design this survey?

Plan State what you want to know.

Population and Parameter Identify the W's of the study. The Why determines the population and the associated sampling frame. The What identifies the parameter of interest and the variables measured. The Who is the sample we actually draw. The How, When, and Where are given by the sampling plan.

I wanted to design a study to find out how many hours of TV students at my school watch. The population studied was students at our school. I obtained a list of all students currently enrolled and used it as the sampling frame. The parameter of interest was the number of TV hours watched per week during the school year, which I attempted to measure by asking students how much TV they watched during the previous week.

Often, thinking about the Why will help us see whether the sampling frame and plan are adequate to learn about the population.

Sampling Plan Specify the sampling method and the sample size, n. Specify how the sample was actually drawn. What is the sampling frame? How was the randomization performed? A good description should be complete enough to allow someone to replicate the procedure, drawing another sample from the same population in the same manner.

I decided against stratifying by class or sex because I didn't think TV watching would differ much between males and females or across classes. I selected a simple random sample of students from the list. I obtained an alphabetical list of students, assigned each a random digit between 0 and 9, and then selected all students who were assigned a "4." This method generated a sample of 212 students from the population of 2133 students.

Sampling Practice Specify When, Where, and How the sampling was performed. Specify any other details of your survey, such as how respondents were contacted, what incentives were offered to encourage them to respond, how nonrespondents were treated, and so on.

The survey was taken over the period Oct. 15 to Oct. 25. Surveys were sent to selected students by e-mail, with the request that they respond by e-mail as well. Students who could not be reached by e-mail were handed the survey in person.

Summary and Conclusion This report should include a discussion of all the elements. In addition, it's good practice to discuss any special circumstances. Professional polling organizations report the When of their samples but will also note, for example, any important news that might have changed respondents' opinions during the sampling process. In this survey, perhaps, a major news story or sporting event might change students' TV viewing behavior. The question you ask also matters. It's better to be specific ("How many hours did you watch TV last week?") than to ask a general question ("How many hours of TV do you usually watch in a week?").

During the period Oct. 15 to Oct. 25, 212 students were randomly selected, using a simple random sample from a list of all students currently enrolled. The survey they received asked the following question: "How many hours did you spend watching television last week?" Of the 212 students surveyed, 110 responded. It's possible that the nonrespondents differ in the number of TV hours watched from those who responded, but I was unable to follow up on them due to limited time and funds. The 110 respondents reported an average 3.62 hours of TV watching per week. The median was only 2 hours per week. A histogram of the data shows that the distribution is highly right-skewed, indicating that the median might be a more appropriate summary of the typical TV watching of the students.
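The random-digit selection used in this example, and the response rate it ended with, can be checked with a few lines of Python. This is a sketch under stated assumptions: the 2133 students are represented only by index numbers, the function name `digit_sample` is hypothetical, and the seed is arbitrary. One feature worth noticing is that this method does not fix the sample size in advance; it only lands near 10% of the frame.

```python
import random

def digit_sample(frame, keep_digit, rng):
    """Give every individual a random digit 0-9 and keep those
    whose digit equals keep_digit (about 10% of the frame)."""
    return [person for person in frame if rng.randrange(10) == keep_digit]

rng = random.Random(4)
students = list(range(2133))          # stand-in for the enrollment list
sample = digit_sample(students, 4, rng)
print(len(sample))                    # close to 2133 / 10, but varies by seed

# Response rate reported in the example: 110 of 212 sampled students.
response_rate = 110 / 212
print(f"{response_rate:.1%}")         # 51.9%
```

With barely half of the sampled students responding, nonresponse is a larger concern here than the small chance variation in sample size.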

The report should show a display of the data, provide and interpret the statistics from the sample, and state the conclusions that you reached about the population.

[Figure: histogram of TV Watched per Week (hr), from 0 to 25, against # of Students, from 0 to 50.]

Most of the students (90%) watch between 0 and 10 hours per week, while 30% reported watching less than 1 hour per week. A few watch much more. About 3% reported watching more than 20 hours per week.

Defining the "Who": You Can't Always Get What You Want

The population is determined by the Why of the study. Unfortunately, the sample is just those we can reach to obtain responses: the Who of the study. This difference could undermine even a well-designed study.

Before you start a survey, think first about the population you want to study. You may find that it's not the well-defined group you thought it was. Who, exactly, is a student, for example? Even if the population seems well defined, it may not be a practical group from which to draw a sample. For example, election polls want to sample from all those who will vote in the next election, a population that is impossible to identify before Election Day.

Next, you must specify the sampling frame. (Do you have a list of students to sample from? How about a list of registered voters?) Usually, the sampling frame is not the group you really want to know about. (All those registered to vote are not equally likely to show up.) The sampling frame limits what your survey can find out.

Then there's your target sample. These are the individuals for whom you intend to measure responses. You're not likely to get responses from all of them. ("I know it's dinnertime, but I'm sure you wouldn't mind answering a few questions. It'll only take 20 minutes or so. Oh, you're busy?") Nonresponse is a problem in many surveys.

Finally, there's your sample: the actual respondents. These are the individuals about whom you do get data and can draw conclusions.
Unfortunately, they might not be representative of the sampling frame or the population.

[Cartoon: CALVIN AND HOBBES © 1993 Watterson. Reprinted with permission of Universal Press Syndicate. All rights reserved.]

At each step, the group we can study may be constrained further. The Who keeps changing, and each constraint can introduce biases. A careful study should address the question of how well each group matches the population of interest. One of the main benefits of simple random sampling is that it never loses its sense of who's Who. The Who in an SRS is the population of interest from which we've drawn a representative sample. That's not always true for other kinds of samples.

The Valid Survey

It isn't sufficient to just draw a sample and start asking questions. We'll want our survey to be valid. A valid survey yields the information we are seeking about the population we are interested in. Before setting out to survey, ask yourself:
What do I want to know?
Am I asking the right respondents?
Am I asking the right questions?
What would I do with the answers if I had them; would they address the things I want to know?

These questions may sound obvious, but there are a number of pitfalls to avoid.

Know what you want to know. Before considering a survey, understand what you hope to learn and about whom you hope to learn it. Far too often, people decide to perform a survey without any clear idea of what they hope to learn.

Use the right frame. A valid survey obtains responses from the appropriate respondents. Be sure you have a suitable sampling frame. Have you identified the population of interest and sampled from it appropriately? A company might survey customers who returned warranty registration cards, a readily available sampling frame. But if the company wants to know how to make their product more attractive, the most important population is the customers who rejected their product in favor of one from a competitor.

Tune your instrument. It is often tempting to ask questions you don't really need, but beware: longer questionnaires yield fewer responses and thus a greater chance of nonresponse bias.

Ask specific rather than general questions.
People are not very good at estimating their typical behavior, so it is better to ask "How many hours did you sleep last night?" than "How much do you usually sleep?" Sure, some responses will include some unusual events (My dog was sick; I was up all night.), but overall you'll get better data.

Ask for quantitative results when possible. "How many magazines did you read last week?" is better than "How much do you read: A lot, A moderate amount, A little, or None at all?"

Be careful in phrasing questions. A respondent may not understand the question or may understand the question differently than the researcher intended it. ("Does anyone in your family belong to a union?" Do you mean just me, my spouse, and my children? Or does "family" include my father, my siblings, and my second cousin once removed? What about my grandfather, who is staying with us? I think he once belonged to the Autoworkers Union.) Respondents are unlikely (or may not have the opportunity) to ask for clarification. A question like "Do you approve of the recent actions of the Secretary of Labor?" is likely not to measure what you want if many respondents don't know who the Secretary of Labor is or what actions he or she recently made.

Respondents may even lie or shade their responses if they feel embarrassed by the question ("Did you have too much to drink last night?"), are intimidated or insulted by the question ("Could you understand our new Instructions for Dummies manual, or was it too difficult for you?"), or if they want to avoid offending the interviewer ("Would you hire a man with a tattoo?" asked by a tattooed interviewer). Also, be careful to avoid phrases that have double or regional meanings. "How often do you go to town?" might be interpreted differently by different people and cultures.

Even subtle differences in phrasing can make a difference. In January 2006, the New York Times asked half of the 1229 U.S. adults in their sample the following question:

After 9/11, President Bush authorized government wiretaps on some phone calls in the U.S. without getting court warrants, saying this was necessary to reduce the threat of terrorism. Do you approve or disapprove of this?

They found that 53% of respondents approved. But when they asked the other half of their sample a question with only slightly different phrasing,

After 9/11, George W. Bush authorized government wiretaps on some phone calls in the U.S. without getting court warrants. Do you approve or disapprove of this?

only 46% approved.

Be careful in phrasing answers. It's often a good idea to offer choices rather than inviting a free response. Open-ended answers can be difficult to analyze. "How did you like the movie?" may start an interesting debate, but it may be better to give a range of possible responses. Be sure to phrase them in a neutral way. When asking "Do you support higher school taxes?" positive responses could be worded "Yes," "Yes, it is important for our children," or "Yes, our future depends on it." But those are not equivalent answers.

The best way to protect a survey from such unanticipated measurement errors is to perform a pilot survey.
A pilot is a trial run of the survey you eventually plan to give to a larger group, using a draft of your survey questions administered to a small sample drawn from the same sampling frame you intend to use. By analyzing the results from this smaller survey, you can often discover ways to improve your instrument.

WHAT CAN GO WRONG? OR, HOW TO SAMPLE BADLY

Bad sample designs yield worthless data. Many of the most convenient forms of sampling can be seriously biased. And there is no way to correct for the bias from a bad sample. So it's wise to pay attention to sample design and to beware of reports based on poor samples.

Sample Badly with Volunteers

One of the most common dangerous sampling methods is a voluntary response sample. In a voluntary response sample, a large group of individuals is invited to respond, and all who do respond are counted. This method is used by call-in shows, 900 numbers, Internet polls, and letters written to members of Congress. Voluntary response samples are almost always biased, and so conclusions drawn from them are almost always wrong.

It's often hard to define the sampling frame of a voluntary response study. Practically, the frames are groups such as Internet users who frequent a particular Web site or those who happen to be watching a particular TV show at the moment. But those sampling frames don't correspond to interesting populations.

Even within the sampling frame, voluntary response samples are often biased toward those with strong opinions or those who are strongly motivated. People with very negative opinions tend to respond more often than those with equally strong positive opinions. The sample is not representative, even though every individual in the population may have been offered the chance to respond. The resulting voluntary response bias invalidates the survey.

Activity: Sources of Sampling Bias. Here's a narrated exploration of sampling bias.

"If you had it to do over again, would you have children?" Ann Landers, the advice columnist, asked parents this question. The overwhelming majority (70% of the more than 10,000 people who wrote in) said no, kids weren't worth it. A more carefully designed survey later showed that about 90% of parents actually are happy with their decision to have children.
What accounts for the striking difference in these two results? What parents do you think are most likely to respond to the original question?

FOR EXAMPLE Bias in sampling

Recap: You're trying to find out what freshmen think of the food served on campus, and have thought of a variety of sampling methods, all time consuming. A friend suggests that you set up a "Tell Us What You Think" Web site and invite freshmen to visit the site to complete a questionnaire.

Question: What's wrong with this idea?

Letting each freshman decide whether to participate makes this a voluntary response survey. Students who were dissatisfied might be more likely to go to the Web site to record their complaints, and this could give me a biased view of the opinions of all freshmen.

Do you use the Internet? Click here for yes. Click here for no.

Internet convenience surveys are worthless. As voluntary response surveys, they have no well-defined sampling frame (all those who use the Internet and visit their site?) and thus report no useful information. Do not believe them.

Sample Badly, but Conveniently

Another sampling method that doesn't work is convenience sampling. As the name suggests, in convenience sampling we simply include the individuals who are convenient for us to sample. Unfortunately, this group may not be representative of the population. A recent survey of 437 potential home buyers in Orange County, California, found, among other things, that

All but 2 percent of the buyers have at least one computer at home, and 62 percent have two or more. Of those with a computer, 99 percent are connected to the Internet (Jennifer Hieger, "Portrait of Homebuyer Household: 2 Kids and a PC," Orange County Register, 27 July 2001).

Later in the article, we learn that the survey was conducted via the Internet! That was a convenient way to collect data and surely easier than drawing a simple random sample, but perhaps home builders shouldn't conclude from this study that every family has a computer and an Internet connection.

Many surveys conducted at shopping malls suffer from the same problem. People in shopping malls are not necessarily representative of the population of interest. Mall shoppers tend to be more affluent and include a larger percentage of teenagers and retirees than the population at large. To make matters worse, survey interviewers tend to select individuals who look "safe," or easy to interview.

FOR EXAMPLE Bias in sampling

Recap: To try to gauge freshman opinion about the food served on campus, Food Services suggests that you just stand outside a school cafeteria at lunchtime and stop people to ask them questions.

Question: What's wrong with this sampling strategy?

This would be a convenience sample, and it's likely to be biased. I would miss people who use the cafeteria for dinner, but not for lunch, and I'd never hear from anyone who hates the food so much that they have stopped coming to the school cafeterias.

Sample from a Bad Sampling Frame

An SRS from an incomplete sampling frame introduces bias because the individuals included may differ from the ones not in the frame. People in prison, homeless people, students, and long-term travelers are all likely to be missed.
In telephone surveys, people who have only cell phones or who use VOIP Internet phones are often missing from the sampling frame.

Undercoverage

Many survey designs suffer from undercoverage, in which some portion of the population is not sampled at all or has a smaller representation in the sample than it has in the population. Undercoverage can arise for a number of reasons, but it's always a potential source of bias.

Telephone surveys are usually conducted when you are likely to be home, interrupting your dinner. If you eat out often, you may be less likely to be surveyed, a possible source of undercoverage.

WHAT ELSE CAN GO WRONG?

Watch out for nonrespondents. A common and serious potential source of bias for most surveys is nonresponse bias. No survey succeeds in getting responses from everyone. The problem is that those who don't respond may differ from those who do. And they may differ on just the variables we care about. The lack of response will bias the results. Rather than sending out a large number of surveys for which the response rate will be low, it is often better to design a smaller randomized survey for which you have the resources to ensure a high response rate. One of the problems with nonresponse bias is that it's usually impossible to tell what the nonrespondents might have said.

Video: Biased Question Wording. Watch a hapless interviewer make every mistake in the book.

A Short Survey. Given the fact that those who understand Statistics are smarter and better looking than those who don't, don't you think it is important to take a course in Statistics?

Remember the Literary Digest Survey? It turns out that they were wrong on two counts. First, their list of 10 million people was not representative. There was a selection bias in their sampling frame. There was also a nonresponse bias. We know this because the Digest also surveyed a systematic sample in Chicago, sending the same question used in the larger survey to every third registered voter. They still got a result in favor of Landon, even though Chicago voted overwhelmingly for Roosevelt in the election. This suggests that the Roosevelt supporters were less likely to respond to the Digest survey. There's a modern version of this problem: It's been suggested that those who screen their calls with caller ID or an answering machine, and so might not talk to a pollster, may differ in wealth or political views from those who just answer the phone.

Work hard to avoid influencing responses. Response bias¹⁰ refers to anything in the survey design that influences the responses. Response biases include the tendency of respondents to tailor their responses to try to please the interviewer, the natural unwillingness of respondents to reveal personal facts or admit to illegal or unapproved behavior, and the ways in which the wording of the questions can influence responses.

Activity: Can a Large Sample Protect Against Bias? Explore how we can learn about the population from large or repeated samples.

A researcher distributed a survey to an organization before some economizing changes were made. She asked how people felt about a proposed cutback in secretarial and administrative support on a seven-point scale from Very Happy to Very Unhappy. But virtually all respondents were very unhappy about the cutbacks, so the results weren't particularly useful. If she had pretested the question, she might have chosen a scale that ran from Unhappy to Outraged.

How to Think About Biases

Look for biases in any survey you encounter. If you design one of your own, ask someone else to help look for biases that may not be obvious to you. And do this before you collect your data. There's no way to recover from a biased sampling method or a survey that asks biased questions. Sorry, it just can't be done. A bigger sample size for a biased study just gives you a bigger useless study. A really big sample gives you a really big useless study. (Think of the 2.4 million Literary Digest responses.)

Spend your time and resources reducing biases. No other use of resources is as worthwhile as reducing the biases.

If you can, pilot-test your survey. Administer the survey in the exact form that you intend to use it to a small sample drawn from the population you intend to sample. Look for misunderstandings, misinterpretation, confusion, or other possible biases. Then refine your survey instrument.

Always report your sampling methods in detail. Others may be able to detect biases where you did not expect to find them.

¹⁰ Response bias is not the opposite of nonresponse bias. (We don't make these terms up; we just try to explain them.)
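The claim that a bigger biased sample is just a bigger useless study can be explored with a small simulation. Everything in this sketch is hypothetical: a made-up population of 500,000 parents of whom 90% are happy with their decision, and assumed response rates (0.5% for happy parents, 10% for unhappy ones) chosen only to mimic the kind of voluntary response described for the Ann Landers question. The function names are illustrative.

```python
import random

def srs_estimate(population, n, rng):
    """Proportion of 1s in a simple random sample of size n."""
    sample = rng.sample(population, n)
    return sum(sample) / n

def voluntary_estimate(population, p_if_happy, p_if_unhappy, rng):
    """Everyone is invited; each person decides whether to respond,
    and unhappy people are assumed far more likely to write in."""
    responses = [happy for happy in population
                 if rng.random() < (p_if_happy if happy else p_if_unhappy)]
    return sum(responses) / len(responses), len(responses)

rng = random.Random(2024)
# Hypothetical population: 1 = happy with the decision, 0 = not. True value 0.90.
population = [1] * 450_000 + [0] * 50_000

small = srs_estimate(population, 1000, rng)
big, n_big = voluntary_estimate(population, 0.005, 0.10, rng)

print(f"SRS of 1000: {small:.2f}")
print(f"Voluntary response, n = {n_big}: {big:.2f}")
```

Under these assumptions the random sample of 1000 lands close to the true 90%, while the voluntary response sample, several times larger, lands far below it. Collecting still more volunteers would tighten the estimate around the wrong value.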

CONNECTIONS

With this chapter, we take our first formal steps to relate our sample data to a larger population. Some of these ideas have been lurking in the background as we sought patterns and summaries for data. Even when we only worked with the data at hand, we often thought about implications for a larger population of individuals.

Notice the ongoing central importance of models. We've seen models in several ways in previous chapters. Here we recognize the value of a model for a population. The parameters of such a model are values we will often want to estimate using statistics such as those we've been calculating. The connections to summary statistics for center, spread, correlation, and slope are obvious.

We now have a specific application for random numbers. The idea of applying randomness deliberately showed up in Chapter 11 for simulation. Now we need randomization to get good-quality data from the real world.

WHAT HAVE WE LEARNED?

We've learned that a representative sample can offer us important insights about populations. It's the size of the sample, and not its fraction of the larger population, that determines the precision of the statistics it yields.

We've learned several ways to draw samples, all based on the power of randomness to make them representative of the population of interest:
A Simple Random Sample (SRS) is our standard. Every possible group of n individuals has an equal chance of being our sample. That's what makes it simple.
Stratified samples can reduce sampling variability by identifying homogeneous subgroups and then randomly sampling within each.
Cluster samples randomly select among heterogeneous subgroups that each resemble the population at large, making our sampling tasks more manageable.
Systematic samples can work in some situations and are often the least expensive method of sampling. But we still want to start them randomly.
Multistage samples combine several random sampling methods.
We've learned that bias can destroy our ability to gain insights from our sample:
Nonresponse bias can arise when sampled individuals will not or cannot respond.
Response bias arises when respondents' answers might be affected by external influences, such as question wording or interviewer behavior.

We've learned that bias can also arise from poor sampling methods:
Voluntary response samples are almost always biased and should be avoided and distrusted.
Convenience samples are likely to be flawed for similar reasons.
Even with a reasonable design, sample frames may not be representative. Undercoverage occurs when individuals from a subgroup of the population are selected less often than they should be.

Finally, we've learned to look for biases in any survey we find and to be sure to report our methods whenever we perform a survey so that others can evaluate the fairness and accuracy of our results.

Terms

Population 268. The entire group of individuals or instances about whom we hope to learn.
Sample 268. A (representative) subset of a population, examined in hope of learning about the population.