Sampling: What you don't know can hurt you. Juan Muñoz

Probability sampling: Also known as scientific sampling. Households are selected randomly. Each household in the population has a known, nonzero probability of being included in the sample.

Basic Sampling Techniques: The three basic techniques of probability sampling are Simple Random Sampling, Multi-stage Sampling, and Stratified Sampling. Most household surveys use a combination of these three techniques.

Probability sampling: Permits establishing sampling errors and confidence intervals. Other sampling procedures (purposive sampling, convenience sampling, quota sampling, etc.) cannot do that. Other sampling procedures can also yield biased conclusions.

Simple Random Sampling: Households are selected independently. Every household in the population has an equal chance or probability of being selected in the sample. This probability is p = n/N, where n = the size of the sample and N = the size of the study population.
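
As a rough illustration of the selection probability p = n/N, the sketch below draws a simple random sample from a sampling frame using Python's standard library. The frame of 5,000 household identifiers and the function names are assumptions made up for the example, not part of the original slides.

```python
import random

def inclusion_probability(n, N):
    """Probability that any one household is selected: p = n / N."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return n / N

def draw_simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, every unit equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: household identifiers 1..5000.
frame = list(range(1, 5001))
sample = draw_simple_random_sample(frame, 100, seed=1)

print(inclusion_probability(100, len(frame)))  # 0.02
print(len(sample), "households selected")
```

Note that the sketch needs the complete frame in memory, which is exactly the practical obstacle the next slide raises.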

Simple Random Sampling: Simple random sampling is almost never the only technique used in practice, because: A Sampling Frame may not be available, or it would be very large (a Sampling Frame is a list of all units in a study population from which a sample can be selected). Fieldwork may be difficult, since the selected households would be too scattered.

Simple Random Sampling: Simple random sampling is almost never the only technique used in practice, but it is useful to illustrate some basic facts about sampling: Sampling errors and confidence intervals. The relationship between sampling error and sample size. The relationship between sampling error and population size. Sampling errors vs. non-sampling errors.

Sampling error and sample size: The sampling error e when estimating a proportion p with a sample of size n taken from an infinite population is $e = \sqrt{\frac{p(1-p)}{n}}$

Confidence intervals: In a sample of 1,000 enterprises, 280 enterprises (28 percent) have been harassed by a predatory agency. $e = \sqrt{\frac{0.28 \times 0.72}{1{,}000}} = 0.0142$. Sampling error is 1.42 percent.

Confidence intervals: In a sample of 1,000 enterprises, 280 enterprises (28 percent) have been harassed by a predatory agency. Sampling error is 1.42 percent. 95 percent confidence interval: 28 ± 1.42 × 1.96. 99 percent confidence interval: 28 ± 1.42 × 2.58. [Figure: sampling error shown on a scale from 24 to 32 percent]
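
The arithmetic of this example can be checked with a short sketch. It assumes the infinite-population formula e = sqrt(p(1 − p)/n) and the multipliers 1.96 and 2.58 quoted on the slide; the function names are illustrative only.

```python
import math

def sampling_error(p, n):
    """Sampling error of a proportion p estimated from a sample of size n."""
    return math.sqrt(p * (1 - p) / n)

def confidence_interval(p, n, z):
    """Interval p ± z * e, where z is the multiplier for the chosen level."""
    e = sampling_error(p, n)
    return p - z * e, p + z * e

p, n = 0.28, 1000                      # 280 of 1,000 enterprises
e = sampling_error(p, n)
lo95, hi95 = confidence_interval(p, n, 1.96)
lo99, hi99 = confidence_interval(p, n, 2.58)

print(f"sampling error: {100 * e:.2f} percent")            # 1.42 percent
print(f"95 percent CI: {100 * lo95:.1f} to {100 * hi95:.1f}")  # 25.2 to 30.8
print(f"99 percent CI: {100 * lo99:.1f} to {100 * hi99:.1f}")  # 24.3 to 31.7
```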

Sampling error and sample size: To halve the sampling error, the sample size must be quadrupled. [Figure: sampling error plotted against sample size]
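
Because n sits under a square root in e = sqrt(p(1 − p)/n), multiplying n by four divides e by two. A minimal numerical check, reusing p = 0.28 from the enterprise example (the sample sizes are arbitrary choices for illustration):

```python
import math

def sampling_error(p, n):
    """Sampling error of a proportion, infinite-population formula."""
    return math.sqrt(p * (1 - p) / n)

p = 0.28
for n in (1000, 4000, 16000):
    # Each fourfold increase in n halves the error.
    print(n, f"{100 * sampling_error(p, n):.2f} percent")

ratio = sampling_error(p, 1000) / sampling_error(p, 4000)
print("error ratio when n is quadrupled:", round(ratio, 6))
```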

Sample size and population size: The sampling error e when estimating a proportion p with a sample of size n taken from a population of size N is $e = \sqrt{1 - \frac{n}{N}} \, \sqrt{\frac{p(1-p)}{n}}$, where $\sqrt{1 - n/N}$ is the finite population correction.
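
The sketch below evaluates the finite-population formula for a fixed sample of 1,000 and several population sizes. The population sizes are hypothetical; the point is only that the correction matters when n is a sizeable fraction of N and fades quickly as N grows.

```python
import math

def sampling_error_finite(p, n, N):
    """Sampling error of a proportion with the finite population correction."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    fpc = math.sqrt(1 - n / N)          # finite population correction
    return fpc * math.sqrt(p * (1 - p) / n)

p, n = 0.28, 1000
for N in (2_000, 10_000, 100_000, 10_000_000):   # hypothetical populations
    e = sampling_error_finite(p, n, N)
    print(f"N = {N:>10,}: e = {100 * e:.2f} percent")
```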

Sample size and population size: [Figure: sample size needed for a given precision, plotted against population size]
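
The relationship between required sample size and population size can be sketched by solving the finite-population formula e² = (1 − n/N) · p(1 − p)/n for n, which gives n = n0 / (1 + n0/N) with n0 = p(1 − p)/e². This rearrangement and the function name are this sketch's own; the target error of 1.42 percent is borrowed from the enterprise example.

```python
import math

def required_sample_size(p, e, N=None):
    """Smallest n giving sampling error <= e for proportion p.

    N=None means an infinite population; otherwise the finite
    population correction is applied.
    """
    n0 = p * (1 - p) / e ** 2           # infinite-population sample size
    if N is None:
        return math.ceil(n0)
    return math.ceil(n0 / (1 + n0 / N))

p, e = 0.28, 0.0142
for N in (1_000, 10_000, 100_000, 10_000_000, None):
    label = "infinite" if N is None else f"{N:,}"
    print(f"population {label:>10}: n = {required_sample_size(p, e, N)}")
```

The required sample size levels off quickly: beyond a moderately large population, it barely depends on N at all.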

Sampling vs. non-sampling errors: [Figure: error plotted against sample size, with curves for sampling error, non-sampling error, and total error]

Two-stage Sampling: The population is divided up into subgroups, or Primary Sampling Units (PSUs), that represent aggregates of individual households. In the first stage, a sample of PSUs is selected. In the second stage, a sample of individual households is chosen in each of the selected PSUs.

Two-stage Sampling: Solves the problems of Simple Random Sampling. Provides an opportunity to link community-level factors to household behavior. The sample can be made self-weighted if, in the first stage, PSUs are selected with Probability Proportional to Size (PPS) and, in the second stage, a fixed number of households are chosen within the selected PSUs. The price to pay is the cluster effect.
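
Why PPS plus a fixed take per PSU yields a self-weighted sample: a PSU with M_i households out of M in total is selected with probability k · M_i / M, and each of its households is then selected with probability m / M_i, so the product is k · m / M for every household. The sketch below illustrates this with systematic PPS selection, which is one common way to implement PPS and is an assumption here, not something the slides specify. PSU names and sizes are made up.

```python
import random

def pps_systematic(sizes, k, seed=None):
    """Select k PSUs with probability proportional to size.

    sizes maps PSU id -> number of households. Systematic selection:
    a random start, then equal steps along the cumulative size list.
    A PSU larger than the step could be hit more than once.
    """
    rng = random.Random(seed)
    total = sum(sizes.values())
    step = total / k
    start = rng.random() * step
    targets = [start + i * step for i in range(k)]
    selected, cumulative, i = [], 0, 0
    for psu, size in sizes.items():
        cumulative += size
        while i < k and targets[i] < cumulative:
            selected.append(psu)
            i += 1
    return selected

def household_probability(psu_size, total, k, m):
    """Overall selection probability of one household in a given PSU."""
    first_stage = k * psu_size / total   # PPS selection of the PSU
    second_stage = m / psu_size          # fixed take of m households
    return first_stage * second_stage

# Hypothetical PSUs and household counts.
sizes = {"A": 120, "B": 300, "C": 80, "D": 200, "E": 100}
total, k, m = sum(sizes.values()), 2, 20

selected = pps_systematic(sizes, k, seed=3)
probs = {psu: household_probability(size, total, k, m) for psu, size in sizes.items()}
print("selected PSUs:", selected)
print("household probabilities:", probs)   # all equal k * m / total
```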

Cluster effect: Sampling error grows when the sample of size n is drawn from k PSUs, with m households in each PSU (n = k × m): $e_{\text{corrected}}^2 = e^2 \, [1 + \rho(m-1)]$, where ρ is the intra-cluster correlation coefficient and $1 + \rho(m-1)$ is the cluster effect.

Cluster effects: For a total sample size of 12,000 households, the cluster effect by number of PSUs, number of households per PSU, and intra-cluster correlation coefficient:

Number    Households   Intra-cluster correlation coefficient
of PSUs   per PSU      0.01    0.02    0.05    0.10
3000        4          1.03    1.06    1.15    1.30
2000        6          1.05    1.10    1.25    1.50
1500        8          1.07    1.14    1.35    1.70
1000       12          1.11    1.22    1.55    2.10
 800       15          1.14    1.28    1.70    2.40
 600       20          1.19    1.38    1.95    2.90
 400       30          1.29    1.58    2.45    3.90
 300       40          1.39    1.78    2.95    4.90
 200       60          1.59    2.18    3.95    6.90
 150       80          1.79    2.58    4.95    8.90
 100      120          2.19    3.38    6.95   12.90
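
The table values follow directly from the factor 1 + ρ(m − 1). The sketch below recomputes them for a total sample of 12,000 households and also shows how the factor inflates a sampling error; the helper names are illustrative.

```python
import math

def cluster_effect(rho, m):
    """Variance inflation factor 1 + rho * (m - 1)."""
    return 1 + rho * (m - 1)

def corrected_error(e, rho, m):
    """Sampling error after the cluster correction: e * sqrt(cluster effect)."""
    return e * math.sqrt(cluster_effect(rho, m))

total_households = 12_000
rhos = (0.01, 0.02, 0.05, 0.10)
households_per_psu = (4, 6, 8, 12, 15, 20, 30, 40, 60, 80, 120)

print("PSUs  per PSU  " + "  ".join(f"{rho:5.2f}" for rho in rhos))
for m in households_per_psu:
    k = total_households // m            # number of PSUs
    row = "  ".join(f"{cluster_effect(rho, m):5.2f}" for rho in rhos)
    print(f"{k:4d}  {m:7d}  {row}")
```

For instance, with 20 households per PSU and ρ = 0.05, the variance is multiplied by 1.95, so the sampling error itself grows by a factor of about 1.4.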

Stratified Sampling: The population is divided up into subgroups, or strata. A separate sample of households is then selected from each stratum.

Stratified Sampling: There are two primary reasons for using a stratified sampling design: to potentially reduce sampling error by gaining greater control over the composition of the sample, and to ensure that particular groups within a population are adequately represented in the sample. The two objectives are generally contradictory in practice.

Stratified Sampling: Stratification Variable: the variable or variables by which a study population is divided up into strata (or groups) in order to select a stratified sample. Proportionate Stratified Sample: a stratified sample where the number of households selected from each stratum is proportional to the number of units in that stratum in the population. Disproportionate Stratified Sample: a stratified sample where the number of households selected from each stratum is not proportional to the number of units in that stratum in the population. Almost all national household surveys use Disproportionate Stratified Sampling. This implies that raising factors, or sampling weights, need to be used to obtain national estimates from the sample.
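
A small sketch of why sampling weights are needed under disproportionate stratification. It assumes the usual raising factor for each stratum, population size divided by sample size; the two strata and all counts are invented for illustration.

```python
def sampling_weights(population_sizes, sample_sizes):
    """Raising factor per stratum: households represented by each sampled one."""
    return {h: population_sizes[h] / sample_sizes[h] for h in population_sizes}

# Hypothetical design: the urban stratum is heavily oversampled.
population = {"urban": 200_000, "rural": 800_000}
sample = {"urban": 600, "rural": 400}
# Hypothetical count of sampled households with some characteristic.
positives = {"urban": 60, "rural": 160}

weights = sampling_weights(population, sample)

weighted_total = sum(weights[h] * positives[h] for h in population)
weighted_base = sum(weights[h] * sample[h] for h in population)
weighted = weighted_total / weighted_base
unweighted = sum(positives.values()) / sum(sample.values())

print("weights:", weights)
print(f"unweighted estimate: {unweighted:.2f}")   # 0.22, biased toward urban
print(f"weighted estimate:   {weighted:.2f}")     # 0.34, national estimate
```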

Excluded strata: Parts of the country may need to be excluded from the sample for security or other reasons.

Measuring change: Pros and cons of panel samples. A panel can measure change more accurately. A panel permits correlating change in the outcomes with change in other factors. A panel approach may reduce the effort of the second and subsequent rounds. Panels are harder to manage and entail long-term commitments between data users and producers. Panels are subject to attrition (respondent fatigue, migration, disappearance from the market, etc.). A panel is more vulnerable to manipulation from the predatory agencies.

Assuring good field work Juan Muñoz

What happens when fieldwork is poor? A long and frustrating process of data cleaning becomes unavoidable. The data lose their policy-making relevance. Data quality is not guaranteed. The process converges (at best) to databases that are internally consistent. The process entails a myriad of decisions, generally undocumented. Users mistrust the data.

Key factors: Manage the survey as an integrated project. Implement the team concept in the organization of field operations. Integrate computer-based quality controls into field operations. Establish strong supervision procedures. Ensure sufficient training. Work with a reduced staff over an extended period of data collection.

Management levels: Core staff: survey manager, field operations manager, data manager. Tactical options for the organization of field teams: mobile teams with fixed data entry; mobile teams with integrated data entry; sometime in the future, the paperless interview.

Mobile teams with fixed data entry: Côte d'Ivoire (1984), Peru (1985), Ghana, Pakistan, Guinea-Conakry, Mozambique

Composition of a field team: Supervisor, Interviewers, Data entry operator

The team and its tools: Supervisor, Interviewers, Anthropometrist, Data entry operator

Two PSUs visited in a four-week period. [Diagram: Alama, Bamako, Regional Office]

First week: The operator remains in the Regional Office. The rest of the team travels to Alama, completes the first half of the questionnaires in all selected households, and travels back. The supervisor gives the Alama questionnaires to the data entry operator (DEO).

Second week: The operator enters the first-week data from Alama. The rest of the team travels to Bamako, completes the first half of the questionnaires in all selected households, and travels back. The supervisor gives the Bamako questionnaires to the DEO. The DEO gives back the Alama questionnaires with flagged inconsistencies.

Third week: The team completes the second half of the questionnaires and corrects inconsistencies from the first half. The operator enters the first-week data from Bamako. [Diagram: Alama, Bamako, Regional Office]

Fourth week: The operator enters the second-week data from Alama and corrects inconsistencies from the first round. The team completes the second half of the questionnaires and corrects inconsistencies from the first half. [Diagram: Alama, Bamako, Regional Office]

Fourth week: The result is a clean data set on diskette, ready for analysis immediately after data collection.

Mobile teams with integrated data entry: Nepal (1992), Argentina, Paraguay, Bangladesh (2000)

Mobile teams with integrated data entry: The team works with portable computers and printers. The operator travels with the rest of the field team. Data entry and validation are almost immediate. Trips between the Regional Office and the selected PSUs are reduced. [Diagram: Bamako, Alama, Cocody, Regional Office]

Benefits of integration: Provides reliable and timely databases. Provides immediate feedback on the performance of the field staff, allowing early detection of inadequate behaviors. Ensures that all field staff apply uniform criteria throughout the full period of data collection. Solves inconsistencies through direct verification of the households' reality, rather than through office guesswork. Is consistent with the total quality culture.
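
To give a flavor of what a computer-based quality control at data entry might look like, here is a purely hypothetical sketch. The field names, thresholds, and rules are invented for illustration and are not taken from any survey described in these slides; a real data entry program would implement the checks defined for its own questionnaire.

```python
def flag_inconsistencies(record):
    """Return a list of flags for one household-member record.

    All rules below are hypothetical examples of range and
    cross-field consistency checks.
    """
    flags = []
    age = record.get("age")
    if age is None or not 0 <= age <= 110:
        flags.append("age missing or out of range")
        return flags                     # other checks depend on a valid age
    if record.get("relationship") == "head" and age < 12:
        flags.append("household head younger than 12")
    if record.get("years_of_schooling", 0) > age:
        flags.append("years of schooling exceed age")
    return flags

# Hypothetical records as they might arrive from a keyed questionnaire.
records = [
    {"id": 1, "age": 34, "relationship": "head", "years_of_schooling": 10},
    {"id": 2, "age": 7, "relationship": "head", "years_of_schooling": 9},
    {"id": 3, "age": 130, "relationship": "child", "years_of_schooling": 2},
]

flagged = {r["id"]: flag_inconsistencies(r) for r in records}
for record_id, flags in flagged.items():
    if flags:
        print(f"record {record_id}: " + "; ".join(flags))
```

Flags of this kind are what the team would take back to the household for verification while still in the field.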

Supervision tasks: Verification of questionnaires for completeness. Random re-interviews of households. Observation of interviews.

Selecting and training field staff: Why is it important? How long does it take? How is it organized?

Example: Day 2 of interviewer training. Definition of household (and dwelling, family, etc.). Pictorial of a sample household. Slide with an empty roster (explain case conventions, encoding, skip patterns, etc.). Fill the roster for the sample household (need for legible handwriting, recording of ages, use of a calendar of events, etc.). Role playing (trainer as a respondent, simulating borderline cases). Role playing (trainees interview each other).