AGAINST ALL ODDS EPISODE 22 SAMPLING DISTRIBUTIONS TRANSCRIPT

FUNDER CREDITS
Funding for this program is provided by Annenberg Learner.

INTRO

Pardis Sabeti
Hi, I'm Pardis Sabeti and this is Against All Odds, where we make statistics count.

Meet the third grade class at Monica Ros School in Ojai, California: twenty rambunctious 9-year-olds who are about to become a living demonstration of how, with the right statistical tools, we can use samples from a population to make inferences about the population as a whole. Remember, we get statistics from sample data, while parameters are generally unknown because they describe an entire population.

Here's our population, the nine-year-olds at Monica Ros, lining up according to their height: 50-inchers at one end, 57-inchers at the other, with most of them clustered around the center near 53 or 54 inches. This is our population distribution, the distribution of values in our whole population of nine-year-olds in the class. When we plot this out, you won't be surprised by now to see that the shape of the distribution is approximately a Normal curve, from which we can calculate the exact mean, mu = 53.4 inches, and the standard deviation, sigma = 1.8 inches.

Okay, now we are going to take some random samples from our population. Here comes the first sample of 4 students. Throughout this example we're going to stick with 4 as our sample size. Once again we calculate the mean (in this case the sample mean, or x-bar), which comes out at 53 inches. Now we take a different random sample of 4. This time the sample mean is 52.25 inches. From a third random sample we get an x-bar of 52.75 inches. From a fourth, a different distribution but an x-bar that happens to be the same as the last. And from a fifth random sample we get a sample mean of 53.25 inches. We can keep going until we've selected all possible samples of 4 from our population of 20. And now we can plot out all the sample means we calculated.

What we get is a distribution of our sample means or, to put it another way, a sampling distribution of x-bar: a distribution of x-bar values from all possible samples of size 4. And like any distribution it can be described by its shape, center and spread. The shape is our familiar Normal curve. The center is the mean of x-bar, 53.4 inches, which is exactly what we got for the mean when we calculated it from the population as a whole.
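The "all possible samples of 4 from our population of 20" idea can be tried out directly. The sketch below is only an illustration: the `heights` list is a made-up class of 20 (not the real Monica Ros data), chosen so that its mean is 53.4 inches, and the variable names are hypothetical. It builds every possible sample of 4, computes each x-bar, and compares the mean of all the x-bars with the population mean.

```python
from itertools import combinations
from math import comb

# Hypothetical heights in inches for a class of 20.
# These are NOT the real class data; they are invented for illustration.
heights = [50, 51, 51, 52, 52, 52, 53, 53, 53, 53,
           53, 54, 54, 54, 54, 55, 55, 56, 56, 57]

SAMPLE_SIZE = 4

# One x-bar for every possible sample of 4 students.
sample_means = [sum(sample) / SAMPLE_SIZE
                for sample in combinations(heights, SAMPLE_SIZE)]

population_mean = sum(heights) / len(heights)
mean_of_sample_means = sum(sample_means) / len(sample_means)

# Number of possible samples: "20 choose 4".
print("possible samples:", len(sample_means), "=", comb(len(heights), SAMPLE_SIZE))
print("population mean:", population_mean)
print("mean of all x-bars:", mean_of_sample_means)
```

There are 4,845 possible samples of 4 from 20 students, and the mean of all their x-bars matches the population mean, which is the "same center" point made above.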

But when we directly compare the population distribution to our sampling distribution we can see that while the center is the same, the spread of the sampling distribution is much less. In fact, sample means are always less variable than individuals. That makes sense. A random sample should include a variety of individuals from our population, some short, some medium, some tall. The sample mean literally averages out that variety, so we see less variation.

There is a precise relationship between the standard deviation of the sample mean and the standard deviation of the individual heights. Here it is. The standard deviation of the sample mean is simply the population standard deviation divided by the square root of the sample size (in our case, 4). So the standard deviation of our sample mean is 0.9 inches.

Let's see how we can put this fact to use in a situation with more at stake than figuring out the average height of a third grade class: a manufacturing plant for circuit boards. Actually, this is a scene you don't see that much any more: an electronics manufacturing plant in the United States. Today, many electronic assembly plants are overseas, and involve components that are a lot smaller than these. But we are interested in how statistics helps control quality in manufacturing, and those principles haven't changed.

A key part of the manufacturing process is when the components on the board are connected together by passing it through a bath of molten solder. If things go wrong (say the temperature of the solder or the speed of the conveyor isn't right) then the connections on the boards will be faulty. Workers can't wait until the boards reach the end of the line to spot a problem that's occurring here, so after they've passed through the soldering bath, an inspector randomly selects boards for a quality check. A score of 100 is the standard, with some a little higher, some a little lower: the natural variation inherent in any manufacturing process.
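Written out as a formula, the relationship stated earlier for the class heights looks like this. The symbol names are notation assumed here (sigma with subscript x-bar for the standard deviation of the sample mean, sigma for the population standard deviation, n for the sample size); the transcript gives the rule only in words.

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.8}{\sqrt{4}} = \frac{1.8}{2} = 0.9 \text{ inches}
```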
The goal of the quality control process is to spot if this variation starts drifting out of the acceptable range, suggesting a problem with the soldering bath. Here is the distribution of the quality scores, a Normal curve centered on 100. Its standard deviation, derived from the company's experience with the process, is 4.

So how does what we've learned about sampling distributions help quickly spot when things go awry? The inspector's random sampling of the boards has a sample size of 5. Let's see what happens when we take repeated samples. From our first five boards, we get an x-bar of 99.4. From our second sample of five different boards we get a different x-bar, 101.6. We keep on sampling, calculating each x-bar, until we can start building a sampling distribution which is, as expected, centered on the mean of the population, 100. But it has a smaller standard deviation of around 1.79, as you can see from the formula.

Here's how this is useful in the statistical quality process known as the x-bar control chart. The inspector plots the values of x-bar against time. As expected, there's variation. The goal of the process is to distinguish chance variation from the extra variation that shows something is going wrong. Here's the Normal distribution of x-bar. If everything is going well, the mean will be at 100. Now, recall the 68-95-99.7 rule for any Normal distribution. Almost all observations, 99.7 percent in fact, will lie within 3 standard deviations of the mean. Going out three standard deviations from the mean gives us what this quality control method calls control limits. As long as the quality scores remain Normally distributed with mean 100 and standard deviation 4 (that is, as long as the soldering process continues its past pattern) almost all the x-bar points on the chart will fall between the control limits. The process is said to be in control when its pattern of variation is stable over time. A point outside the limit is evidence that the process has become more variable or that its mean has shifted; in other words, that it's gone out of control. As soon as an inspector sees a point like this, it's a signal to ask, what's gone wrong?

So far we've been talking about population distributions that follow a roughly Normal curve. What if we were to take a very different population distribution? The Mayor's 24 Hour Constituent Service Hotline in Boston answers hundreds of thousands of calls every year. The operators handle everything from simple requests to more complicated questions about city services.

Justin Holmes
Typically, people call to report things like potholes, or streetlights that might be out in their neighborhood.
Tara Blumstein
People put trash out too early, or people have trash out too late. Or people aren't keeping their property clean.

Janine Coppola
They call from some street, that's a major four-lane street, where they say there's a turkey running down the middle of the street.

Tara Blumstein
There was one where a skunk got stuck in a revolving door.

Jessica Obasohan
I had one recently about a little boy who wanted to request to speak to a Park Ranger with a horse.

Pardis Sabeti
The thing about calls to this and most other call centers is that the length of the calls varies widely.

Justin Holmes
So, the average length of our calls on the 24 Hour Hotline is about a minute and a half. But, obviously, that can range.

Pardis Sabeti
Most calls are relatively brief, but a few can just go on and on. The Mayor's Hotline answered a total of 21,669 calls in one month. If we plot the length of each call, the variable, against the number of calls, we'd get a distribution sharply skewed to the right. You can see it on this density curve representing the month's call durations. If we sampled from this population, would our sampling distribution also be skewed to the right? You might expect so.

So let's find out, using different sample sizes, starting with samples of size 10. Let's take our first sample of size 10. For this sample, the mean length of the calls is 98.7 seconds. We can start to build a histogram using the sample mean. Let's continue taking samples of size 10, each time finding the mean of the sample, and adding it to our histogram. This final histogram is based on 40 different samples of size 10. We can do the same thing with 40 different samples of size 20. And finally we can do it again with 40 different samples with an n of 60.

Now let's compare our sampling distributions with the original population distribution for all calls to the Mayor's hotline. The spread of all of the sampling distributions is smaller than the spread of the population distribution. And you can see that the spread of the sampling distributions tightens still further as the n increases. Interestingly, while the smallest sample size, with an n of 10, is still right skewed, the asymmetry decreases as the sample size gets bigger. By the time n equals 60, we've lost the skew and the sampling distribution of the sample mean looks pretty much Normal.
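The sampling experiment just described can be imitated in a short simulation. This is a sketch under stated assumptions, not a reconstruction of the hotline data: the population is 21,669 made-up call lengths drawn from an exponential distribution with a mean of 90 seconds (chosen only because it is right skewed with a mean of about a minute and a half), and the helper names `skewness` and `sample_means` are hypothetical.

```python
import random
from statistics import mean, pstdev

rng = random.Random(22)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population: 21,669 call lengths in seconds.
# ASSUMPTION: exponential with mean 90 s; the real call data are not used.
population = [rng.expovariate(1 / 90) for _ in range(21_669)]


def skewness(values):
    """Moment-based skewness: near 0 if symmetric, positive if right skewed."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


def sample_means(pop, n, num_samples, rng):
    """Return x-bar for num_samples simple random samples of size n."""
    return [mean(rng.sample(pop, n)) for _ in range(num_samples)]


print(f"population: spread={pstdev(population):.1f}  "
      f"skewness={skewness(population):.2f}")

# 40 samples at each sample size, as in the episode.
for n in (10, 20, 60):
    xbars = sample_means(population, n, 40, rng)
    print(f"n={n:2d}: mean of x-bars={mean(xbars):.1f}  "
          f"spread={pstdev(xbars):.1f}  skewness={skewness(xbars):.2f}")
```

With only 40 samples per sample size, the skewness figures are noisy from run to run; raising `num_samples` to a few thousand makes the shrinking spread and fading skew easier to see.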
The Normal quantile plot of x-bar data from 40 samples of size 60 looks like a pretty straight line, underscoring that our data are Normal. This happens because with larger samples we are less likely to get all big or all small numbers. We usually get a mix. Some x-bars will be above mu, some below. So we get an approximately Normal shape for the sampling distribution, even though the parent population isn't Normal.

What we've uncovered here is one of the most powerful tools statisticians possess, called the Central Limit Theorem. This states that, regardless of the shape of the population distribution, the sampling distribution of the sample mean will be approximately Normal if the sample size is large enough. It's because of the Central Limit Theorem that statisticians can generalize from sample data to the larger population. We'll be seeing how that can be useful in later modules about confidence intervals and significance tests. Stay tuned! I'm Pardis Sabeti for Against All Odds.
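Looking back at the circuit board segment, the control-limit arithmetic can be sketched in a few lines. The function name `xbar_control_limits` and its parameters are hypothetical, and the third x-bar in the list is a made-up value added only to show what an out-of-control point looks like; the process mean of 100, standard deviation of 4, sample size of 5, and the first two x-bars come from the episode.

```python
from math import sqrt


def xbar_control_limits(process_mean, process_sd, sample_size, num_sd=3):
    """Return (lower, upper) control limits for an x-bar chart."""
    # Standard deviation of the sample mean: sigma / sqrt(n).
    sd_of_xbar = process_sd / sqrt(sample_size)
    margin = num_sd * sd_of_xbar
    return process_mean - margin, process_mean + margin


lower, upper = xbar_control_limits(process_mean=100, process_sd=4, sample_size=5)
print(f"control limits: {lower:.2f} to {upper:.2f}")

# 99.4 and 101.6 are the x-bars quoted in the episode;
# 106.1 is a HYPOTHETICAL value for illustration.
xbars = [99.4, 101.6, 106.1]
out_of_control = [x for x in xbars if not lower <= x <= upper]
print("points outside the limits:", out_of_control)
```

With these inputs the limits come out at about 94.63 and 105.37, so the two x-bars from the episode fall inside them and only the made-up value is flagged.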

PRODUCTION CREDITS
Host: Dr. Pardis Sabeti
Writer/Producer/Director: Maggie Villiger
Associate Producer: Katharine Duffy
Editor: Seth Bender
Director of Photography: Dan Lyons
Additional Camera: Noah Brookoff
Audio: Dan Casey
Sound Mix: Richard Bock
Animation: Jason Tierney
Title Animation: Jeremy Angier
Web + Interactive Developer: Matt Denault / Azility, Inc.
Website Designer: Dana Busch
Production Assistant: Kristopher Cain
Teleprompter: Kelly Cronin
Hair/Makeup: Amber Voner
Additional Footage: Coast Learning Systems
Music: DeWolfe Music Library
Based on the original Annenberg/CPB series Against All Odds, Executive Producer Joe Blatt
Annenberg Learner Program Officer: Michele McLeod
Project Manager: Dr. Sol Garfunkel
Chief Content Advisor: Dr. Marsha Davis
Executive Producer: Graham Chedd
Copyright 2014 Annenberg Learner

FUNDER CREDITS
Funding for this program is provided by Annenberg Learner. For information about this, and other Annenberg Learner programs, call 1-800-LEARNER, and visit us at www.learner.org.