STAT 250: Introduction to Biostatistics LAB 6

Dr. Kari Lock Morgan

Sampling Distributions

In this lab, we'll explore sampling distributions using StatKey: www.lock5stat.com/statkey. We'll be using StatKey, rather than Minitab, for the next unit because it makes it very easy to simulate thousands of samples instantly.

1) Sampling Distribution for a Proportion: Red-Green Color Blindness

Red-green color blindness is when a person has difficulty distinguishing between red and green. It is a sex-linked trait, and is much more common in males than in females, because the gene associated with it is only on the X chromosome. Females have two X chromosomes, and are only color blind if BOTH copies are defective, while males have only one X chromosome and so are red-green color blind whenever their one X chromosome is defective. Approximately 8% of all the associated alleles in Caucasians are defective, so approximately 8% of Caucasian men are red-green color blind. (By the Hardy-Weinberg Principle, about 15% of Caucasian females are carriers, and only about 0.6% are red-green color blind.) We'll explore what kinds of sample proportions of red-green color blindness we might find for random samples of Caucasian males, when the true parameter p = 0.08.

a) Go to StatKey (www.lock5stat.com/statkey) and select the Sampling Distribution for a Proportion.

b) Click on Edit Proportion to change the population proportion to 0.08, and click the box next to "Choose samples of size n =" and change the sample size to 100. This should update the original population proportion to show 0.08 on the right.

c) Click on Generate 1 Sample to draw one sample of size 100 from the population. You will see the results displayed under Sample, where it gives a frequency total and the sample proportion, p̂. You will also see this sample proportion plotted with a dot on the big sampling distribution dotplot on the left. What proportion in your random sample of men happened to be red-green color blind? Compare your answer with your neighbors; you likely got different results.

d) Repeat step c) a couple more times, generating one sample at a time, looking at the frequency table and sample proportion on the right, and then finding the corresponding dot on the sampling distribution. The point here is to become comfortable with the fact that each dot represents a summary statistic from an entire sample of size 100. How much do the statistics tend to vary from sample to sample?

e) Once you fully understand what is going on, go ahead and click Generate 1000 Samples to automatically repeat the process 1000 times. This simulates 1000 random samples, each of size 100, and plots the sample proportion for each on the sampling distribution dotplot. What's the farthest value you got from the truth?

f) The summary statistics for the sampling distribution are shown in the top right of the dotplot. This gives the number of samples simulated (samples), the mean of the sampling distribution, and the standard deviation of the sampling distribution.
   a. What is the mean of the distribution? Why?
   b. What is the standard error of the sample proportion?

g) Based on the sampling distribution, what do you think is a good distance such that it is rare for statistics to be farther than that from the parameter? (This will help you to know how far the parameter might be from a statistic, which is a more common problem.)

h) Generate a few thousand more simulated samples. How does this change the mean? How does this change the standard error? (Any changes should be only very minor!)

i) Change the sample size from 100 to 1000. BEFORE generating new samples, predict what will change about the distribution. Start by generating a few samples at a time, observe how much they vary, then generate 1000 samples.
   a. What is the farthest value from the truth?
   b. What is the mean?
   c. What is the standard error?
   d. How do these numbers compare to the sampling distribution when n was 100? Discuss the effect of sample size on the sampling distribution with your neighbors.

j) So far we've been looking only at Caucasians, but the prevalence is 5% in men of Asian origin and 4% in men of African origin. Edit the population proportion to reflect one of these populations, and recreate the sampling distribution (you can leave the sample size at 1000). What is the mean now? Why?

2) Sampling Distribution for a Mean: Baseball Player Salaries

a) Go back to the StatKey homepage by clicking on the StatKey icon at the top, and switch to the sampling distribution for a mean.

b) Click on the drop-down menu to choose Baseball players salaries, which is a dataset giving the salaries (in millions) for all major league baseball players in 2012, so we have data on the entire population. The data is displayed in the top right, and you can see the corresponding population parameters.
   a. How many major league baseball players were there in 2012?
   b. What is the population mean?
   c. Describe the shape of the distribution.

c) Set the sample size to be 100 (or something else if you feel strongly!).

d) Same as before: click Generate 1 Sample to draw a random sample of 100 baseball players from the population. Like before, the sample data will be displayed under Sample and the sample statistic (now the mean) will be plotted with a dot on the sampling distribution.
   a. Click on Show Data Table to see who you randomly selected.
   b. What is the sample mean of your sample?
   c. How do the summary statistics (mean, median, standard deviation) of the random sample compare to the parameters in the population?

e) Repeat step d) several times. Try to get a feel for the following:
   a. How much do sample means tend to vary from sample to sample?
   b. How do the sample statistics (random) relate to the population parameters (fixed)?

f) Generate 1000 samples to get a sampling distribution.
   a. What is the mean of the sampling distribution? Why?
   b. What is the standard error of the sample mean?
   c. Describe the shape of the distribution.
   d. What is the farthest value from the truth?
   e. Try hovering your mouse over any of the dots to see the sample it came from. Remember each dot is a statistic from a sample!

g) How does the standard deviation of the statistic (the standard error) compare to the standard deviation of the actual data? It's important to keep these two separate in your minds: in the right panels each dot is a salary of a single baseball player, while in the sampling distribution each dot is the mean salary for 100 baseball players. Although both are standard deviations, they measure the spread of two very different distributions.

h) Select one last random sample, and suppose you only knew the sample mean from those 100 players, not the population mean. That's your best guess, but how much uncertainty lies in this estimate? How far might the parameter be from the statistic? In this situation we want to create an interval estimating the true population mean that is centered around your sample statistic (your best estimate). How wide should this interval be? Use information from the sampling distribution to help you decide how wide to make your interval. (Focus on the spread of the distribution, not the center.)

(Note: I haven't taught you formally how to do this yet; that's the topic of next class. At this point I just want you to think and be creative!)
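
For anyone who wants to reproduce the Part 1 simulation outside of StatKey, the process can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not StatKey's actual implementation: the function names (simulate_sample_proportions, summarize) are hypothetical, the random seed is arbitrary, and the comparison value sqrt(p(1 - p)/n) is the standard formula for the standard error of a sample proportion, which is not part of this lab.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Return the sample proportion from each of num_samples random
    samples of size n, drawn from a population with true proportion p.

    Hypothetical helper for illustration; the seed is arbitrary.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each sampled person is color blind with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


def summarize(p, n, num_samples=1000):
    """Summarize a simulated sampling distribution of the sample proportion."""
    props = simulate_sample_proportions(p, n, num_samples)
    return {
        # Center of the sampling distribution (compare with p).
        "mean": statistics.mean(props),
        # Standard deviation of the simulated statistics = standard error.
        "standard_error": statistics.pstdev(props),
        # Standard formula, shown only for comparison (not from the lab).
        "theoretical_se": math.sqrt(p * (1 - p) / n),
        # Largest distance between any simulated statistic and the truth.
        "farthest_from_truth": max(abs(x - p) for x in props),
    }


if __name__ == "__main__":
    # Same settings as steps e) and i): p = 0.08 with n = 100, then n = 1000.
    for n in (100, 1000):
        print(n, summarize(p=0.08, n=n))
```

Running it with n = 100 and then n = 1000 mirrors steps e) and i): the mean should stay close to 0.08 in both cases, while the standard error should shrink noticeably for the larger sample size.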