Measuring Variability for Skewed Distributions


Skewed Data and its Measure of Center

Consider the following scenario. A television game show, Fact or Fiction, was canceled after nine shows. Many people watched the nine shows and were rather upset when it was taken off the air. A random sample of eighty viewers of the show was selected. Viewers in the sample responded to several questions. The dot plot below shows the distribution of ages of these eighty viewers:

1. Approximately where would you locate the mean (balance point) in the above distribution?
2. How does the direction of the tail affect the location of the mean age compared to the median age?
3. The mean age of the above sample is approximately 50. Do you think this age describes the typical viewer of this show? Explain your answer.

Constructing and Interpreting the Box Plot

Using the above dot plot, construct a box plot over the dot plot by completing the following steps:

i. Locate the middle 40 observations and draw a box around these values.
ii. Calculate the median and then draw a line in the box at the location of the median.
iii. Draw a line that extends from the upper edge of the box to the largest observation in the data set.
iv. Draw a line that extends from the lower edge of the box to the minimum value in the data set.
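The effect of a tail on the mean can be checked numerically. The sketch below uses a small made-up list of ages (an assumption for illustration, not the eighty viewers in the dot plot) in which most values sit in the 60s and a few trail off toward younger ages. It compares the mean and the median using Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical ages for illustration only (not the lesson's 80 viewers).
# Most values cluster in the 60s, with a tail stretching toward younger ages.
ages = [30, 35, 42, 55, 58, 60, 61, 62, 63, 64, 65, 66]

sample_mean = mean(ages)
sample_median = median(ages)

print("mean:  ", round(sample_mean, 1))
print("median:", sample_median)
print("mean is below median:", sample_mean < sample_median)
```

In this made-up sample the few young ages pull the balance point toward the tail, while the median stays with the bulk of the data.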

4. Recall that the 5 values used to construct the box plot make up the 5 number summary. What is the 5 number summary for this data set of ages?

Minimum age:
Lower quartile or Q1:
Median age:
Upper quartile or Q3:
Maximum age:

5. What percent of the data does the box part of the box plot capture?
6. What percent of the data falls between the minimum value and Q1?
7. What percent of the data falls between Q3 and the maximum value?

An advertising agency researched the ages of viewers most interested in various types of television ads. Consider the following summaries:

Ages     Target Products or Services
30-45    Electronics, home goods, cars
46-55    Financial services, appliances, furniture
56-72    Retirement planning, cruises, health care services

8. The mean age of the people surveyed is approximately 50 years old. As a result, the producers of the show decided to obtain advertisers for a typical viewer of 50 years old. According to the table, what products or services do you think the producers will target? Based on the sample, what percent of the people surveyed would have been interested in these commercials if the advertising table were accurate?
9. The show failed to generate the interest the advertisers had hoped for. As a result, they stopped advertising on the show and the show was canceled. Kristin made the argument that a better age to describe the typical viewer is the median age. What is the median age of the sample? What products or services does the advertising table suggest for viewers if the median age is considered as a description of the typical viewer?
10. What percentage of the people surveyed would be interested in the products or services suggested by the advertising table if the median age were used to describe a typical viewer?
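A 5 number summary can also be computed rather than read off a plot. The sketch below is one possible approach, assuming the quartiles are taken as the medians of the lower and upper halves of the sorted data (with the middle value left out of both halves when the count is odd). Other quartile conventions exist and can give slightly different values. The function name `five_number_summary` and the sample values are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data; the middle value is excluded when n is odd.
    Requires at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data, not the ages from the dot plot.
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))
```

Each of the four intervals between consecutive summary values holds roughly a quarter of the observations, which is why the box covers about half of the data.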

11. What percent of the viewers have ages between Q1 and Q3? The difference between Q3 and Q1, or Q3 - Q1, is called the interquartile range or IQR. What is the interquartile range (IQR) for this data distribution?
12. The IQR provides a summary of the variability for a skewed data distribution. The IQR is a number that specifies the length of the interval that contains the middle half of the ages of viewers. Do you think producers of the show would prefer a show that has a small or large interquartile range? Explain your answer.
13. Do you agree with Kristin's argument that the median age provides a better description of a typical viewer? Explain your answer.

Outliers

Students at Waldo High School are involved in a special project that involves communicating with people in Kenya. Consider a box plot of the ages of 200 randomly selected people from Kenya:

A data distribution may contain extreme data (specific data values that are unusually large or unusually small relative to the median and the interquartile range). A box plot can be used to display extreme data values that are identified as outliers. An outlier is defined to be any data value that is more than 1.5 × IQR away from the nearest quartile. The * symbols in the box plot mark the ages of four people from this sample. Based on the sample, these four ages were considered outliers.
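The outlier rule above translates directly into two cutoffs: Q1 - 1.5 × IQR on the low side and Q3 + 1.5 × IQR on the high side. The sketch below applies it to a made-up list of values (not the Kenya sample). The helper names `quartiles` and `find_outliers` are hypothetical, and the quartiles are again assumed to be the medians of the lower and upper halves of the sorted data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves (assumed method)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(upper)

def find_outliers(values):
    """Return the values more than 1.5 * IQR away from the nearest quartile."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [v for v in values if v < low_cutoff or v > high_cutoff]

# Hypothetical data: one value sits far above the rest.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]
print(quartiles(sample))      # Q1 and Q3
print(find_outliers(sample))  # values beyond either cutoff
```

A low cutoff that falls below the minimum value means no observation can be an outlier on that side.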

14. Estimate the values of the 4 ages represented by an *.
15. What is the median age of the sample of ages from Kenya? What are the approximate values of Q1 and Q3? What is the approximate IQR of this sample?
16. Multiply the IQR by 1.5. What value do you get?
17. Add 1.5 × IQR to the 3rd quartile age (Q3). What do you notice about the four ages identified by an *?
18. Are there any age values that are less than Q1 - 1.5 × IQR? If so, these ages would also be considered outliers.
19. Explain why there is no * on the low side of the box plot for ages of the people in the sample from Kenya. Consider whether there are any age values that are less than Q1 - 1.5 × IQR.

The midrange of a data set is defined to be the average of the minimum and maximum values: (min + max)/2. The midhinge of a data set is defined to be the average of the first quartile (Q1) and the third quartile (Q3): (Q1 + Q3)/2.

a. Is the midrange a measure of center or a measure of spread? Explain.
b. Is the midhinge a measure of center or a measure of spread? Explain.
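The two definitions above are simple averages, so they are easy to try on different summaries. The sketch below is a minimal illustration with made-up summary values; the function names `midrange` and `midhinge` are chosen here for convenience.

```python
def midrange(minimum, maximum):
    """Average of the minimum and maximum values: (min + max) / 2."""
    return (minimum + maximum) / 2

def midhinge(q1, q3):
    """Average of the first and third quartiles: (Q1 + Q3) / 2."""
    return (q1 + q3) / 2

# Hypothetical summary values, not taken from either data set in the lesson.
print(midrange(4, 84))   # 44.0
print(midhinge(20, 50))  # 35.0
```

Trying the functions on a second summary in which every value is 10 larger is one way to explore questions a and b.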

Problem Set

Consider the following scenario. Transportation officials collect data on flight delays (the number of minutes a flight takes off after its scheduled time). Consider the dot plot of the delay times in minutes for 60 BigAir flights during December 2012:

1. How many flights left 60 or more minutes late?
2. Why is this data distribution considered skewed? Is the tail of this data distribution to the right or to the left?
3. Draw a box plot over the dot plot of the flights for December.
4. What is the interquartile range or IQR of this data set?
5. The mean of the 60 flight delays is approximately 42 minutes. Do you think that 42 minutes is typical of the number of minutes a BigAir flight was delayed? Why or why not?
6. Based on the December data, write a brief description of the BigAir flight distribution for December.
7. Calculate the percentage of flights with delays of more than 1 hour. Were there many flight delays of more than 1 hour?
8. BigAir later indicated that there was a flight delay that was not included in the data. The flight not reported was delayed for 48 hours. If you had included that flight delay in the box plot, how would you have represented it? Explain your answer.
9. Consider a dot plot and the box plot of the delay times in minutes for 60 BigAir flights during January 2013. How is the January flight delay distribution different from the one summarizing the December flight delays? In terms of flight delays in January, did BigAir improve, stay the same, or do worse compared to December? Explain your answer.
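The arithmetic behind problems 7 and 8 can be sketched as follows. The delay values below are made up for illustration (they are not the 60 BigAir flights in the dot plot), and the variable names are hypothetical. The only fixed fact used is that 48 hours must be converted to minutes before it can be compared with the other delays.

```python
# Hypothetical delay times in minutes (not the BigAir data).
delays = [5, 10, 12, 15, 20, 25, 30, 45, 70, 120]

# Problem 7 style: share of flights delayed more than 1 hour (60 minutes).
over_an_hour = [d for d in delays if d > 60]
percent_over_an_hour = 100 * len(over_an_hour) / len(delays)
print(percent_over_an_hour)

# Problem 8 style: express the unreported 48-hour delay in minutes
# so it is on the same scale as the rest of the data.
unreported_delay_minutes = 48 * 60
print(unreported_delay_minutes)
```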