Lesson 7: Measuring Variability for Skewed Distributions (Interquartile Range)


Date: 11/16/15

Exploratory Challenge 1: Skewed Data and its Measure of Center

Consider the following scenario. A television game show, Fact or Fiction, was canceled after nine shows. Many people watched the nine shows and were rather upset when it was taken off the air. A random sample of eighty viewers of the show was selected. Viewers in the sample responded to several questions. The dot plot below shows the distribution of ages of these eighty viewers.

[Figure: Dot Plot of Viewer Age]

Approximately where would you locate the mean (balance point) in the above distribution?

How does the direction of the tail affect the location of the mean age compared to the median age?

The mean age of the above sample is approximately 50. Do you think this age describes the typical viewer of this show? Explain your answer.

Exploratory Challenge 2: Constructing and Interpreting the Box Plot

1. Using the above dot plot, construct a box plot over the dot plot by completing the following steps:
   i. Locate the middle 40 observations, and draw a box around these values.
   ii. Calculate the median, and then draw a line in the box at the location of the median.
   iii. Draw a line that extends from the upper end of the box to the largest observation in the data set.
   iv. Draw a line that extends from the lower edge of the box to the minimum value in the data set.

2. Recall that the 5 values used to construct the box plot make up the 5-number summary. What is the 5-number summary for this data set of ages?
   Minimum age:
   Lower quartile or Q1:
   Median age:
   Upper quartile or Q3:
   Maximum age:

Questions:

What percent of the data does the box part of the box plot capture?

What percent of the data falls between the minimum value and Q1?

What percent of the data falls between Q3 and the maximum value?

Exercises

An advertising agency researched the ages of viewers most interested in various types of television ads. Consider the following summaries:

Ages     Target Products or Services
30-45    Electronics, home goods, cars
46-55    Financial services, appliances, furniture
56-72    Retirement planning, cruises, health care services

The mean age of the people surveyed is approximately 50 years old. As a result, the producers of the show decided to obtain advertisers for a typical viewer of 50 years old. According to the table, what products or services do you think the producers will target?

Based on the sample, what percent of the people surveyed about the Fact or Fiction show would have been interested in these commercials if the advertising table is accurate?

The show failed to generate the interest the advertisers hoped. As a result, they stopped advertising on the show, and the show was canceled. Kristin made the argument that a better age to describe the typical viewer is the median age. What is the median age of the sample? What products or services does the advertising table suggest for viewers if the median age is considered as a description of the typical viewer?
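The 5-number summary and the comparison of mean and median can be tried out on a small sample. The sketch below uses ten made-up ages, not the eighty ages from the dot plot (which are not reproduced here). It assumes the common classroom convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data; other conventions give slightly different quartiles. The name five_number_summary is illustrative only.

```python
from statistics import mean, median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data; the middle value is left out of both halves when
    the number of observations is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]   # smallest half of the observations
    upper = data[-half:]  # largest half of the observations
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical ages with a tail on the left (a few young viewers).
ages = [12, 25, 38, 52, 58, 60, 62, 63, 65, 65]

summary = five_number_summary(ages)
print("Mean age:", mean(ages))
print("Five-number summary:", summary)
```

In this made-up sample the mean (50) sits below the median (59) because the few young ages in the left tail pull the balance point toward them.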

What percent of the people surveyed would be interested in the products or services suggested by the advertising table if the median age were used to describe a typical viewer?

What percent of the viewers have ages between Q1 and Q3? The difference between Q3 and Q1, or Q3 - Q1, is called the interquartile range, or IQR. What is the interquartile range (IQR) for this data distribution?

The IQR provides a summary of the variability for a skewed data distribution. The IQR is a number that specifies the length of the interval that contains the middle half of the ages of viewers. Do you think producers of the show would prefer a show that has a small or large interquartile range? Explain your answer.

Do you agree with Kristin's argument that the median age provides a better description of a typical viewer? Explain your answer.

Exploratory Challenge 3: Outliers

Students at Waldo High School are involved in a special project that involves communicating with people in Kenya. Consider a box plot of the ages of 200 randomly selected people from Kenya.

[Figure: Boxplot of Ages for Kenya]

A data distribution may contain extreme data (specific data values that are unusually large or unusually small relative to the median and the interquartile range). A box plot can be used to display extreme data values that are identified as outliers. The four * symbols in the box plot represent the ages of four people from this sample. Based on the sample, these four ages were considered outliers.
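The interquartile range introduced in the exercises above can be computed directly once Q1 and Q3 are known. The sketch below compares two made-up sets of viewer ages that share the same median but have different spreads. The data, the show names, and the function name iqr are hypothetical, and the quartiles again assume the median-of-halves convention.

```python
from statistics import median


def iqr(values):
    """Interquartile range, Q3 - Q1, for at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data (the middle value is skipped when the count is odd).
    """
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[-half:])
    return q3 - q1


# Hypothetical viewer ages for two shows with the same median age.
show_a = [44, 46, 48, 50, 52, 54, 56, 58]
show_b = [20, 30, 40, 50, 52, 60, 70, 80]

for name, ages in (("Show A", show_a), ("Show B", show_b)):
    print(name, "median:", median(ages), "IQR:", iqr(ages))
```

Both samples have a median of 51, but the middle half of Show A's ages spans 8 years while Show B's spans 30 years.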

Estimate the values of the four ages represented by an *.

An outlier is defined to be any data value that is more than 1.5 × (IQR) away from the nearest quartile.

What is the median? What are the approximate values of Q1 and Q3? What is the approximate IQR of this sample? Multiply the IQR by 1.5. Add that number to Q3 or subtract from Q1 to find any outliers.

   Median:
   Q1:
   Q3:
   IQR = Q3 - Q1:
   1.5 × IQR:
   Q3 + 1.5 × IQR:
   Q1 - 1.5 × IQR:

What do you notice about the four ages identified by an *?

Are there any age values that are less than Q1 - 1.5 × (IQR)? If so, these ages would also be considered outliers.

Explain why there is no * on the low side of the box plot for ages of the people in the sample from Kenya.

Lesson Summary

Non-symmetrical data distributions are referred to as skewed.

Left-skewed or skewed to the left means the data spreads out longer (like a tail) on the left side.

Right-skewed or skewed to the right means the data spreads out longer (like a tail) on the right side.

The center of a skewed data distribution is described by the median.

Variability of a skewed data distribution is described by the interquartile range (IQR).

The IQR describes variability by specifying the length of the interval that contains the middle half of the data.
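The outlier rule from Exploratory Challenge 3 can be sketched as a short function. The ages below are made up and are not the Kenya sample; the function name find_outliers is hypothetical, and the quartiles assume the median-of-halves convention, so other quartile methods may flag slightly different values.

```python
from statistics import median


def find_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 x IQR rule.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data. Needs at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[-half:])
    step = 1.5 * (q3 - q1)   # 1.5 x IQR
    lower_fence = q1 - step  # values below this are outliers
    upper_fence = q3 + step  # values above this are outliers
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


# Hypothetical ages; not the Kenya sample.
ages = [1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 90, 95]
low, high, flagged = find_outliers(ages)
print("Fences:", low, high)
print("Outliers:", flagged)
```

For these made-up ages, Q1 is 12.5 and Q3 is 42.5, so the fences are -32.5 and 87.5 and only 90 and 95 are flagged. The lower fence is negative here, so no age in this sample could fall below it.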

Problem Set

Consider the following scenario. Transportation officials collect data on flight delays (the number of minutes a flight takes off after its scheduled time). Consider the dot plot of the delay times in minutes for 60 BigAir flights during December 2012:

[Figure: Dot Plot of December Delay Times]

1. How many flights left more than 60 minutes late?
2. Why is this data distribution considered skewed?
3. Is the tail of this data distribution to the right or to the left? How would you describe several of the delay times in the tail?
4. Draw a box plot over the dot plot of the flights for December (the mean of the 60 flight delays is approx. 42 minutes).
5. What is the interquartile range, or IQR, of this data set?
6. The mean of the 60 flight delays is approximately 42 minutes.
   A) Do you think that 42 minutes is typical of the number of minutes a BigAir flight was delayed?
   B) Why or why not?
7. Based on the December data, write a brief description of the BigAir flight distribution for December.
8. A) Calculate the percentage of flights with delays of more than 1 hour.
   B) Were there many flight delays of more than 1 hour?
9. BigAir later indicated that there was a flight delay that was not included in the data. The flight not reported was delayed for 48 hours. If you had included that flight delay in the box plot, how would you have represented it? Explain your answer.

10. Consider a dot plot and the box plot of the delay times in minutes for 60 BigAir flights during January 2013.
    A) How is the January flight delay distribution different from the one summarizing the December flight delays?
    B) In terms of flight delays in January, did BigAir improve, stay the same, or do worse compared to December? Explain your answer.
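The percentage calculation asked for in Problem 8 can be sketched as follows. The delay times are made up for illustration and are not the BigAir data; the function name percent_over is hypothetical.

```python
def percent_over(delays, threshold):
    """Percentage of delay times strictly greater than threshold (minutes)."""
    count = sum(1 for d in delays if d > threshold)
    return 100 * count / len(delays)


# Hypothetical delay times in minutes for ten flights (not the BigAir data).
delays = [5, 10, 15, 15, 20, 30, 45, 60, 75, 120]
print(percent_over(delays, 60), "percent of flights were more than 1 hour late")
```

A delay of exactly 60 minutes is not counted, because the question asks for delays of more than 1 hour.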