Distribution of Data and the Empirical Rule


302360_File_B.qxd 7/7/03 7:18 AM Page 1

Stem-and-Leaf Diagrams
Frequency Distributions and Histograms
Normal Distributions and the Empirical Rule
z-scores

Stem-and-Leaf Diagrams

Although the mean, the median, the mode, and the standard deviation provide some information about a set of data and its distribution, it is often helpful to use graphical procedures that show precisely how the values in a set of data are distributed. Many small sets of data can be displayed with a stem-and-leaf diagram. For instance, consider the following history test scores:

65, 72, 96, 86, 43, 61, 75, 86, 49, 68, 98, 74, 84, 78, 85, 75, 86, 73

In this form the data are called raw data because they have not been organized. With raw data it is generally difficult to see how the data are distributed. In the stem-and-leaf diagram below, we have organized the test scores by placing all the scores in the 40s in the top row, the scores in the 50s in the second row, the scores in the 60s in the third row, and so on.

A Stem-and-Leaf Diagram of a Set of History Test Scores
Stems | Leaves
    4 | 3 9
    5 |
    6 | 1 5 8
    7 | 2 3 4 5 5 8
    8 | 4 5 6 6 6
    9 | 6 8
Legend: 8 | 6 represents 86

The tens digits of the scores have been placed to the left of the vertical line; in this diagram they are the stems. The ones digits of the scores have been placed in the proper row to the right of the vertical line; they are the leaves. It is now easy to make observations about the distribution of the scores. Only two of the scores are in the 90s, six of the scores are in the 70s, and none of the scores are in the 50s. The lowest score is 43 and the highest is 98.

Steps in the Construction of a Stem-and-Leaf Diagram
1. Determine the stems and list them in a column from smallest to largest.
2. List the remaining digits of each data value as a leaf to the right of its stem.
3. Include a legend that explains the meaning of the stems and the leaves. Include a title for the diagram.

The choice of how many leading digits to use as the stem depends on the particular application and is best explained with an example.

EXAMPLE 1 Construct a Stem-and-Leaf Diagram
A travel agent has recorded the amount spent by customers for a cruise. Construct a stem-and-leaf diagram for the data.

Amount Spent for a Cruise, Summer of 2003
$3600 $4700 $7200 $2100 $5700 $4400 $9400
$6200 $5900 $2100 $4100 $5200 $7300 $6200
$3800 $4900 $5400 $5400 $3100 $3100 $4500
$4500 $2900 $3700 $3700 $4800 $4800 $2400
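The three construction steps can be sketched in Python. This is a minimal sketch, assuming non-negative whole-number data; the function names `stem_and_leaf` and `show` and the `leaf_unit` parameter are hypothetical. The `leaf_unit` argument drops the digits below the leaf place, so `leaf_unit=100` makes the thousands digit the stem and the hundreds digit the leaf for amounts such as $3600.

```python
def stem_and_leaf(data, leaf_unit=1):
    """Group whole-number data into {stem: [leaves]}.

    The leaf is the digit in the `leaf_unit` place and the stem is everything to its left.
    Every stem from the smallest to the largest is listed, even when it has no leaves.
    """
    # Drop digits below the leaf place, e.g. 3600 // 100 -> 36 (stem 3, leaf 6).
    scaled = sorted(value // leaf_unit for value in data)
    stems = {stem: [] for stem in range(scaled[0] // 10, scaled[-1] // 10 + 1)}
    for value in scaled:  # values are sorted, so each list of leaves comes out in order
        stems[value // 10].append(value % 10)
    return stems


def show(stems, legend):
    """Print the stems to the left of a vertical line, the leaves to the right, then the legend."""
    print("Stems | Leaves")
    for stem, leaves in stems.items():
        print(f"{stem:>5} | {' '.join(str(leaf) for leaf in leaves)}".rstrip())
    print("Legend:", legend)


history_scores = [65, 72, 96, 86, 43, 61, 75, 86, 49, 68, 98, 74, 84, 78, 85, 75, 86, 73]
show(stem_and_leaf(history_scores), "8 | 6 represents 86")
```

Listing empty stems (such as 5 for the history scores) is deliberate: a gap in the diagram is itself information about the distribution.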

Solution
One method of choosing the stems is to let each thousands digit be a stem and each hundreds digit be a leaf. If the stems and leaves are assigned in this manner, then the notation 2 | 1, which has a stem of 2 and a leaf of 1, represents a cost of $2100, and the notation 5 | 4 represents a cost of $5400. The diagram can now be constructed by writing all of the stems, from smallest to largest, in a column to the left of a vertical line and writing the corresponding leaves to the right of the vertical line.

Amount Spent for a Cruise
Stems | Leaves
    2 | 1 1 4 9
    3 | 1 1 6 7 7 8
    4 | 1 4 5 5 7 8 8 9
    5 | 2 4 4 7 9
    6 | 2 2
    7 | 2 3
    8 |
    9 | 4
Legend: 7 | 3 represents $7300

CHECK YOUR PROGRESS 1
The following table lists the ages of the customers who purchased a cruise. Construct a stem-and-leaf diagram for the data.

Ages of Customers Who Purchased a Cruise
32 45 66 21 62 68 61 55 23 38 44 77 46 50
33 35 42 45 51 28 40 41 52 52 72 64 51 33

Solution: See page S1.

Sometimes two sets of data can be compared by using a back-to-back stem-and-leaf diagram, which has common stems, with the leaves from one data set displayed to the right of the stems and the leaves from the other data set displayed to the left of the stems. For instance, the following back-to-back stem-and-leaf diagram shows the test scores for two biology classes that took the same test.

Biology Test Scores
8 A.M. class (leaves to the left of the stems) and 10 A.M. class (leaves to the right of the stems)
              2 | 4 | 5 8
              7 | 5 | 6 7 9 9
            5 8 | 6 | 2 3 4 8
  1 2 3 3 3 7 8 | 7 | 1 3 3 5 5 6 8
4 4 5 5 6 8 8 9 | 8 | 2 3 6 6 6
      2 4 5 5 8 | 9 | 4 5
Legend: 3 | 7 represents 73 (8 A.M. class); 8 | 2 represents 82 (10 A.M. class)

QUESTION Which biology class did better on the test?
ANSWER The 8 A.M. class did better on the test because it had more scores in the 80s and 90s and fewer scores in the 40s, 50s, and 60s. The scores in the 70s were similar for both classes.

Frequency Distributions and Histograms

Large sets of data are often displayed using a frequency distribution or a histogram. For example, consider the following situation. An Internet service provider (ISP) has installed new computers. To estimate the new download times its subscribers will experience, the ISP surveyed 1000 subscribers to determine the time each subscriber required to download a particular file from the Internet site music.net. The results of that survey are summarized in the following table.

A grouped frequency distribution
Download time (in seconds)   Number of subscribers
0-10                         28
10-20                        129
20-30                        355
30-40                        345
40-50                        121
50-60                        22

[Figure: A histogram of the frequency distribution above. Horizontal axis: Download time, in seconds. Vertical axis: Number of subscribers.]

The above table is called a grouped frequency distribution. It shows how often (how frequently) certain events occurred. Each interval 0-10, 10-20, ... is called a class. This distribution has six classes. For the 10-20 class, 10 is the lower class boundary and 20 is the upper class boundary. Any data value that lies on a common boundary is assigned to the higher class.

The graph of a frequency distribution is called a histogram. A histogram provides a pictorial view of how the data are distributed. In the above histogram, the height of each bar indicates how many subscribers experienced download times in the class shown beneath that bar on the horizontal axis. The center point of a class is called a class mark. In the above histogram, the class marks are 5, 15, 25, 35, 45, and 55.

Instead of using classes with a width of 10 seconds, the ISP could have chosen a smaller class width. A smaller class width produces more classes. For instance, if each class width were 5 seconds, the frequency distribution and histogram for the music.net example would have the form shown below.

A frequency distribution with 12 classes
Download time (in seconds)   Number of subscribers
0-5                          8
5-10                         20
10-15                        40
15-20                        89
20-25                        155
25-30                        200
30-35                        196
35-40                        149
40-45                        76
45-50                        45
50-55                        14
55-60                        8

[Figure: A histogram of the 12-class frequency distribution. Horizontal axis: Download time, in seconds. Vertical axis: Number of subscribers.]

Examine the following distribution. It shows the percent of subscribers in each class, as opposed to the frequency distribution above, which shows the number of subscribers in each class. A frequency distribution that lists the percent of the data in each class is called a relative frequency distribution. The relative frequency histogram shown below was drawn by using the data in the relative frequency distribution. It shows the percent of subscribers along its vertical axis.

A relative frequency distribution
Download time (in seconds)   Percent of subscribers
0-5                          0.8
5-10                         2.0
10-15                        4.0
15-20                        8.9
20-25                        15.5
25-30                        20.0
30-35                        19.6
35-40                        14.9
40-45                        7.6
45-50                        4.5
50-55                        1.4
55-60                        0.8

[Figure: A relative frequency histogram. Horizontal axis: Download time, in seconds. Vertical axis: Percent of subscribers.]

One advantage of using a relative frequency distribution instead of a frequency distribution is that there is a direct correspondence between the percent of the data that lie in a particular portion of the relative frequency distribution and probability. For instance, in the relative frequency distribution above, the percent of the data that lie between 35 and 40 seconds is 14.9%. Thus, if a subscriber is chosen at random, the probability that the subscriber will require between 35 and 40 seconds to download the music file is 0.149.

EXAMPLE 2 Use a Relative Frequency Distribution
Use the music.net relative frequency distribution above to determine
a. the percent of subscribers who required at least 25 seconds to download the file.
b. the probability that a subscriber chosen at random will require from 5 to 20 seconds to download the file.

Solution
a. The percent of data in all classes with a lower boundary of 25 seconds or more is the sum of the percents for the classes 25-30 through 55-60:
20.0% + 19.6% + 14.9% + 7.6% + 4.5% + 1.4% + 0.8% = 68.8%
The percent of subscribers who required at least 25 seconds to download the file is 68.8%.
b. The percent of data in all classes with a lower boundary of at least 5 seconds and an upper boundary of 20 seconds or less is the sum of the percents for the classes 5-10, 10-15, and 15-20:
2.0% + 4.0% + 8.9% = 14.9%
Thus the percent of subscribers who required from 5 to 20 seconds to download the file is 14.9%, and the probability that a subscriber chosen at random will require from 5 to 20 seconds to download the file is 0.149.
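The relative frequency distribution and the sums in Example 2 can be reproduced from the 12-class frequency counts. This is a minimal Python sketch; the helper names `class_index`, `relative_frequencies`, and `percent_between` are hypothetical. `class_index` applies the rule that a value on a common boundary belongs to the higher class, and `percent_between` assumes that its `low` and `high` arguments fall on class boundaries.

```python
# Download-time counts for the 12 classes of width 5 seconds (music.net survey).
counts = [8, 20, 40, 89, 155, 200, 196, 149, 76, 45, 14, 8]
WIDTH = 5  # class width in seconds


def class_index(t, width=WIDTH):
    """Index of the class containing time t; a value on a shared boundary goes to the higher class."""
    return int(t // width)


def relative_frequencies(counts):
    """Percent of the data in each class, rounded to one decimal place."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


def percent_between(percents, low, high, width=WIDTH):
    """Percent of data in the classes from boundary `low` up to (not including) boundary `high`."""
    return round(sum(percents[class_index(low, width):class_index(high, width)]), 1)


percents = relative_frequencies(counts)
print(percents)
print(percent_between(percents, 25, 60))                # Example 2a: at least 25 seconds
print(round(percent_between(percents, 5, 20) / 100, 3))  # Example 2b: probability for 5 to 20 seconds
```

Because each relative frequency is a count divided by the total number of subscribers, dividing a summed percent by 100 gives the probability that a randomly chosen subscriber falls in those classes.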

CHECK YOUR PROGRESS 2
Use the relative frequency distribution below to determine
a. the percent of the states that pay an average teacher salary of at least $45,000.
b. the probability that a state selected at random pays an average teacher salary of at least $30,000 but less than $39,000.

Average Salaries of Public School Teachers, 1998-1999
Average salary, s          Number of states   Relative frequency
$27,000 ≤ s < $30,000      3                  6%
$30,000 ≤ s < $33,000      7                  14%
$33,000 ≤ s < $36,000      12                 24%
$36,000 ≤ s < $39,000      9                  18%
$39,000 ≤ s < $42,000      6                  12%
$42,000 ≤ s < $45,000      3                  6%
$45,000 ≤ s < $48,000      5                  10%
$48,000 ≤ s < $51,000      3                  6%
$51,000 ≤ s < $54,000      2                  4%
Source: www.nea.org.

Solution: See page S1.

There is a geometric analogy between the percents of data and probabilities we calculated in Example 2 and the relative frequency histogram for the data. Because all the classes have the same width, the area of each bar is proportional to its height, that is, to the percent of the data in its class. For instance, the percent of data described in part a. of Example 2 corresponds to the area shown by the red bars in the histogram on the left below. The percent of data described in part b. corresponds to the area shown by the blue bars in the histogram on the right below.

[Figure: Two relative frequency histograms of the download times (horizontal axis: Download time, in seconds; vertical axis: Percent of subscribers). Left panel, red bars: 25 seconds or more. Right panel, blue bars: at least 5 but less than 20 seconds.]

Normal Distributions and the Empirical Rule

A histogram for a set of data provides us with a tool that can indicate patterns or trends in the distribution of data. The terms uniform, skewed, symmetrical, and normal are used to describe the distributions of some sets of data.

A uniform distribution is generated when all of the observed events occur with the same frequency. The graph of a uniform distribution remains at the same height over the range of the data. Some random processes produce distributions that are uniform or nearly uniform. For example, if the spinner below is used to generate numbers, then in the long run each of the numbers 1, 2, 3, ..., 8 will be generated with approximately the same frequency.

[Figure: A uniform distribution (vertical axis: Frequency of x; horizontal axis: x = 1 to 8) and a random number generator, a spinner with sectors numbered 1 through 8.]

A symmetrical distribution is symmetrical about a vertical center line. If you fold a symmetrical distribution along the center line, the right side of the distribution will match the left side. The following data sets are examples of distributions that are nearly symmetrical: the weights of all male students, the heights of all teenage females, the prices of a gallon of regular gasoline in a large city, the mileages for a particular type of automobile tire, and the amounts of soda dispensed by a vending machine. In a symmetrical distribution, the mean, the median, and the mode are all equal, and they are located at the center of the distribution.

[Figure: A symmetrical distribution (vertical axis: Frequency of x) with a vertical center line at which mean = median = mode.]

Skewed distributions have a longer tail on one side of the distribution and a shorter tail on the other side. A distribution is skewed to the left if it has a longer tail on the left and is skewed to the right if it has a longer tail on the right. In a distribution that is skewed to the left, the mean is less than the median, which is less than the mode. In a distribution that is skewed to the right, the mode is less than the median, which is less than the mean.

[Figure: Skewed distributions (vertical axis: Frequency of x). Skewed left: mean, median, and mode from left to right. Skewed right: mode, median, and mean from left to right.]

Many examinations yield test scores that have skewed distributions. For instance, if a test designed for students in the sixth grade is given to students in a ninth-grade class, most of the scores will be high, and the distribution of the test scores will be skewed to the left.

Discrete values are separated from each other by an increment, or space. For example, only whole numbers are used to record the number of points a basketball player scores in a game. The possible numbers of points s that the player can score are restricted to the discrete values 0, 1, 2, 3, 4, .... The variable s is a discrete variable. Different scores are separated from each other by at least 1 point. Any variable that is based on counting procedures is a discrete variable. Histograms are generally used to show the distribution of discrete variables.

Continuous values are values that can take on all real numbers in some interval. For example, the possible times that it takes to drive to the grocery store are continuous values. The time is not restricted to natural numbers such as 4 minutes or 5 minutes; in fact, the time may include any part of a minute, or of a second, if we care to measure that precisely. A variable such as time that is based on measuring with smaller and smaller units is a continuous variable. Continuous curves, rather than histograms, are used to show the distributions of continuous variables.

[Figure: Distributions of continuous variables: a. Bimodal, f(t); b. Skewed right, f(x); c. Symmetrical, f(w).]

In some cases a continuous curve is used to display the distribution of a set of discrete data. For instance, when we have a large set of data and the class intervals are very small, the shape of the top of the histogram approaches a smooth curve. See the two figures below. Thus, when graphing the distribution of very large sets of data with very small class intervals, it is common practice to replace the histogram with a smooth continuous curve.

[Figure: A histogram for discrete data, f(x), beside a continuous distribution curve, f(x).]

One of the most important statistical distributions is known as a normal distribution.

Take Note: If x is a continuous variable with mean μ (the Greek letter mu) and standard deviation σ (the Greek letter sigma), then its normal distribution is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$
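The Take Note formula can be evaluated directly. The sketch below is a minimal Python illustration using only the standard `math` module; the function name `normal_pdf` and the parameter values μ = 10 and σ = 2 are hypothetical, chosen only to show two properties of every normal curve: it is symmetric about x = μ, and its highest point (the mode) is at the mean.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height f(x) of the normal curve with mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Hypothetical parameters, chosen only for illustration.
mu, sigma = 10.0, 2.0

# Symmetry: points the same distance d from the mean have equal heights.
for d in (0.5, 1.0, 3.0):
    print(d, normal_pdf(mu - d, mu, sigma), normal_pdf(mu + d, mu, sigma))

# The highest point of the curve on a fine grid of x values is at the mean.
xs = [mu + k / 100 for k in range(-600, 601)]
print(max(xs, key=lambda x: normal_pdf(x, mu, sigma)))
```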
The precise mathematical definition of a normal distribution is given by the equation in the Take Note; however, for many problems it is sufficient to know that all normal distributions have the following properties.

Properties of a Normal Distribution
A normal distribution has a bell shape that is symmetric about a vertical line through its center. The mean, the median, and the mode of a normal distribution are all equal, and they are located at the center of the distribution.

[Figure: A normal distribution, f(x), with the horizontal axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, and μ + 3σ. The six regions between consecutive marks contain 2.15%, 13.6%, 34.1%, 34.1%, 13.6%, and 2.15% of the data; brackets show 68.2%, 95.4%, and 99.7% of the data.]

The Empirical Rule
In a normal distribution,
  about 68.2% of the data lies within 1 standard deviation of the mean,
  about 95.4% of the data lies within 2 standard deviations of the mean, and
  about 99.7% of the data lies within 3 standard deviations of the mean.

The Empirical Rule can be used to solve many problems that involve a normal distribution.

EXAMPLE 3 Use the Empirical Rule
A survey of 1000 U.S. gas stations found that the price charged for a gallon of regular gas can be closely approximated by a normal distribution with a mean of $1.90 and a standard deviation of $0.20. How many of the stations charge
a. between $1.50 and $2.30 for a gallon of regular gas?
b. less than $2.10 for a gallon of regular gas?
c. more than $2.30 for a gallon of regular gas?

Solution
a. The $1.50 price is 2 standard deviations below the mean, and the $2.30 price is 2 standard deviations above the mean. In a normal distribution, 95.4% of all data lies within 2 standard deviations of the mean.

[Figure: Data within 2σ of μ: the region from μ − 2σ to μ + 2σ contains 13.6% + 34.1% + 34.1% + 13.6% = 95.4% of the data.]

Therefore, approximately
95.4% × 1000 = 0.954 × 1000 = 954
of the stations charge between $1.50 and $2.30 for a gallon of regular gas.

b. The $2.10 price is 1 standard deviation above the mean. In a normal distribution, 34.1% of all data lies between the mean and 1 standard deviation above the mean. Thus, approximately
34.1% × 1000 = 0.341 × 1000 = 341
of the stations charge between $1.90 and $2.10 for a gallon of regular gas. Half of the stations charge less than the mean. Therefore, about 341 + 500 = 841 of the stations charge less than $2.10 for a gallon of regular gas. This problem can also be solved by computing 34.1% + 50% = 84.1% of 1000.

[Figure: Data less than 1σ above μ: 50% of the data lies below μ and 34.1% between μ and μ + σ, for a total of 84.1%.]

c. The $2.30 price is 2 standard deviations above the mean. In a normal distribution, 95.4% of all data is within 2 standard deviations of the mean. This means that the other 4.6% of the data lies either more than 2 standard deviations above the mean or more than 2 standard deviations below the mean. We are only interested in the data that lie more than 2 standard deviations above the mean, which is 1/2 of 4.6%, or 2.3%, of the data. Thus about
2.3% × 1000 = 0.023 × 1000 = 23
of the stations charge more than $2.30 for a gallon of regular gas.

[Figure: Data more than 2σ above μ: 2.3% of the data lies below μ − 2σ, 95.4% between μ − 2σ and μ + 2σ, and 2.3% above μ + 2σ.]

CHECK YOUR PROGRESS 3
A vegetable distributor knows that during the month of August, the weights of its tomatoes were normally distributed with a mean of 0.61 pound and a standard deviation of 0.15 pound.
a. What percent of the tomatoes weighed less than 0.76 pound?
b. In a shipment of 6000 tomatoes, how many tomatoes can be expected to weigh more than 0.31 pound?
c. In a shipment of 4500 tomatoes, how many tomatoes can be expected to weigh between 0.31 and 0.91 pound?
Solution: See page S1.

z-scores

When you take a test, it is natural to wonder how you will do compared to the other students in the class. Will you finish in the top 10%, or will you be closer to the middle? One statistic that is used to measure the position of a data value with respect to other data values is known as the z-score.

z-score
The z-score for a given data value x is the number of standard deviations between x and the mean of the data; it is positive when x is above the mean and negative when x is below the mean. The following formulas are used to calculate the z-score for a data value x.
Population:
$$z = \frac{x - \mu}{\sigma}$$
Sample:
$$z = \frac{x - \bar{x}}{s}$$

In Example 4, we use a student's z-scores for two tests to determine how well the student did on each test in comparison to the other students.
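The arithmetic of Example 3 can be organized around the population z-score formula. This is a minimal Python sketch, assuming the Empirical Rule percents stated in this section; the names `z_score`, `exact_percent_within`, and `EMPIRICAL_RULE` are hypothetical. As a cross-check it also uses the standard-library function `math.erf` to compute the exact percent of a normal distribution that lies within k standard deviations of the mean, 100·erf(k/√2), which is about 68.27%, 95.45%, and 99.73% for k = 1, 2, and 3.

```python
import math

# Empirical Rule percents from the text, keyed by the number of standard deviations k.
EMPIRICAL_RULE = {1: 68.2, 2: 95.4, 3: 99.7}


def z_score(x, mean, sd):
    """Population z-score: standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def exact_percent_within(k):
    """Exact percent of a normal distribution within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))


# Example 3: gas prices with mean $1.90, standard deviation $0.20, and 1000 stations.
mean, sd, n = 1.90, 0.20, 1000
for price in (1.50, 2.10, 2.30):
    print(price, round(z_score(price, mean, sd), 2))

a = EMPIRICAL_RULE[2] / 100 * n                 # within 2 sd of the mean
b = (EMPIRICAL_RULE[1] / 2 + 50) / 100 * n      # below the mean, or between it and 1 sd above
c = (100 - EMPIRICAL_RULE[2]) / 2 / 100 * n     # more than 2 sd above the mean
print(round(a), round(b), round(c))

# The rule's percents approximate the exact normal-curve values.
for k in (1, 2, 3):
    print(k, EMPIRICAL_RULE[k], round(exact_percent_within(k), 2))
```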

Take Note: In any application, the quantity x − μ and the standard deviation σ are both measured in the same units. Thus a z-score, which is the quotient of x − μ and σ, is a dimensionless measure.

EXAMPLE 4 Use z-scores
a. Ruben has taken two tests in his math class. He scored 72 on the first test, for which the mean was 65 and the standard deviation was 8. He scored 60 on the second test, for which the mean was 45 and the standard deviation was 12. In comparison to the other students, did Ruben do better on the first or the second test?
b. Stacy is in the same math class as Ruben. Stacy's z-score for the first test was −0.75. What was Stacy's score on the first test?

Solution
a. The z-score formula yields
$$z_1 = \frac{72 - 65}{8} = 0.875 \qquad\text{and}\qquad z_2 = \frac{60 - 45}{12} = 1.25$$
Thus Ruben scored 0.875 standard deviation above the mean on his first test and 1.25 standard deviations above the mean on the second test. In comparison to his classmates, Ruben did better on the second test than on the first test.
b. Substitute −0.75 for z, 65 for μ, and 8 for σ in the z-score formula, and solve for x.
$$-0.75 = \frac{x - 65}{8} \quad\Rightarrow\quad -6 = x - 65 \quad\Rightarrow\quad 59 = x$$
Stacy's score on the first test was 59.

CHECK YOUR PROGRESS 4
a. Cheryl took two quizzes in her history class. She scored 15 on the first quiz, for which the mean was 12 and the standard deviation was 2.4. Her score on the second quiz, for which the mean was 11.5 and the standard deviation was 2.2, was 14. In comparison to her classmates, did Cheryl do better on the first or the second quiz?
b. Greg is in the same history class as Cheryl. Greg's z-score for the first quiz was 2.5. What was Greg's score on the first quiz?
Solution: See page S1.

Topics for Discussion
1. Is it possible, in a normal distribution of data, for the mean to be much larger than the median? Explain.
2. Must all large data sets have a normal distribution? Explain.
3. A professor gave a final examination to 110 students. Eighteen students had examination scores that were more than one standard deviation above the mean. Does this indicate that 18 of the students had examination scores that were more than one standard deviation below the mean? Explain.
4. A set of data consists of the 525 monthly salaries, listed in dollars, of the employees of a large company. What units should be used for the z-scores associated with the salaries? Explain.
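The z-score calculations of Example 4 rest on two operations: computing a z-score, and solving the z-score formula for x, which gives x = μ + zσ. A minimal Python sketch with hypothetical function names `z_score` and `value_from_z` reproduces Example 4:

```python
def z_score(x, mean, sd):
    """z-score of a data value x: (x - mean) / sd."""
    return (x - mean) / sd


def value_from_z(z, mean, sd):
    """Solve z = (x - mean) / sd for x, giving x = mean + z * sd."""
    return mean + z * sd


# Example 4a: Ruben's two tests as (score, mean, standard deviation).
tests = {"first": (72, 65, 8), "second": (60, 45, 12)}
z = {name: z_score(*values) for name, values in tests.items()}
print(z)
print("better in comparison to the class:", max(z, key=z.get))

# Example 4b: Stacy's z-score of -0.75 on the first test.
print(value_from_z(-0.75, 65, 8))
```

Comparing z-scores rather than raw scores is what makes the two tests comparable: each score is measured in standard deviations of its own test.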

EXERCISES

In Exercises 1 to 8, determine whether the given statement is true or false.
1. If a distribution is symmetric about a vertical line, then it is a normal distribution.
2. Every normal distribution has a bell-shaped graph.
3. In a normal distribution, the mean, the median, and the mode of the distribution all are located at the center of the distribution.
4. In a distribution that is skewed to the left, the median of the data is greater than the mean.
5. If a z-score for a data value x is negative, then x must also be negative.
6. In every data set, 68.2% of the data lies within 1 standard deviation of the mean.
7. Let x be the number of people who attend a baseball game. The variable x is a discrete variable.
8. The time of day d in the lobby of a bank is measured with a digital clock. The variable d is a continuous variable.

In Exercises 9 and 10, use the Empirical Rule to answer each question.
9. In a normal distribution, what percent of the data lies
a. within 2 standard deviations of the mean?
b. more than 1 standard deviation above the mean?
c. between 1 standard deviation below the mean and 2 standard deviations above the mean?
10. In a normal distribution, what percent of the data lies
a. within 3 standard deviations of the mean?
b. more than 2 standard deviations below the mean?
c. between 2 standard deviations below the mean and 3 standard deviations above the mean?

Business and Economics
11. State Sales Tax Rates Use the following frequency distribution to determine
a. the percent of states in the U.S. that had a 2001 sales tax rate of at least 5%.
b. the probability that a state selected at random had a 2001 sales tax rate of at least 3% but less than 5%.

2001 State Sales Tax Rates
Tax rate, r     Number of states   Relative frequency
0% ≤ r < 1%     5                  10%
1% ≤ r < 2%     0                  0%
2% ≤ r < 3%     1                  2%
3% ≤ r < 4%     0                  0%
4% ≤ r < 5%     13                 26%
5% ≤ r < 6%     15                 30%
6% ≤ r < 7%     13                 26%
7% ≤ r < 8%     3                  6%
Source: Time Almanac 2002

12. Waiting Time The amount of time customers spend waiting in line at a bank is normally distributed, with a mean of 3.5 minutes and a standard deviation of 0.75 minute. Find the probability that the time a customer will spend waiting is
a. at most 2.75 minutes.
b. less than 2 minutes.
13. Weights of Parcels During a particular week, an overnight delivery company found that the weights of its parcels were normally distributed, with a mean of 24 ounces and a standard deviation of 6 ounces.
a. What percent of the parcels weighed between 12 ounces and 30 ounces?
b. What percent of the parcels weighed more than 42 ounces?

14. Weights of Boxes of Corn Flakes The weights of the boxes of corn flakes filled by a machine are normally distributed, with an average weight of 14.5 ounces and a standard deviation of 0.5 ounce. What percent of the boxes
a. weigh less than 14.0 ounces?
b. weigh between 13.5 and 15.0 ounces?
15. Duration of Long Distance Telephone Calls A telephone company has found that the lengths of its long distance telephone calls are normally distributed, with a mean of 225 seconds and a standard deviation of 55 seconds. What percent of its long distance calls last
a. more than 335 seconds?
b. between 170 and 390 seconds?

Life and Health Sciences

16. Median Income for Physicians The 1995 median income for physicians was $160,000. (Source: AMA Center for Health Policy Research) The distribution of these incomes is skewed to the right. Is the mean of these incomes greater than or less than $160,000?
17. Heights of Women A survey of 1000 women aged 20 to 30 found that their heights are normally distributed, with a mean of 65 inches and a standard deviation of 2.5 inches.
a. How many of the women have a height that is within 1 standard deviation of the mean?
b. How many of the women have a height that is between 60 inches and 70 inches?
18. Distribution of Data Consider the set of the heights of all babies born in the United States during a particular year. Do you think this data set can be closely approximated by a normal distribution? Explain.

Social Sciences

19. Presidential Inauguration Ages and Ages at Death The table in Exercise 26 of Section 8.4 lists the U.S. presidents and their ages at inauguration. The table in Exercise 27 of Section 8.4 lists the deceased U.S. presidents as of December 2002, and their ages at death.
a. Construct a back-to-back stem-and-leaf diagram for the data in the tables.
b. What patterns, if any, are evident from the diagram?
20. Average Salaries of Teachers Use the following frequency distribution to determine
a. the percent of states in the U.S. that paid a 1998–1999 average teacher salary of at least $39,000.
b. the probability that a state selected at random paid a 1998–1999 average teacher salary of at least $36,000 but less than $45,000.

Average Salaries of Public School Teachers, 1998–1999
Average salary, s | Number of states | Relative frequency
$27,000 ≤ s < $30,000 | 3 | 6%
$30,000 ≤ s < $33,000 | 7 | 14%
$33,000 ≤ s < $36,000 | 12 | 24%
$36,000 ≤ s < $39,000 | 9 | 18%
$39,000 ≤ s < $42,000 | 6 | 12%
$42,000 ≤ s < $45,000 | 3 | 6%
$45,000 ≤ s < $48,000 | 5 | 10%
$48,000 ≤ s < $51,000 | 3 | 6%
$51,000 ≤ s < $54,000 | 2 | 4%
Source: www.nea.org
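With a grouped frequency distribution such as the tables in Exercises 11 and 20, a percent or probability is the sum of the counts in the qualifying classes divided by the total count. This works only when the bounds in the question fall on class boundaries. The Python sketch below encodes each class as a hypothetical (lower, upper, count) triple meaning lower ≤ value < upper. It returns an exact fraction. The counts are copied from the two tables. The encoding and the function name are our own choices.

```python
from fractions import Fraction

# Each class is (lower, upper, number of states) and covers lower <= value < upper.
TAX_RATE_CLASSES = [  # 2001 sales tax rate, in percent (Exercise 11)
    (0, 1, 5), (1, 2, 0), (2, 3, 1), (3, 4, 0),
    (4, 5, 13), (5, 6, 15), (6, 7, 13), (7, 8, 3),
]
SALARY_CLASSES = [  # 1998-1999 average teacher salary, in dollars (Exercise 20)
    (27_000, 30_000, 3), (30_000, 33_000, 7), (33_000, 36_000, 12),
    (36_000, 39_000, 9), (39_000, 42_000, 6), (42_000, 45_000, 3),
    (45_000, 48_000, 5), (48_000, 51_000, 3), (51_000, 54_000, 2),
]


def proportion(classes, at_least=None, less_than=None):
    """Fraction of items with at_least <= value < less_than.

    Either bound may be None (unbounded). Each bound must be a class
    boundary; otherwise the table cannot say how a class would be split.
    """
    boundaries = {lower for lower, _, _ in classes} | {upper for _, upper, _ in classes}
    for bound in (at_least, less_than):
        if bound is not None and bound not in boundaries:
            raise ValueError(f"{bound} is not a class boundary")

    total = sum(count for _, _, count in classes)
    selected = sum(
        count
        for lower, upper, count in classes
        if (at_least is None or lower >= at_least)
        and (less_than is None or upper <= less_than)
    )
    return Fraction(selected, total)


if __name__ == "__main__":
    p = proportion(SALARY_CLASSES, at_least=39_000)
    print(f"{float(p):.0%} of the states")
    q = proportion(SALARY_CLASSES, at_least=36_000, less_than=45_000)
    print(f"probability {float(q):.2f}")
```

Returning a Fraction keeps the arithmetic exact. The relative frequency column of each table is simply this fraction for a single class.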

21. Test Scores The following relative frequency histogram shows the distribution of test scores for 50 students who took a history test.
[Figure: relative frequency histogram of test scores. Horizontal axis: Test scores (28 to 100). Vertical axis: Relative frequency (0% to 25%).]
a. What percent of the students scored at least 76 on the test?
b. How many of the students received a score of at least 60 but less than 84?
22. Examination Duration Times At a university, 500 law students took an examination. One student completed the exam in 24 minutes. The mode for the completion time is 50 minutes. The distribution of the times the students took to complete the exam is skewed to the left. Is the mean of these times greater than or less than 50 minutes?
23. Intelligence Quotients A psychologist finds that the intelligence quotients of a group of patients are normally distributed, with a mean of 104 and a standard deviation of 26. Find the percent of the patients with IQs
a. above 130.
b. between 130 and 182.
24. Distribution of Data The population of a resort city consists mostly of wealthy families and families with low incomes. Do you think the set of family incomes for this city can be closely approximated by a normal distribution? Explain.
25. Comparison of Quiz Scores Ryan took two quizzes in his art class. He scored 45 on the first quiz, for which the mean was 51.4 and the standard deviation was 9.5. His score on the second quiz, for which the mean was 53.6 and the standard deviation was 7.2, was 49. In comparison to his classmates, did Ryan do better on the first or the second quiz?
26. Comparison of Test Scores Tanya took two tests in her chemistry class. She scored 85 on the first test, for which the mean was 79.4 and the standard deviation was 6.4. Her score on the second test, for which the mean was 70.5 and the standard deviation was 5.3, was 78. In comparison to her classmates, did Tanya do better on the first or the second test?

Sports and Recreation

27. Super Bowl Scores The following table lists the winning and losing scores for all of the Super Bowl games up to the year 2001.

Super Bowl Results, 1967–2001 (each pair: winning score, losing score)
35, 10 | 24, 7 | 27, 10 | 42, 10 | 49, 26
33, 14 | 16, 6 | 26, 21 | 20, 16 | 27, 17
16, 7 | 21, 17 | 27, 17 | 55, 10 | 35, 21
23, 7 | 32, 14 | 38, 9 | 20, 19 | 31, 24
16, 13 | 27, 10 | 38, 16 | 37, 24 | 34, 19
24, 3 | 35, 31 | 46, 10 | 52, 17 | 23, 16
14, 7 | 31, 19 | 39, 20 | 30, 13 | 34, 7

a. Construct a back-to-back stem-and-leaf diagram for the winning scores and the losing scores.
b. What patterns, if any, are evident from the back-to-back stem-and-leaf diagram?
28. Ironman Triathlon The following table lists the winning times for the men's and women's Ironman Triathlon World Championships, held in Kailua-Kona, Hawaii. (Source: http://www.3athlon.org/races/ironman/hawaii2001/statistik/index.php)

Ironman Triathlon World Championships
(Winning times rounded to the nearest minute)
Men, 1978–2000: 11:47, 11:16, 9:25, 9:38, 9:08, 9:06, 8:54, 8:51, 8:29, 8:34, 8:31, 8:09, 8:28, 8:19, 8:09, 8:08, 8:20, 8:21, 8:04, 8:33, 8:24, 8:17, 8:21
Women, 1979–2000: 12:55, 11:21, 12:01, 10:54, 10:44, 10:25, 10:25, 9:49, 9:35, 9:01, 9:01, 9:14, 9:08, 8:55, 8:58, 9:20, 9:17, 9:07, 9:32, 9:24, 9:13, 9:26

a. Construct a back-to-back stem-and-leaf diagram for the data in the tables. Hint: Use the two-digit minutes as your leaves, and insert a comma between the leaves in each row so that they can be easily distinguished from each other.
b. What patterns, if any, are evident from the back-to-back stem-and-leaf diagram?
29. Home Run Leaders The following tables list the numbers of home runs hit by the home run leaders in the National and the American League from 1971 to 2001.

Home Run Leaders, 1971–2001
National League: 48, 40, 44, 36, 38, 38, 52, 40, 48, 48, 31, 37, 40, 36, 37, 37, 49, 39, 47, 40, 38, 35, 46, 43, 40, 47, 49, 70, 65, 50, 73
American League: 33, 37, 32, 32, 36, 32, 39, 46, 45, 41, 22, 39, 39, 43, 40, 40, 49, 42, 36, 51, 44, 43, 46, 40, 50, 52, 56, 56, 48, 47, 52

a. Construct a back-to-back stem-and-leaf diagram for the data in the tables.
b. What patterns, if any, are evident from the back-to-back stem-and-leaf diagram?
30. Race Times The following relative frequency histogram shows the distribution of times for the 1200 contestants who finished a race.
[Figure: relative frequency histogram of race times. Horizontal axis: Time, in seconds (50 to 120). Vertical axis: Relative frequency (0% to 24%).]
a. What percent of the contestants finished the race in less than 80 seconds?
b. How many contestants had a time of at least 60 seconds but less than 80 seconds?
31. Baseball Attendance A baseball franchise finds that the attendance at its home games is normally distributed, with a mean of 16,000 and a standard deviation of 4000.
a. What percent of the home games have an attendance between 8000 and 16,000?
b. What percent of the home games have an attendance of less than 12,000?

Physical Sciences and Engineering

32. Breaking Points of Ropes The breaking points of a particular type of rope are normally distributed, with a mean of 350 pounds and a standard deviation of 24 pounds. What is the probability that a piece of this rope chosen at random will have a breaking point of
a. less than 326 pounds?
b. between 302 and 398 pounds?
33. Tire Mileage The mileages of WearEver tires are normally distributed, with a mean of 48,000 miles and a standard deviation of 6000 miles. What is the probability that the WearEver tire you purchase will provide a mileage of
a. more than 60,000 miles?
b. between 42,000 and 54,000 miles?
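Exercises 19, 27, 28, and 29 each ask for a back-to-back stem-and-leaf diagram. The diagram shares one column of stems. It writes one data set's leaves to the left, in decreasing order so they grow outward, and the other set's leaves to the right in increasing order. The Python sketch below builds the rows as plain strings. The helper names are hypothetical. The choice of stems (tens digits for scores, hours for the Ironman times with two-digit minutes as leaves, following the hint in Exercise 28) is passed in as a split function. Empty stems are kept so that gaps in the data stay visible.

```python
from collections import defaultdict


def split_tens(value):
    """Split a non-negative whole number such as 48 into stem 4 and leaf 8."""
    return value // 10, value % 10


def split_time(hh_mm):
    """Split a time such as '8:29' into stem 8 (hours) and leaf 29 (minutes)."""
    hours, minutes = hh_mm.split(":")
    return int(hours), int(minutes)


def back_to_back(left, right, split, leaf_sep=" ", leaf_width=1):
    """Return the rows of a back-to-back stem-and-leaf diagram as strings.

    Leaves on the left are written in decreasing order so that they grow
    outward from the stem; leaves on the right are in increasing order.
    """
    left_groups, right_groups = defaultdict(list), defaultdict(list)
    for values, groups in ((left, left_groups), (right, right_groups)):
        for value in values:
            stem, leaf = split(value)
            groups[stem].append(leaf)

    def leaves(groups, stem, reverse):
        # Zero-pad two-digit leaves (e.g. minutes) so 5 prints as 05.
        return leaf_sep.join(
            f"{leaf:0{leaf_width}d}" for leaf in sorted(groups[stem], reverse=reverse)
        )

    stems = set(left_groups) | set(right_groups)
    # Include every stem in the range, even empty ones, so gaps stay visible.
    rows = [
        (leaves(left_groups, stem, True), stem, leaves(right_groups, stem, False))
        for stem in range(min(stems), max(stems) + 1)
    ]
    left_width = max(len(text) for text, _, _ in rows)
    stem_width = len(str(max(stems)))
    return [
        f"{left_text:>{left_width}} | {stem:>{stem_width}} | {right_text}".rstrip()
        for left_text, stem, right_text in rows
    ]


if __name__ == "__main__":
    # First five values of each league's list in Exercise 29, to keep the demo short.
    national = [48, 40, 44, 36, 38]
    american = [33, 37, 32, 32, 36]
    print("\n".join(back_to_back(national, american, split_tens)))
```

For the Ironman data, call back_to_back(men, women, split_time, ", ", 2) so that the leaves are comma-separated two-digit minutes, as the hint suggests.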

34. Highway Speed of Vehicles A study of 8000 vehicles that passed by a highway checkpoint found that their speeds were normally distributed, with a mean of 61 miles per hour and a standard deviation of 7 miles per hour.
a. How many of the vehicles had a speed of more than 68 miles per hour?
b. How many of the vehicles had a speed of less than 40 miles per hour?

Explorations

Chebyshev's Theorem The following well-known theorem is called Chebyshev's theorem. It is named after the Russian mathematician Pafnuty Lvovich Chebyshev (1821–1894). Chebyshev's theorem states that a mathematical relationship exists between the spread of data and the standard deviation of the data. A remarkable property of Chebyshev's theorem is that it is valid for any set of data. This is unlike the Empirical Rule, which applies only to sets of data that have normal distributions.

Chebyshev's Theorem
The proportion or percentage of any data set that lies within z standard deviations of the mean, where z is any positive number greater than 1, is at least

$$1 - \frac{1}{z^2}$$

Applying Chebyshev's theorem with z = 2 yields

$$1 - \frac{1}{z^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4}$$

This result of 3/4, or 75%, means that at least 75% of the data in any data set must lie within 2 standard deviations of the mean of the data set.

1. Use Chebyshev's theorem to determine the minimum percentage of data (to the nearest percent) in any data set that must lie within
a. 1.2 standard deviations of the mean.
b. 2.5 standard deviations of the mean.
c. 3.1 standard deviations of the mean.
2. A new automobile dealership found that during the month of March, the mean selling price of its cars was $29,200, with a standard deviation of $5100. Use Chebyshev's theorem to determine the minimum percentage (to the nearest percent) of the dealership's cars that have a selling price within
a. 1.5 standard deviations of the mean, that is, between $21,550 and $36,850.
b. 2.8 standard deviations of the mean, that is, between $14,920 and $43,480.
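Chebyshev's bound 1 − 1/z² is easy to tabulate. Comparing it with the exact share of a normal distribution shows why the Empirical Rule percentages are larger: the Empirical Rule assumes a bell shape, while Chebyshev's theorem guarantees a minimum for any shape. The Python sketch below computes the bound, the interval mean ± z·sd used in Exploration 2, and, for contrast, the normal-distribution share from the standard library's statistics.NormalDist. The function names are our own.

```python
from statistics import NormalDist


def chebyshev_min_fraction(z):
    """Chebyshev's lower bound 1 - 1/z**2 on the fraction of ANY data set
    lying within z standard deviations of the mean (requires z > 1)."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return 1 - 1 / z**2


def interval(mean, sd, z):
    """Endpoints of the interval from z standard deviations below to z above the mean."""
    return mean - z * sd, mean + z * sd


def normal_fraction_within(z):
    """Exact fraction of a normal distribution within z standard deviations,
    for comparison with the Chebyshev bound."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf(z) - standard.cdf(-z)


if __name__ == "__main__":
    for z in (2, 3, 4):
        print(
            f"z = {z}: Chebyshev guarantees at least {chebyshev_min_fraction(z):.1%}; "
            f"a normal distribution has {normal_fraction_within(z):.1%}"
        )
```

For z = 2 the bound is exactly 0.75, matching the worked example, while a normal distribution places about 95.4% of its data in the same interval.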