13.1 What is Statistics?

Population: The collection of all outcomes, responses, measurements, or counts that are of interest.

Sample: A portion or subset of the population.

Statistics: The science of collecting, organizing, analyzing, and interpreting data in order to make decisions.

ch 13 Angel & Porter (6th ed)

Descriptive Statistics: Involves organizing, summarizing, and displaying data, e.g. tables, charts, averages.

Inferential Statistics: Involves using sample data to draw conclusions about a population.

Sampling Techniques

Simple Random Sample: Each member of the population has an equal chance of being selected.

Please read section 13.2, The Misuses of Statistics.
13.3 Frequency Distributions

Consider the following data collected from students in a class:

Number of traffic tickets received
0 4 4 5 4 1 5 5 6 7 0 4 3 1 3 3 3 4 3 4 2 1 6 4 5

It is usually helpful to summarize a large amount of data in a frequency distribution.

A frequency distribution is a listing of the observed values and the corresponding frequency of occurrence of each value.

Example
Construct a frequency distribution for the number of traffic tickets received (data listed above).

If you have a large data set in which few numbers are repeated, it may be helpful to create a Grouped Frequency Distribution.

Example
The following data represent the monthly account balances (to the nearest dollar) for a sample of fifty credit card users.

138  78 175  46  79 118  90 163  88 107
126 154  85  60  42  54  62 128 114  73
129 130  81  67 119 116 145 105  96  71
100 145 117  60 125 130  94  88 136 112
118  84  74  62  81 110 108  71  85 165
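As a minimal sketch, the frequency distribution for the traffic ticket data above can be tallied in Python. The variable names `tickets` and `frequency` are illustrative choices, not part of the course notes.

```python
from collections import Counter

# Number of traffic tickets received, copied from the class data above.
tickets = [0, 4, 4, 5, 4, 1, 5, 5, 6, 7, 0, 4, 3,
           1, 3, 3, 3, 4, 3, 4, 2, 1, 6, 4, 5]

# Counter maps each observed value to the number of times it occurs.
frequency = Counter(tickets)

print("Tickets  Frequency")
for value in sorted(frequency):
    print(f"{value:>7}  {frequency[value]:>9}")

# The frequencies should add up to the number of observations.
print("Total observations:", sum(frequency.values()))
```

A quick check on any frequency distribution is that the frequency column sums to the number of data values.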
Rules for Data Grouped by Classes
1. The classes should be of the same width.
2. The classes should not overlap.
3. Each piece of data should belong to only one class.

You should use between ____ and ____ classes (intervals).

Let's arbitrarily make the first interval go from 40-59. This means the second interval must start at 60. We say the class width is 20, since there are 20 numbers in the first interval (40, 41, 42, ..., 58, 59).

The modal class of a frequency distribution is the class with the highest frequency.

The midpoint of a class (called the class mark) is found by

$\text{class mark} = \frac{\text{lower limit} + \text{upper limit}}{2}$

Example #16
Construct a frequency distribution with a first class of 42-47.

57 57 49 52 50 51 51 56 46 61
61 64 56 47 56 60 61 57 54 50
46 55 55 62 52 57 68 48 54 54
51 43 69 58 51 65 49 42 54 55
64
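The grouping rules can be sketched in Python using the credit card balances from the earlier example, with the first class 40-59 and class width 20 chosen in the notes. The helper names `class_limits` and `class_mark` are illustrative, and the sketch assumes whole-dollar data.

```python
# Monthly account balances (nearest dollar) for the fifty credit card users.
balances = [
    138, 78, 175, 46, 79, 118, 90, 163, 88, 107,
    126, 154, 85, 60, 42, 54, 62, 128, 114, 73,
    129, 130, 81, 67, 119, 116, 145, 105, 96, 71,
    100, 145, 117, 60, 125, 130, 94, 88, 136, 112,
    118, 84, 74, 62, 81, 110, 108, 71, 85, 165,
]

FIRST_LOWER = 40  # lower limit of the first class, as chosen in the notes
WIDTH = 20        # class width, as chosen in the notes


def class_limits(value, first_lower=FIRST_LOWER, width=WIDTH):
    """Return the (lower, upper) limits of the class that contains value."""
    index = (value - first_lower) // width
    lower = first_lower + index * width
    return lower, lower + width - 1


def class_mark(lower, upper):
    """Midpoint of a class: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2


# Each balance falls in exactly one class, so the classes do not overlap.
grouped = {}
for balance in balances:
    limits = class_limits(balance)
    grouped[limits] = grouped.get(limits, 0) + 1

for lower, upper in sorted(grouped):
    count = grouped[(lower, upper)]
    print(f"{lower}-{upper}  class mark {class_mark(lower, upper)}  frequency {count}")

# The modal class is the class with the highest frequency.
modal_class = max(grouped, key=grouped.get)
print("Modal class:", modal_class)
```

Because every class is computed from the same starting point and width, the three rules (equal width, no overlap, one class per data value) hold by construction.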
13.4 Statistical Graphs

Often it is easier to understand information when it is summarized in a graph. We will look at 3 types of graphs.

Circle Graphs (Pie Graphs)
Shows the relationship of each category to the whole by visually comparing the sizes of the slices of the pie. Information displayed in a circle graph needs to be categories (non-numeric).

[Circle graph: Letter grade distribution on an exam. A 48%, B 13%, C 17%, D 13%, F 9%]

Example #10
In 2000 there were 66.6 million households online worldwide. Of the total, 57% were in North America. Estimate the number of households online in each region shown on the graph.

[Circle graph: North America 57%, Europe 25%, Asia/Pacific rim 15%, Other 3%]

Histograms and frequency polygons are used to illustrate numeric data contained in a frequency distribution.

Histogram
Observed data is placed on the horizontal axis and frequencies on the vertical axis. A rectangle is placed above each value or class indicating the frequency for that value or class.

[Figure: Histogram. Horizontal axis: observed values (0 to 4). Vertical axis: Frequency (0 to 12).]
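Returning to the circle graph in Example #10, each region's estimate is its percentage of the 66.6 million total. The following is a small sketch of that arithmetic; the dictionary names are illustrative.

```python
TOTAL_HOUSEHOLDS_MILLIONS = 66.6  # households online worldwide, from Example #10

# Percent of the total for each slice of the circle graph.
shares = {
    "North America": 57,
    "Europe": 25,
    "Asia/Pacific rim": 15,
    "Other": 3,
}

# The slices of a circle graph should account for the whole.
assert sum(shares.values()) == 100

# Estimate for a region = total * (percent / 100).
estimates = {
    region: TOTAL_HOUSEHOLDS_MILLIONS * percent / 100
    for region, percent in shares.items()
}

for region, millions in estimates.items():
    print(f"{region}: about {millions:.1f} million households")
```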
Frequency Polygon
Observed data is placed on the horizontal axis and frequencies on the vertical axis. A dot is placed at the corresponding frequency above each observed value or class. The dots are connected with straight line segments.

[Figure: Frequency Polygon. Horizontal axis: observed values (-1 to 5). Vertical axis: Frequency (0 to 12).]

Example #14
The frequency distribution shown indicates the ages of a group of 40 people attending a party.
(A) Construct a histogram of the frequency distribution.
(B) Construct a frequency polygon of the frequency distribution.

Age    Number of People
20     6
21     3
22     0
23     4
24     6
25     3
26     8
27     10

Example #16
The frequency distribution illustrates the annual salaries, in thousands of dollars, of the people in management positions at the Bradley Thomas Corporation.
(A) Construct a histogram of the frequency distribution.
(B) Construct a frequency polygon of the frequency distribution.

Salary (in $1000)    Number of People
20-25                4
26-31                6
32-37                8
38-43                9
44-49                8
50-55                5
56-61                3
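As a rough sketch of how a histogram and a frequency polygon relate to a frequency table, the age data from Example #14 can be drawn as a text histogram and listed as the points a frequency polygon would connect. This is a plain-text stand-in for a hand-drawn graph; the function name `text_histogram` is an illustrative choice.

```python
# Frequency distribution from Example #14 (ages of 40 people at a party).
ages = [20, 21, 22, 23, 24, 25, 26, 27]
people = [6, 3, 0, 4, 6, 3, 8, 10]


def text_histogram(values, frequencies):
    """Return one line per value, with a bar of '#' as long as its frequency."""
    lines = []
    for value, freq in zip(values, frequencies):
        lines.append(f"{value:>3} | {'#' * freq} ({freq})")
    return lines


for line in text_histogram(ages, people):
    print(line)

# A frequency polygon plots one dot per (value, frequency) pair
# and joins consecutive dots with straight line segments.
polygon_points = list(zip(ages, people))
print(polygon_points)
```

The bars here run sideways, so the observed values are listed down the page rather than along a horizontal axis.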
Example
The following histogram represents the record high temperature for different states.

[Figure: histogram titled "Record High Temperature". Horizontal axis: Temperature, labeled 102, 107, 112, 117, 122, 127, 132. Vertical axis: Frequency, 0 to 20.]

(A) How many states were surveyed?
(B) What are the lower and upper class limits of the first and second classes?
(C) How many states have a record high temperature in the class with a class mark of 122?
(D) What is the class mark of the modal class?

13.5 Measures of Central Tendency

I. Mean: The sum of all data values divided by the number of values.

For a sample: $\bar{x} = \frac{\sum x}{n}$

Sigma notation: $\sum x$ means add all of the data values (x) in the data set.
Example
An instructor recorded the number of absences for his students in one semester. For a random sample the data are:
2 4 2 0 40 2 4 3 6
Find the sample mean.

II. Median: The middle value of an ordered data set. Half of the measurements fall below the median and half are above.

Example
Using the same sample of absences (2 4 2 0 40 2 4 3 6), find the median.

III. Mode: The value with the highest frequency. If no entry is repeated, there is no mode.

Example
Using the same sample of absences (2 4 2 0 40 2 4 3 6), find the mode.
IV. Midrange: The value halfway between the lowest and highest values in the data set.

$\text{Midrange} = \frac{\text{lowest value} + \text{highest value}}{2}$

Example
Using the same sample of absences (2 4 2 0 40 2 4 3 6), find the midrange.

Example (cont.)
Suppose the student with 40 absences is dropped from the course. Calculate the mean, median, mode, and midrange of the remaining values. Compare the effect of the change on each type of average.
2 4 2 0 2 4 3 6

Comparing the Mean, Median, Mode, and Midrange

The mean is used most often because it uses all of the data values in its computation. Thus it is almost always a good representative value. The mean is the only measure of central tendency that can be affected by any change in the data set.

If the data set contains "extreme values" (called outliers), the median provides a more accurate measure of central tendency. Look at the effect of the 40 on the mean and median in the previous example.
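The effect of the extreme value 40 on each measure can be seen in a short Python sketch. It uses the standard-library `statistics` module; the helper names `midrange` and `summarize` are illustrative, not part of the course notes.

```python
from statistics import mean, median, multimode


def midrange(data):
    """Value halfway between the lowest and highest values."""
    return (min(data) + max(data)) / 2


def summarize(data):
    """Return the four measures of central tendency for a data set."""
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": multimode(data),  # list of the most frequent value(s)
        "midrange": midrange(data),
    }


absences = [2, 4, 2, 0, 40, 2, 4, 3, 6]

# Drop the student with 40 absences, as in the continued example.
without_outlier = [x for x in absences if x != 40]

print("With 40:   ", summarize(absences))
print("Without 40:", summarize(without_outlier))
```

Note that `multimode` returns a list, so a data set with ties for the highest frequency reports every tied value.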
The mode is the easiest to "compute"; however, it may not be very useful if the data set is small. The mode is useful when discussing such ideas as shoe size. If a retailer is ordering shoes, it would be helpful to know the most common shoe size.

The midrange is seldom used. Because it only uses the lowest and highest values, it is too sensitive to extreme values.

13.6 Measures of Dispersion

Measures of dispersion tell how spread out the data is.

Consider the heights of the five starting players on each of two men's college basketball teams.

Team A    Team B
72        67
73        72
76        76
76        76
78        84

Mean = 75      Mean = 75
Median = 76    Median = 76
Mode = 76      Mode = 76

These sets are different due to variation.

I. Range = highest value - lowest value

Example
Range A =
Range B =

Drawback: The range only uses 2 numbers from a data set.
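A brief sketch of the range calculation for the two teams follows. The heights are taken from the table above, and the function name `data_range` is an illustrative choice.

```python
# Heights of the five starting players on each team.
team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]


def data_range(data):
    """Range = highest value - lowest value."""
    return max(data) - min(data)


for name, heights in (("Team A", team_a), ("Team B", team_b)):
    average = sum(heights) / len(heights)
    print(f"{name}: mean = {average}, range = {data_range(heights)}")
```

Both teams share the same mean, so the range is one way to see that the two data sets are spread out differently.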
The deviation for each value x is the difference between the value of x and the mean of the data set.

In a sample, the deviation for each value x is: $x - \bar{x}$

II. Sample Standard Deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

[Figure: number line from 72 to 78 with the mean $\bar{x} = 75$ marked, showing the deviations -3, -2, 1, 1, 3 of the data values from the mean.]

Procedure to find the standard deviation (p. 696):
1. Calculate the mean.
2. Make a chart with 3 columns: Data, Data - Mean, (Data - Mean)^2.
3. Fill in each column.
4. Add the values in the (Data - Mean)^2 column.
5. Divide the sum by n - 1.
6. Take the square root of the quotient.
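The six-step procedure can be sketched in Python as follows, using the Team A heights (72, 73, 76, 76, 78), whose deviations match the number line above. The function name `sample_standard_deviation` is an illustrative choice; the standard library's `statistics.stdev` computes the same quantity.

```python
import math


def sample_standard_deviation(data):
    """Follow the chart procedure: mean, deviations, squares, sum, divide, root."""
    n = len(data)
    mean = sum(data) / n                      # Step 1: calculate the mean
    deviations = [x - mean for x in data]     # "Data - Mean" column
    squared = [d ** 2 for d in deviations]    # "(Data - Mean)^2" column
    total = sum(squared)                      # Step 4: add the squared deviations
    variance = total / (n - 1)                # Step 5: divide the sum by n - 1
    return math.sqrt(variance)                # Step 6: take the square root


team_a = [72, 73, 76, 76, 78]
s_a = sample_standard_deviation(team_a)
print(f"Team A standard deviation: {s_a:.2f}")
```

Dividing by n - 1 rather than n is what makes this the sample standard deviation.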
Example
Calculate the standard deviation for Team B.

Data    Data - Mean    (Data - Mean)^2
67
72
76
76
84

13.7 The Normal Curve

When we look at a histogram we can see the overall shape of the distribution of data. Some shapes occur more often than others.

[Figure: histogram titled "Number of Children per Family". Horizontal axis: Number of children (0 to 9). Vertical axis: frequency (0 to 20).]

[Figure: three distribution shapes labeled Skewed Right, Skewed Left, and Normal Distribution.]
Data with a normal distribution has the following characteristics.

[Figure: normal curve with the horizontal axis marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$.]

About 68% of the data lies within 1 standard deviation of the mean.
About 95% of the data lies within 2 standard deviations of the mean.
Almost all of the data lies within 3 standard deviations of the mean.

Example
An instruction manual claims that the assembly time for a product is normally distributed with a mean of 4.2 hours and standard deviation 0.3 hours. What percentage of products will have assembly times between 3.6 hours and 4.8 hours?

What if we wanted to know what percentage of products will have assembly times more than 4.7 hours? Answering this question requires a z-score.

The z-score represents the number of standard deviations a value x falls from the mean.

$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \bar{x}}{s}$

Example
An instruction manual claims that the assembly time for a product is normally distributed with a mean of 4.2 hours and standard deviation 0.3 hours. Find the z-score for an assembly time of:
(a) 3.6 hrs
(b) 4.2 hrs
(c) 4.7 hrs

A z-score has 2 parts:
(1) sign: above or below the mean
(2) numerical value: number of standard deviations away from the mean
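The z-score formula translates directly into a one-line function. The sketch below applies it to the three assembly times from the example; the function and constant names are illustrative.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / standard_deviation


MEAN_HOURS = 4.2  # mean assembly time from the example
SD_HOURS = 0.3    # standard deviation from the example

for hours in (3.6, 4.2, 4.7):
    z = z_score(hours, MEAN_HOURS, SD_HOURS)
    side = "below" if z < 0 else "above" if z > 0 else "at"
    print(f"{hours} hours: z = {z:.2f} ({side} the mean)")
```

The sign of the result tells which side of the mean the value is on, and the size tells how many standard deviations away it is.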
Most questions we need to answer about a normal distribution involve values other than those within 1, 2, or 3 standard deviations away from the mean. To answer these questions, we use the z-score and a table of percentage values.

Table 13.7 (p. 706) gives the area (percentage) under the normal curve between the mean, z = 0, and a z-value to the right of the mean.

The total area under the normal curve is 1.0 = 100%.

Example
Use Table 13.7 to find the specified area.
A) Above the mean.
B) Between z = 0 and z = 1.00
C) Between z = -1.00 and z = 0

Since the curve is symmetric about the mean, the area between the mean and a positive z-score is the same as the area between the mean and the corresponding negative z-score.

Example (cont.)
D) Between z = -2.00 and z = 2.00
E) Between z = 1.23 and z = 2.35

See Procedure to find the Percent of Data Between any Two Values on p. 707.

Example (cont.)
F) To the right of z = 1.73
G) To the left of z = 1.08
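The table lookups above can be checked with a computed version of the same areas. This sketch uses Python's standard-library `statistics.NormalDist` in place of Table 13.7, so results may differ from the printed table in the last rounded digit; the helper function names are illustrative.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def area_mean_to_z(z):
    """Area between z = 0 and z, the kind of value a mean-to-z table lists."""
    return abs(STANDARD_NORMAL.cdf(z) - 0.5)


def area_between(z_low, z_high):
    """Area under the curve between two z-values."""
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)


def area_right_of(z):
    """Area to the right of z; the total area under the curve is 1."""
    return 1 - STANDARD_NORMAL.cdf(z)


def area_left_of(z):
    """Area to the left of z."""
    return STANDARD_NORMAL.cdf(z)


print(f"Between z = 0 and z = 1.00:  {area_mean_to_z(1.00):.4f}")
print(f"Between z = -1.00 and z = 0: {area_mean_to_z(-1.00):.4f}")
print(f"Between z = -2.00 and 2.00:  {area_between(-2.00, 2.00):.4f}")
print(f"To the right of z = 1.73:    {area_right_of(1.73):.4f}")
print(f"To the left of z = 1.08:     {area_left_of(1.08):.4f}")
```

Because the curve is symmetric, `area_mean_to_z` gives the same result for a z-score and its negative.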
Example
An instruction manual claims that the assembly time for a product is normally distributed with a mean of 4.2 hours and standard deviation 0.3 hours.
A) What percentage of products will have assembly times more than 4.7 hours?
B) What percentage of products will have assembly times between 3.5 hours and 3.9 hours?

Remember to draw a picture for each problem!

Example
The life expectancy of nondefective GE light bulbs is normally distributed, with a mean life of 1500 hours and a standard deviation of 100 hours.
#73 Find the percent of bulbs that will last more than 1450 hours.
#74 Find the percent of bulbs that last between 1400 hours and 1550 hours.
#75 Find the percent of bulbs that last less than 1480 hours.
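As a way to check hand-worked answers, the light bulb questions can be sketched with `statistics.NormalDist` built directly from the stated mean and standard deviation, which performs the z-score conversion internally. The helper names are illustrative, and computed percentages may differ slightly from values read off a rounded table.

```python
from statistics import NormalDist


def percent_above(value, mean, sd):
    """Percent of a normal distribution lying above value."""
    return 100 * (1 - NormalDist(mean, sd).cdf(value))


def percent_below(value, mean, sd):
    """Percent of a normal distribution lying below value."""
    return 100 * NormalDist(mean, sd).cdf(value)


def percent_between(low, high, mean, sd):
    """Percent of a normal distribution lying between low and high."""
    dist = NormalDist(mean, sd)
    return 100 * (dist.cdf(high) - dist.cdf(low))


BULB_MEAN = 1500  # mean life in hours, from the example
BULB_SD = 100     # standard deviation in hours, from the example

print(f"More than 1450 hours: {percent_above(1450, BULB_MEAN, BULB_SD):.2f}%")
print(f"Between 1400 and 1550 hours: {percent_between(1400, 1550, BULB_MEAN, BULB_SD):.2f}%")
print(f"Less than 1480 hours: {percent_below(1480, BULB_MEAN, BULB_SD):.2f}%")
```

The same three functions apply to the assembly time example by passing a mean of 4.2 and a standard deviation of 0.3.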