MATH 214 (NOTES) Math 214 Al Nosedal Department of Mathematics Indiana University of Pennsylvania
CHAPTER 1 DATA AND STATISTICS
Definitions. Statistics is defined as the science of collecting, analyzing, presenting, and interpreting data. Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation. Elements are the entities on which data are collected. A variable is a characteristic of interest for the elements. Data can also be classified as either qualitative or quantitative. Qualitative data include labels or names used to identify an attribute of each element. Quantitative data require numeric values that indicate how much or how many.
Descriptive Statistics
Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy for the reader to understand. Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.
Statistical Inference
Many situations require information about a large group of elements. But, because of time, cost, and other considerations, data can be collected from only a small portion of the group. The larger group of elements in a particular study is called the population, and the smaller group is called the sample. As one of its major contributions, statistics uses data from a sample to make estimates and test hypotheses about the characteristics of a population through a process referred to as statistical inference.
CHAPTER 2 DESCRIPTIVE STATISTICS: TABULAR AND GRAPHICAL PRESENTATIONS
Summarizing Qualitative Data
Frequency distribution. A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes.
\[ \text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{n} \]
where n represents the total number of observations.
Bar graphs and pie charts
A bar graph is a graphical device for depicting qualitative data summarized in a frequency, relative frequency, or percent frequency distribution. On one axis of the graph, we specify the labels that are used for the classes (categories). A frequency, relative frequency, or percent frequency scale can be used for the other axis of the graph. The pie chart provides another graphical device for presenting relative frequency and percent frequency distributions for qualitative data.
Summarizing Quantitative Data
A common graphical presentation of quantitative data is a histogram. This graphical summary can be prepared for data previously summarized in a frequency, relative frequency, or percent frequency distribution. A histogram is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the vertical axis.
Exercise (page 40)
11. Consider the following data:
14 21 23 21 16 19 22 25 16 16
24 24 25 19 16 19 18 19 21 12
16 17 18 23 25 20 23 16 20 19
24 26 15 22 24 20 22 24 22 20
a. Develop a frequency distribution using classes of 12-14, 15-17, 18-20, 21-23, and 24-26.
b. Develop a relative frequency distribution and a percent frequency distribution using the classes in part (a).
c. Make a histogram.
Solution
Class    Frequency    Relative frequency    Percent frequency
12-14        2         2/40 = 0.05             5%
15-17        8         8/40 = 0.20            20%
18-20       11        11/40 = 0.275           27.5%
21-23       10        10/40 = 0.25            25%
24-26        9         9/40 = 0.225           22.5%
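The tallies in this solution can be checked with a short script. What follows is a minimal Python sketch, not part of the original notes; the function name `frequency_table` and the variable names are illustrative assumptions.

```python
# Data from Exercise 11 (40 observations).
data = [14, 21, 23, 21, 16, 19, 22, 25, 16, 16,
        24, 24, 25, 19, 16, 19, 18, 19, 21, 12,
        16, 17, 18, 23, 25, 20, 23, 16, 20, 19,
        24, 26, 15, 22, 24, 20, 22, 24, 22, 20]

# Class limits given in part (a); both limits are inclusive.
classes = [(12, 14), (15, 17), (18, 20), (21, 23), (24, 26)]


def frequency_table(values, class_limits):
    """Return (label, frequency, relative frequency, percent frequency) rows."""
    n = len(values)
    rows = []
    for lo, hi in class_limits:
        freq = sum(1 for v in values if lo <= v <= hi)
        rows.append((f"{lo}-{hi}", freq, freq / n, 100 * freq / n))
    return rows


table = frequency_table(data, classes)
for label, freq, rel, pct in table:
    print(f"{label:>5}  {freq:>3}  {rel:.3f}  {pct:.1f}%")
```

The frequencies sum to n = 40, so the relative frequencies sum to 1 and the percent frequencies to 100.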
Describing distributions with numbers
How much do people with a bachelor's degree (but no higher degree) earn? Here are the incomes of 15 such people, chosen at random by the Census Bureau in March 2002 and asked how much they earned in 2001. Most people reported their incomes to the nearest thousand dollars, so we have rounded their responses to thousands of dollars.
110 25 50 50 55 30 35 30 4 32 50 30 32 74 60
How could we find the "typical" income for people with a bachelor's degree (but no higher degree)?
CHAPTER 3 DESCRIPTIVE STATISTICS: NUMERICAL MEASURES
Measuring center: the mean
The most common measure of center is the ordinary arithmetic average, or mean. To find the mean of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, \ldots, x_n$, their mean is
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \qquad (1) \]
or, in more compact notation,
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (2) \]
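As an illustration, the mean can be computed for the 15 bachelor's-degree incomes listed earlier in these notes (in thousands of dollars). This is a minimal Python sketch; the names `incomes`, `mean`, and `x_bar` are illustrative assumptions.

```python
# Incomes (thousands of dollars) of the 15 people in the Census Bureau example.
incomes = [110, 25, 50, 50, 55, 30, 35, 30, 4, 32, 50, 30, 32, 74, 60]


def mean(values):
    """Add the observations and divide by how many there are."""
    return sum(values) / len(values)


x_bar = mean(incomes)
print(round(x_bar, 2))  # 667 / 15, about 44.47
```

The single income of 110 pulls the mean upward, which is one reason to compare it with the median.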
Measuring center: the median
The median M is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger. To find the median of the distribution:
Arrange all observations in order of size, from smallest to largest.
If the number of observations n is odd, the median M is the center observation in the ordered list. Find the location of the median by counting $(n+1)/2$ observations up from the bottom of the list.
Measuring center: the median (cont.)
If the number of observations n is even, the median M is the mean of the two center observations in the ordered list. The location of the median is again $(n+1)/2$ observations up from the bottom of the list; for even n this is not a whole number, so it falls halfway between the two center observations.
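The two cases (n odd and n even) can be sketched in Python as follows. The function name `median` and the small even-sized list are illustrative assumptions; the income data are the 15 values given earlier in these notes.

```python
def median(values):
    """Median using the (n + 1) / 2 location rule."""
    ordered = sorted(values)          # smallest to largest
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number; convert the 1-based
        # location to a 0-based list index.
        location = (n + 1) // 2
        return ordered[location - 1]
    # Even n: (n + 1) / 2 falls halfway between two center observations,
    # so average them.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


incomes = [110, 25, 50, 50, 55, 30, 35, 30, 4, 32, 50, 30, 32, 74, 60]
print(median(incomes))        # 35 (the 8th of 15 ordered values)
print(median([1, 2, 3, 4]))   # 2.5 (hypothetical even-sized example)
```

For the incomes, the median (35) is well below the mean, reflecting the right skew caused by the largest income.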
The quartiles $Q_1$ and $Q_3$
To calculate the quartiles:
Arrange the observations in increasing order and locate the median M in the ordered list of observations.
The first quartile $Q_1$ is the median of the observations whose position in the ordered list is to the left of the location of the overall median.
The third quartile $Q_3$ is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
Side-by-side Boxplots
Example. Here are the numbers of home runs that Babe Ruth hit in his 15 years with the New York Yankees, 1920 to 1934:
54 59 35 41 46 25 47 60 54 46 49 46 41 34 22
Another home run hitter is Mark McGwire, who retired after the 2001 season. Here are McGwire's home run counts for 1987 to 2001:
49 32 33 39 22 42 9 9 39 52 58 70 65 32 29
Find the five-number summaries and make side-by-side boxplots to compare these two home run hitters. What do your plots show?
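The five-number summaries (minimum, $Q_1$, M, $Q_3$, maximum) can be sketched in Python using the quartile rule stated in these notes, in which the overall median is excluded from both halves when n is odd. The function names are illustrative assumptions, and the sketch stops at the summaries; drawing the boxplots themselves is left to a plotting tool.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                      # left of the median
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]  # right of the median
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])


ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
mcgwire = [49, 32, 33, 39, 22, 42, 9, 9, 39, 52, 58, 70, 65, 32, 29]

print(five_number_summary(ruth))     # (22, 35, 46, 54, 60)
print(five_number_summary(mcgwire))  # (9, 29, 39, 52, 70)
```

Other software may use a different quartile convention and report slightly different $Q_1$ and $Q_3$ values.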
Measures of association between 2 variables
Covariance (sample covariance)
You can compute the covariance, $S_{XY}$, using the following formula:
\[ S_{XY} = \frac{\sum_{i=1}^{n} x_i y_i}{n-1} - \frac{n \bar{x} \bar{y}}{n-1} \qquad (3) \]
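The computing formula for the sample covariance can be checked against the definition based on deviations from the means. The sketch below is a minimal Python illustration; the five (x, y) pairs are made-up values chosen only for the check, and the function names are assumptions.

```python
def covariance_shortcut(x, y):
    """Sum of x_i * y_i over (n - 1), minus n * x_bar * y_bar over (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return sum_xy / (n - 1) - n * x_bar * y_bar / (n - 1)


def covariance_definition(x, y):
    """Sum of (x_i - x_bar) * (y_i - y_bar), divided by (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical data, not taken from the notes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(covariance_shortcut(x, y))    # 1.5
print(covariance_definition(x, y))  # 1.5
```

A positive value indicates that larger x values tend to go with larger y values in this small example.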
Probability: Colors of M&M's
If you draw an M&M candy at random from a bag of the candies, the candy you draw will have one of the seven colors. The probability of drawing each color depends on the proportion of each color among all candies made. Here is the distribution for milk chocolate M&M's:
Color        Purple  Yellow  Red  Orange  Brown  Green  Blue
Probability  0.2     0.2     0.2  0.1     0.1    0.1    ?
Colors of M&M's (cont.)
a) What must be the probability of drawing a blue candy?
b) What is the probability that you do not draw a brown candy?
c) What is the probability that the candy you draw is either yellow, orange, or red?
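One way to check answers to these three questions is to apply the basic probability rules directly: all probabilities sum to 1, the complement rule, and addition for disjoint outcomes. The following is a minimal Python sketch using exact fractions; the dictionary and variable names are illustrative assumptions.

```python
from fractions import Fraction

# Probabilities given in the table (blue is unknown).
known = {
    "purple": Fraction(2, 10),
    "yellow": Fraction(2, 10),
    "red": Fraction(2, 10),
    "orange": Fraction(1, 10),
    "brown": Fraction(1, 10),
    "green": Fraction(1, 10),
}

# a) All seven probabilities must add up to 1.
p_blue = 1 - sum(known.values())

# b) Complement rule: P(not brown) = 1 - P(brown).
p_not_brown = 1 - known["brown"]

# c) The colors are disjoint outcomes, so their probabilities add.
p_yellow_orange_red = known["yellow"] + known["orange"] + known["red"]

print(p_blue, p_not_brown, p_yellow_orange_red)  # 1/10 9/10 1/2
```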
Conditional probability
Problem. Josh and Al are avid tennis players and they enjoy playing matches against each other. They do, however, have one difference of opinion on the court. Al likes to have a nice long warm-up session at the start where they hit the ball back and forth and back and forth. Josh's ideal warm-up is to bend at the waist to tie his sneakers and to adjust his shorts. Al thinks that when they rush through the warm-up, he doesn't play as well.
Conditional probability (cont.)
The following table shows the outcomes of their last 20 matches, along with the type of warm-up before they started keeping score. Does the type of warm-up have an influence on the outcome of a match?
Warm-up time        Al wins   Josh wins   Total
Less than 10 min.      4          9         13
10 min. or more        5          2          7
Total                  9         11         20
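One way to approach the question is to compare the proportion of matches Al wins overall with the proportion he wins given each type of warm-up. The sketch below is a minimal Python illustration that treats the 20 recorded matches as the whole sample space; the variable names are assumptions.

```python
from fractions import Fraction

# Counts from the table: (Al wins, Josh wins) for each warm-up type.
short_warmup = {"al": 4, "josh": 9}    # less than 10 min.
long_warmup = {"al": 5, "josh": 2}     # 10 min. or more

total_short = sum(short_warmup.values())   # 13
total_long = sum(long_warmup.values())     # 7
total = total_short + total_long           # 20

# Unconditional proportion of matches won by Al.
p_al = Fraction(short_warmup["al"] + long_warmup["al"], total)

# Conditional proportions: restrict attention to one row of the table.
p_al_given_short = Fraction(short_warmup["al"], total_short)
p_al_given_long = Fraction(long_warmup["al"], total_long)

print(p_al, p_al_given_short, p_al_given_long)  # 9/20 4/13 5/7
```

Because 4/13 and 5/7 both differ from 9/20, the outcome and the warm-up type are not independent in this table, although 20 matches is a small sample.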
CHAPTER 7 SAMPLING DISTRIBUTIONS
Example
A couple plans to have three children. There are 8 possible arrangements of girls and boys. For example, GGB means the first two children are girls and the third child is a boy. All 8 arrangements are (approximately) equally likely.
a) Write down all 8 arrangements of the sexes of three children. What is the probability of any one of these arrangements?
Example (cont.)
b) Let X be the number of girls the couple has. What is the probability that X = 2?
c) Starting from your work in a), find the distribution of X. That is, what values can X take, and what are the probabilities for each value?
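The enumeration in parts a) to c) can be sketched in Python by listing the 8 arrangements and counting girls in each. This assumes, as the example states, that all arrangements are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# a) All arrangements of G (girl) and B (boy) for three children.
arrangements = ["".join(kids) for kids in product("GB", repeat=3)]
p_each = Fraction(1, len(arrangements))   # equally likely arrangements

# b), c) X = number of girls; add up the probability of each arrangement.
girl_counts = Counter(a.count("G") for a in arrangements)
distribution = {x: count * p_each for x, count in sorted(girl_counts.items())}

print(arrangements)
print(p_each)        # 1/8
for x, p in distribution.items():
    print(x, p)      # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```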
Problem
We are interested in estimating the average number of cars per household in a little town called Statstown. Let X represent the number of cars in a house picked at random. God knows that X has a Binomial distribution with n = 4 and p = 0.5. Suppose that we can only afford a sample of size 4 and that we are going to use this sample to estimate that population average.
Problem (cont.)
What we are going to do next is called a simulation. First, we will draw a lot of random samples coming from a Binomial distribution with n = 4 and p = 0.5. Then we will make a histogram for all the $\bar{x}$'s corresponding to our samples. We are going to do this to see what the histogram of $\bar{x}$ looks like. This will give us an idea of what to expect in a similar situation.
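The simulation described here can be sketched in Python with the standard library only. The number of repetitions (10,000), the seed, and all names are assumptions made for illustration, and the histogram is printed as rows of asterisks rather than drawn.

```python
import random
from collections import Counter

random.seed(214)          # arbitrary seed so the run is repeatable

N_TRIALS, P = 4, 0.5      # Binomial(n = 4, p = 0.5) population
SAMPLE_SIZE = 4           # households per sample
N_SAMPLES = 10_000        # number of simulated samples (an assumption)


def draw_cars():
    """One Binomial(4, 0.5) draw: count successes in 4 independent trials."""
    return sum(random.random() < P for _ in range(N_TRIALS))


def sample_mean():
    """Mean number of cars in one random sample of 4 households."""
    sample = [draw_cars() for _ in range(SAMPLE_SIZE)]
    return sum(sample) / SAMPLE_SIZE


sample_means = [sample_mean() for _ in range(N_SAMPLES)]
grand_mean = sum(sample_means) / N_SAMPLES

# Text histogram: one asterisk per 50 samples with that sample mean.
counts = Counter(sample_means)
for value in sorted(counts):
    print(f"{value:4.2f} {'*' * (counts[value] // 50)}")
print("average of the sample means:", round(grand_mean, 3))
```

The population mean is np = 2, so the sample means should pile up around 2, with less spread than the individual household counts.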
Central Limit Theorem
Draw a random sample of size n from any population with mean µ and finite standard deviation σ. When n is large, the sampling distribution of the sample mean $\bar{x}$ is approximately Normal:
\[ \bar{x} \text{ is approximately } N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \qquad (4) \]
Example
The number of accidents per week at a hazardous intersection varies with mean 2.2 and standard deviation 1.4. This distribution takes only whole-number values, so it is certainly not Normal.
a) Let $\bar{x}$ be the mean number of accidents per week at the intersection during a year (52 weeks). What is the approximate distribution of $\bar{x}$ according to the central limit theorem?
Example (cont.)
b) What is the approximate probability that $\bar{x}$ is less than 2?
c) What is the approximate probability that there are fewer than 100 accidents at the intersection in a year? (Hint: Restate this event in terms of $\bar{x}$.)
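The Normal approximation in this example can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are illustrative, and no continuity correction is applied, which is a simplifying assumption.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.2, 1.4, 52

# a) By the central limit theorem, the weekly mean over 52 weeks is
#    approximately Normal with mean mu and standard deviation sigma / sqrt(n).
x_bar_dist = NormalDist(mu, sigma / sqrt(n))

# b) P(x_bar < 2)
p_b = x_bar_dist.cdf(2)

# c) Fewer than 100 accidents in 52 weeks is the event x_bar < 100 / 52.
p_c = x_bar_dist.cdf(100 / n)

print(round(x_bar_dist.stdev, 3))   # about 0.194
print(round(p_b, 2))                # about 0.15
print(round(p_c, 2))                # about 0.08
```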
CHAPTER 9 HYPOTHESIS TESTS
Do you want to become a millionaire?
Let's say that one of you is invited to this popular show. As you probably know, you have to answer a series of multiple choice questions and there are four possible answers to each question. Perhaps you also have seen that if you don't know the answer to a question you could either "jump the question" or you could "ask the audience". Suppose that you run into a question for which you don't know the answer with certainty and you decide to "ask the audience". Let's say that you initially believe that the right answer is A. Then you ask the audience and only 2% of the audience shares your opinion. What would you do? Keep your initial belief or reject it?
TO BE CONTINUED...