MID-TERM EXAMINATION IN DATA MODELS AND DECISION MAKING
22:960:575
Fall 2017

Instructions:
1. Complete your answers and submit them by email to the TA, with a cc to me, by 11:00 PM today.
2. Provide a single Excel workbook with 5 sheets, one sheet per problem.
3. Solve all problems.
4. Good luck!

Problem 1. Ellen Smith has collected the following data on the amount of time (in minutes) taken by a tax preparation service to complete client interviews:

Client Number    Time to Complete Interview (minutes)
 1                8.0
 2               12.0
 3               26.0
 4               10.0
 5               23.0
 6               21.0
 7               16.0
 8               22.0
 9               18.0
10               17.0
11               36.0
12                9.0

You may assume that these data are a representative sample of all client interview times.

(a) Compute the sample mean and the sample standard deviation of this sample.
(b) Construct a confidence interval for the population mean interview time at confidence level β = 99%.

(c) Approximately how large a sample size is required to obtain a 99% confidence interval whose accuracy is +/- 8.0 minutes?

Problem 2. AXY Marketing has been gathering data on people's television viewing habits in smaller metropolitan areas. Ely Nanda, an analyst at AXY, is trying to predict the number of households that tune in to a given television station at any time during a given calendar week. She has gathered data for 25 different stations/broadcast areas and has run a simple linear regression model, where the dependent variable is the number of households (in 10,000s) that tune in to a station sometime during the week. The independent variable is the number of households (in 10,000s) with televisions in the broadcast area. The resulting regression model output appears below. Ely has looked at the output and is discouraged by the results:

(a) Based on the above regression output, provide at least one reason why this regression might not be a good model.

Ely has decided to give her factors some more thought and has come upon the idea that the number of households that tune in to a particular station during the week might also depend on whether the station's channel is VHF or UHF. For example, most VHF stations are major networks (like ABC, CBS, or NBC), which are viewed more often regardless of the size of the broadcast area. Ely has therefore included a dummy variable for whether a station broadcasts on VHF (VHF = 1, UHF = 0).
The results of her multiple linear regression are as follows:

(b) Write a complete equation for the multiple linear regression model that incorporates the estimated coefficients provided by the second regression model output. Make sure to define in words all the variables used in the equation. Do the signs of the regression coefficients make sense?

Hint: define the variables as follows:
Y  = number of households (in 10,000s) that tune in to the station
X1 = number of households (in 10,000s) with televisions in the broadcast area
X2 = 1 if the station broadcasts on VHF, 0 otherwise

Problem 3. A medical test for malaria is subject to some error. Given a person who has malaria, the probability that the test will fail to reveal the malaria is 0.06. Given a person who does not have malaria, the test will correctly identify that the person does not have malaria with probability 0.91. In a particular area, 20% of the population suffers from malaria.

(a) If someone has malaria, what is the probability that the test will identify that person as having malaria?
(b) Copy the following joint probability table to your Excel answer sheet and fill in the missing numbers.

                             Has malaria    Does not have malaria    Total
Test indicates malaria       0.188
Test indicates no malaria
Total

(c) Suppose that Richard Rice, a resident of the area, decides to take the test for malaria. If his test results indicate that he has malaria, what is the probability that he actually has malaria?

(d) Suppose three unrelated individuals who are not infected with malaria take the test. What is the probability that at least one of the three individuals will be identified by the test as having malaria?
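
One way to check the reasoning in Problem 3 is to build the joint probability table from the three numbers given in the problem statement and then apply Bayes' rule. The sketch below is only an illustration and is not part of the exam: the function and variable names are hypothetical, and part (d) assumes the three test results are independent, which is how "unrelated individuals" is read here.

```python
# Inputs taken from the problem statement.
P_MALARIA = 0.20             # share of the population with malaria
P_NEG_GIVEN_MALARIA = 0.06   # test fails to reveal malaria
P_NEG_GIVEN_HEALTHY = 0.91   # test correctly reports no malaria


def joint_table(prior, miss_rate, specificity):
    """Return joint probabilities keyed by (test result, true status)."""
    sensitivity = 1.0 - miss_rate          # P(positive | malaria)
    false_positive = 1.0 - specificity     # P(positive | no malaria)
    return {
        ("positive", "malaria"): prior * sensitivity,
        ("positive", "healthy"): (1.0 - prior) * false_positive,
        ("negative", "malaria"): prior * miss_rate,
        ("negative", "healthy"): (1.0 - prior) * specificity,
    }


table = joint_table(P_MALARIA, P_NEG_GIVEN_MALARIA, P_NEG_GIVEN_HEALTHY)

# Row total: probability that the test indicates malaria.
p_positive = table[("positive", "malaria")] + table[("positive", "healthy")]

# Bayes' rule: P(malaria | test indicates malaria).
posterior = table[("positive", "malaria")] / p_positive

# Assumption: the three results are independent, so the chance of at
# least one false positive is 1 minus the chance that all three are negative.
p_at_least_one_false_positive = 1.0 - P_NEG_GIVEN_HEALTHY ** 3

for key, value in table.items():
    print(key, round(value, 3))
print("P(test indicates malaria) =", round(p_positive, 3))
print("P(malaria | positive)     =", round(posterior, 4))
print("P(at least one of three)  =", round(p_at_least_one_false_positive, 4))
```

A quick sanity check is that the ("positive", "malaria") cell reproduces the 0.188 already printed in the table for part (b).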
Problem 4. The YUMM cereal company distributes Colored Sugar Cereal. Each box is supposed to contain 450 grams of cereal. They also sell cereal in a 2-pack, where each 2-pack contains 2 boxes of cereal. The 2-packs are supposed to have a total weight of 900 grams. YUMM can choose µ, the actual mean amount of cereal to put in each of the boxes, but their filling process has some inaccuracies. Regardless of the value of µ that they select (typically between 450 grams and 500 grams), the amount of cereal placed in the box by their filling process is Normally distributed with mean µ and standard deviation 10 grams. (The mean µ is the same for all of the boxes.) Since each box is poured by the same machine that has been calibrated to the chosen value of µ, the correlation between the weights of any two boxes is CORR = 0.63.

(a) Suppose that YUMM selects µ = 470 grams. What is the probability that any given box is under the 450 grams that the box is supposed to weigh?

(b) Suppose that YUMM selects µ = 460 grams. What is the expected total weight of the 2 boxes in a given 2-pack? What are the variance and the standard deviation of the total weight of the given 2-pack? What is the distribution of the total weight?

(c) Suppose that µ = 460 grams. What is the probability that the total weight of a given 2-pack is less than 900 grams?

(d) At what value should YUMM set µ so that the probability is 0.95 that the weight in any given single box is at least 450 grams?
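
The calculations in Problem 4 rest on two facts: the Normal cumulative distribution function, and the variance of a sum of two correlated variables, Var(X1 + X2) = Var(X1) + Var(X2) + 2 * CORR * SD(X1) * SD(X2). The sketch below illustrates both using Python's standard `statistics.NormalDist`. It is not part of the exam; the helper names are hypothetical, and treating the 2-pack total as Normal assumes the two box weights are jointly Normal, which the problem statement implies but does not state outright.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 10.0     # standard deviation of one box, in grams (from the problem)
CORR = 0.63      # correlation between the weights of any two boxes
TARGET = 450.0   # labelled weight of one box, in grams


def prob_box_under(mu, target=TARGET, sigma=SIGMA):
    """P(single box weight < target) when the fill mean is mu."""
    return NormalDist(mu, sigma).cdf(target)


def two_pack_distribution(mu, sigma=SIGMA, corr=CORR):
    """Distribution of the total weight of two correlated boxes.

    Assumption: the two weights are jointly Normal, so their sum is Normal.
    """
    mean = 2.0 * mu
    variance = 2.0 * sigma ** 2 + 2.0 * corr * sigma * sigma
    return NormalDist(mean, sqrt(variance))


def mu_for_fill_probability(p, target=TARGET, sigma=SIGMA):
    """Fill mean mu such that P(single box weight >= target) = p."""
    # P(X >= target) = p  is equivalent to  (target - mu) / sigma = z_(1 - p).
    return target - sigma * NormalDist().inv_cdf(1.0 - p)


p_under = prob_box_under(470.0)             # part (a)
pack = two_pack_distribution(460.0)         # part (b)
p_pack_under = pack.cdf(2.0 * TARGET)       # part (c)
mu_needed = mu_for_fill_probability(0.95)   # part (d)

print("P(box < 450 | mu = 470)    =", round(p_under, 4))
print("2-pack mean, variance, sd  =", pack.mean, round(pack.variance, 2), round(pack.stdev, 3))
print("P(2-pack < 900 | mu = 460) =", round(p_pack_under, 4))
print("mu for P(box >= 450) = 0.95:", round(mu_needed, 2))
```

In Excel the same quantities can be obtained with `NORM.DIST` and `NORM.INV`; the Python version is only a cross-check.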
Problem 5. Four teams of workers are available to do 4 jobs. The cost required for each team to do each job is given in the table below. We want to assign the teams to do the jobs at minimum cost.

(a) Write the problem in the form of an integer programming problem.

(b) Solve the integer programming problem with Excel.
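
An assignment problem of this kind is usually written with binary variables x[i][j], equal to 1 if team i does job j and 0 otherwise. The objective is to minimize the sum of cost[i][j] * x[i][j], subject to every team doing exactly one job and every job being done by exactly one team. The sketch below illustrates that structure by brute-force enumeration of all 4! = 24 feasible assignments rather than with a solver. It is not part of the exam, and the cost matrix in it is HYPOTHETICAL: the exam's actual cost table is not reproduced in this text, so the numbers are placeholders only.

```python
from itertools import permutations

# HYPOTHETICAL costs for illustration only; these are NOT the exam's data.
# COSTS[i][j] is the cost of team i doing job j.
COSTS = [
    [9, 2, 7, 8],
    [6, 4, 3, 7],
    [5, 8, 1, 8],
    [7, 6, 9, 4],
]


def solve_assignment(costs):
    """Enumerate every one-to-one assignment and keep the cheapest.

    Each permutation corresponds to one feasible 0/1 matrix x of the
    integer program: perm[i] is the job given to team i.
    """
    n = len(costs)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(costs[i][perm[i]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    # Convert the best permutation into the binary decision matrix.
    x = [[1 if best_perm[i] == j else 0 for j in range(n)] for i in range(n)]
    return best_cost, x


best_cost, x = solve_assignment(COSTS)

print("minimum total cost:", best_cost)
for row in x:
    print(row)
```

Enumeration is practical only for very small instances. For the exam itself, the same binary variables and row and column constraints would be entered in Excel Solver with the real cost table.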