UNIVERSITY OF MASSACHUSETTS
Department of Biostatistics and Epidemiology
BioEpi 540W - Introduction to Biostatistics
Fall 2002

Exercises
Unit 2 Descriptive Statistics
Tables and Graphs
Due: Monday September 23, 2002

READINGS
1. Text (Daniel, Biostatistics) Chapter 1 Sections 1-3, 5, and 6
2. Text (Daniel, Biostatistics) Chapter 2
3. (Optional; distributed in class) Cleveland WS and McGill R. (1985) Graphical perception and graphical methods for analyzing scientific data. Science. 229: 828-833.
4. (Optional; distributed in class) Kolata G. (1984) The proper display of data. Science. 226: 156-157.

EXERCISES:
1. For each of the following variables, indicate whether it is quantitative or qualitative and specify the measurement scale that is employed when taking measurements on each (source: Daniel, page 12, problem #6):
   a) Class standing of members of this class relative to each other
   b) Admitting diagnosis of patients admitted to a mental health clinic
   c) Weights of babies born in a hospital during a year
   d) Gender of babies born in a hospital during a year
   e) Range of motion of elbow joint of students enrolled in a university health sciences curriculum
   f) Under-arm temperature of day-old infants born in a hospital
2. Using the data below (source: Daniel, 6th edition, page 30, problem 2.3.5),

    7 10 12  4  8  7  3  8  5 12 11  3  8  1  1
   13 10  4  4  5  5  8  7  7  3  2  3  8 13  1
    7 17  3  4  5  5  3  1 17 10  4  7  7 11  8

   a. Construct a stem and leaf display.
   b. Construct a frequency table with columns for frequency, relative frequency, cumulative frequency, and cumulative relative frequency.
   c. Construct a histogram.
   d. Construct a frequency polygon.

3. Data were recorded on the age in years and height in cm of 20 high school students in a classroom.

       Females            Males
    Age   Height       Age   Height
     15     170         15     185
     15     154         16     183
     16     160         16     174
     15     159         15     183
     15     156         15     173
     15     153         15     173
     16     166         15     178
     16     163         14     167
     15     167         15     177
     15     151
     16     171
   a. Create a frequency table for age, with columns for frequency, relative frequency, cumulative frequency, and cumulative relative frequency.
   b. Create a histogram for age.
   c. For each sex, create a stem-and-leaf display for height. What does a comparison of the displays suggest about the students?
   d. For each sex, create histograms for height using the same scale.
SOLUTIONS

#1.
a. Qualitative - ordinal
b. Qualitative - nominal
c. Quantitative - ratio
d. Qualitative - nominal
e. Quantitative - ratio
f. Quantitative - interval

#2.
How to Create a STATISTIX Data Set Containing This Information

STEP 1. Enter the data.
1) Begin the software program STATISTIX.
2) Click 1x on DATA.
3) Click 1x on INSERT.
4) Click 1x on VARIABLES.
5) Click 1x inside the NEW VARIABLE NAMES dialog box. Type X <enter>.
6) Click 1x on OK.
At this point you will see a spreadsheet with space for only one record. You'll need to instruct STATISTIX that you actually have a data set of 45 records.
7) Click 1x on DATA.
8) Click 1x on INSERT.
9) Click 1x on CASES.
10) Click 1x inside the FIRST NEW CASE NUMBER dialog box. Type 2 <enter>.
11) Click 1x inside the NUMBER OF CASES TO INSERT dialog box. Type 44 <enter>.
12) To enter your data by rows: Type in the first value. Press the RIGHT arrow. Type in the second value. Press <enter>. All other values are entered the same as the second value.
    To enter your data by columns: Type in the first value. Press the DOWN arrow. Type in the second value. Press <enter>.
    All other values are entered the same as the second value. Continue until all values of one column are entered. Use the arrow keys to get to the first record in the next column and proceed as with the first column.
You should have 45 records with 1 value per record. If you have 46 records, click 1x on the record number 46. Click 1x on EDIT. Click 1x on CUT.

STEP 2. Save the data.
1) Click 1x on FILE.
2) Click 1x on SAVE AS.
3) *.sx should be highlighted in the FILE NAME dialog box. Type unit23ex2.sx <enter>.

#2a. Here is the stem and leaf diagram I constructed.

   Stem   Leaf
     0    1 1 1 1
     0    2 3 3 3 3 3 3
     0    4 4 4 4 4 5 5 5 5 5
     0    7 7 7 7 7 7 7
     0    8 8 8 8 8 8
     1    0 0 0 1 1
     1    2 2 3 3
     1
     1    7 7

Other groupings for the stem are okay.
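The hand construction in #2a can also be sketched in Python. This is only an illustrative sketch, not part of the STATISTIX workflow: the function name `stem_and_leaf` and its `lines_per_stem` parameter are made up here, and the choice of five rows per stem (leaf digits 0-1, 2-3, 4-5, 6-7, 8-9) is an assumption chosen to match the grouping used in the display above.

```python
from collections import defaultdict

# Data from Exercise 2 (45 observations).
data = [7, 10, 12, 4, 8, 7, 3, 8, 5, 12, 11, 3, 8, 1, 1,
        13, 10, 4, 4, 5, 5, 8, 7, 7, 3, 2, 3, 8, 13, 1,
        7, 17, 3, 4, 5, 5, 3, 1, 17, 10, 4, 7, 7, 11, 8]


def stem_and_leaf(values, lines_per_stem=5):
    """Return a list of (stem, leaves) rows, splitting each stem
    into `lines_per_stem` rows (hypothetical helper)."""
    width = 10 // lines_per_stem      # number of leaf digits per row
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)    # tens digit is the stem, units digit the leaf
        rows[(stem, leaf // width)].append(leaf)

    first, last = min(rows), max(rows)
    out = []
    stem, part = first
    # Walk every row from the smallest to the largest, keeping empty rows.
    while (stem, part) <= last:
        out.append((stem, rows.get((stem, part), [])))
        part += 1
        if part == lines_per_stem:
            stem, part = stem + 1, 0
    return out


display = stem_and_leaf(data)
for stem, leaves in display:
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Passing `lines_per_stem=2` or `lines_per_stem=1` gives coarser groupings, which are equally acceptable answers.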
How to Request a Stem and Leaf Plot in STATISTIX
Following assumes that you are already in STATISTIX and have already opened unit23ex2.sx
1) Click 1x on STATISTICS.
2) Click 1x on SUMMARY STATISTICS.
3) Click 1x on STEM AND LEAF PLOT.
4) Click 1x on the variable X. Then click on the RIGHT ARROW.
5) Click 1x on OK.

You should get the following:

   STEM AND LEAF PLOT OF X

   LEAF DIGIT UNIT = 1        MINIMUM   1.0000
   0 1 REPRESENTS 1.          MEDIAN    7.0000
                              MAXIMUM   17.000

          STEM  LEAVES
      4     0   1111
     11     0   2333333
     21     0   4444455555
     (7)    0   7777777
     17     0   888888
     11     1   00011
      6     1   2233
      2     1
      2     1   77

   45 CASES INCLUDED     0 MISSING CASES

To the reader: Can you guess what the numbers at the far left are telling you? Hint: Read from top to bottom, then from bottom to top!
How To Insert Text STATISTIX Results into a WORD Document
Following assumes that you are in STATISTIX and have just gotten the plot
1) Click 1x on FILE.
2) Click 1x on SAVE AS.
3) (Check to be sure results are being saved to the directory of your choosing.)
4) Type unit23ex2a.txt in the file name dialog box.
5) (Minimize the STATISTIX window so that you can work in WORD.)
Next assumes that you are in WORD
6) Position cursor to where you want the results to be placed.
7) Click 1x on INSERT.
8) Click 1x on FILE.
9) Using the LOOK IN feature, position yourself in the directory containing your results.
10) Click 1x on unit23ex2a.txt.
#2b. Here is what I constructed by hand.

   Class                   Relative    Cumulative   Cumulative
   Interval   Frequency    Frequency   Frequency    Rel. Frequency
   0-1            4          .0889         4           .0889
   2-3            7          .1556        11           .2444
   4-5           10          .2222        21           .4667
   6-7            7          .1556        28           .6222
   8-9            6          .1333        34           .7556
   10-11          5          .1111        39           .8667
   12-13          4          .0889        43           .9556
   14-15          0          0            43           .9556
   16-17          2          .0444        45          1.0000
   TOTAL         45         1.0000

Other class intervals are okay.
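As a check on the hand arithmetic in #2b, the four columns can be computed directly. The following is a minimal sketch in Python; the helper name `frequency_table` and the defaults (class width 2, starting at 0) are assumptions chosen to match the class intervals above. Relative frequency is frequency divided by n = 45, and each cumulative column is a running total of the column to its left.

```python
# Data from Exercise 2 (45 observations).
data = [7, 10, 12, 4, 8, 7, 3, 8, 5, 12, 11, 3, 8, 1, 1,
        13, 10, 4, 4, 5, 5, 8, 7, 7, 3, 2, 3, 8, 13, 1,
        7, 17, 3, 4, 5, 5, 3, 1, 17, 10, 4, 7, 7, 11, 8]


def frequency_table(values, width=2, start=0):
    """Return rows of (interval, freq, rel freq, cum freq, cum rel freq).
    Hypothetical helper; intervals are start..start+width-1, and so on."""
    n = len(values)
    top = max(values)
    rows = []
    cumulative = 0
    lower = start
    while lower <= top:
        upper = lower + width - 1
        freq = sum(lower <= v <= upper for v in values)
        cumulative += freq
        rows.append((f"{lower}-{upper}", freq, freq / n,
                     cumulative, cumulative / n))
        lower += width
    return rows


table = frequency_table(data)
for label, freq, rel, cum, cum_rel in table:
    print(f"{label:>6} {freq:>4} {rel:7.4f} {cum:>4} {cum_rel:7.4f}")
```

Changing `width` or `start` produces the other acceptable class intervals mentioned above.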
How to Request a Frequency Distribution in STATISTIX
Following assumes that you are already in STATISTIX and have already opened unit23ex2.sx AND that you have just finished your stem and leaf diagram
1) Click 1x on WINDOW.
2) Click 1x on unit23ex2.sx. At this point, STATISTIX should have returned you to your spreadsheet of data.
3) Click 1x on STATISTICS.
4) Click 1x on SUMMARY STATISTICS.
5) Click 1x on FREQUENCY DISTRIBUTION.
6) Click 1x on the variable X. Then click on the RIGHT ARROW.
7) Click 1x on OK.

You should get the following:

   FREQUENCY DISTRIBUTION OF X

                                  CUMULATIVE
   VALUE    FREQ   PERCENT     FREQ   PERCENT
       1       4       8.9        4       8.9
       2       1       2.2        5      11.1
       3       6      13.3       11      24.4
       4       5      11.1       16      35.6
       5       5      11.1       21      46.7
       7       7      15.6       28      62.2
       8       6      13.3       34      75.6
      10       3       6.7       37      82.2
      11       2       4.4       39      86.7
      12       2       4.4       41      91.1
      13       2       4.4       43      95.6
      17       2       4.4       45     100.0
   TOTAL      45     100.0
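For readers without STATISTIX, the same value-by-value distribution can be reproduced with a short Python sketch. This is not how STATISTIX computes it internally (that is not documented here); it simply counts each distinct value with `collections.Counter` and rounds the percentages to one decimal place, which is an assumption made to mirror the printed output.

```python
from collections import Counter

# Data from Exercise 2 (45 observations).
data = [7, 10, 12, 4, 8, 7, 3, 8, 5, 12, 11, 3, 8, 1, 1,
        13, 10, 4, 4, 5, 5, 8, 7, 7, 3, 2, 3, 8, 13, 1,
        7, 17, 3, 4, 5, 5, 3, 1, 17, 10, 4, 7, 7, 11, 8]

counts = Counter(data)   # frequency of each distinct value
n = len(data)

rows = []
cumulative = 0
for value in sorted(counts):
    cumulative += counts[value]
    rows.append((value,
                 counts[value],
                 round(100 * counts[value] / n, 1),   # percent
                 cumulative,
                 round(100 * cumulative / n, 1)))     # cumulative percent

print("VALUE  FREQ  PERCENT  CUM FREQ  CUM PERCENT")
for value, freq, pct, cum, cum_pct in rows:
    print(f"{value:>5} {freq:>5} {pct:>8} {cum:>9} {cum_pct:>12}")
```

Values that never occur (6, 9, 14, 15, 16) do not appear as rows, just as in the STATISTIX listing.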
#2c.
How to Request a Histogram in STATISTIX
Following assumes that you are already in STATISTIX and have already opened unit23ex2.sx AND that you have just finished some other description or analysis
1) Click 1x on WINDOW.
2) Click 1x on unit23ex2.sx.
3) Click 1x on STATISTICS.
4) Click 1x on SUMMARY STATISTICS.
5) Click 1x on HISTOGRAM.
6) Click 1x on the variable X. Then click on the RIGHT ARROW.
7) Click 1x on OK.

You should get the following (note on titles below):

[Figure: "Histogram for #2c". Vertical axis: Frequency, 0 to 10. Horizontal axis: X, 0 to 20.]

To get the titles as shown here:
8) Click 1x on RESULTS.
9) In the TOP TITLE dialog box, type Histogram for #2c
How To Insert GRAPHICAL STATISTIX Results into a WORD Document
Following assumes that you are in STATISTIX and have just gotten the plot
1) Click 1x on FILE.
2) Click 1x on SAVE AS.
3) (Check to be sure results are being saved to the directory of your choosing.)
4) Type unit23ex2c.emf in the file name dialog box.
5) (Minimize the STATISTIX window so that you can work in WORD.)
Next assumes that you are in WORD
6) Position cursor to where you want the results to be placed.
7) Click 1x on INSERT.
8) Click 1x on PICTURE.
9) Click 1x on FROM FILE.
10) Using the LOOK IN feature, position yourself in the directory containing your results.
11) Click 1x on unit23ex2c.emf.
#2d.
How to Request a Cumulative Distribution in STATISTIX
Following assumes that you are already in STATISTIX and have already opened unit23ex2.sx AND that you have just finished some other description or analysis
1) Click 1x on WINDOW.
2) Click 1x on unit23ex2.sx.
3) Click 1x on STATISTICS.
4) Click 1x on SUMMARY STATISTICS.
5) Click 1x on HISTOGRAM.
6) Click 1x on the variable X. Then click on the RIGHT ARROW.
7) Click 1x on the CUMULATIVE DISTRIBUTION option.
8) Click 1x on OK.

You should get the following (except that I changed the title a bit):

[Figure: "Cumulative Distribution". Vertical axis: Percent, 0 to 100. Horizontal axis: X, 0 to 20.]
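The heights plotted in a cumulative distribution are the percent of observations less than or equal to each value of X. A minimal Python sketch of that calculation follows; the function name `cumulative_percent` and the evaluation grid 0, 2, ..., 20 (matching the tick marks on the horizontal axis) are assumptions for illustration, and the exact bin edges STATISTIX uses for its plot may differ.

```python
from bisect import bisect_right

# Data from Exercise 2 (45 observations).
data = [7, 10, 12, 4, 8, 7, 3, 8, 5, 12, 11, 3, 8, 1, 1,
        13, 10, 4, 4, 5, 5, 8, 7, 7, 3, 2, 3, 8, 13, 1,
        7, 17, 3, 4, 5, 5, 3, 1, 17, 10, 4, 7, 7, 11, 8]

sorted_data = sorted(data)


def cumulative_percent(x):
    """Percent of observations less than or equal to x (hypothetical helper)."""
    # bisect_right returns how many sorted values are <= x.
    return 100 * bisect_right(sorted_data, x) / len(sorted_data)


points = [(x, round(cumulative_percent(x), 1)) for x in range(0, 21, 2)]
for x, pct in points:
    print(f"X <= {x:>2}: {pct:5.1f}%")
```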
#3a.

                       Relative    Cumulative   Cumulative
   Age    Frequency    Frequency   Frequency    Rel. Frequency
   14         1          .05           1           .05
   15        13          .65          14           .70
   16         6          .30          20          1.00
   TOTAL     20         1.00

#3b.

[Figure: "Histogram of Age". Vertical axis: Frequency, 0 to 20. Horizontal axis: Age, years (14, 15, 16).]
#3c. Here is what I get by hand (I like it better than the STATISTIX output below).

   Females   Stem   Males
     1 3 4    15
       6 9    15
       0 3    16
       6 7    16    7
       0 1    17    4 3 3
              17    8 7
              18    3 3
              18    5

How To Supply Labels for Values of Discrete Variables (so that output is easier to read!)
Following assumes that you are in STATISTIX and you wish to assign labels to the values 0 and 1 for the variable SEX so that 0=FEMALE and 1=MALE
1) Click 1x on WINDOW.
2) Click 1x on unit23ex3 spreadsheet.
3) Click 1x on DATA.
4) Click 1x on LABELS.
5) Click 1x on VALUE LABELS.
6) Click 1x on the variable SEX and RIGHT ARROW it to the SOURCE VARIABLE box.
7) In the VALUE box, type 0. Below this, in the LABEL box, type 0=Female.
8) In the VALUE box, type 1. Below this, in the LABEL box, type 1=Male.
9) Click 1x on SAVE.
10) Click 1x on CLOSE.
This is what I get in STATISTIX:

   STEM AND LEAF PLOT OF HEIGHT FOR SEX = 0=female

   LEAF DIGIT UNIT = 1        MINIMUM   151.00
   15 1 REPRESENTS 151.       MEDIAN    160.00
                              MAXIMUM   171.00

          STEM  LEAVES
      3    15   134
      5    15   69
     (2)   16   03
      4    16   67
      2    17   01

   11 CASES INCLUDED     0 MISSING CASES

   STEM AND LEAF PLOT OF HEIGHT FOR SEX = 1=male

   LEAF DIGIT UNIT = 1        MINIMUM   167.00
   16 7 REPRESENTS 167.       MEDIAN    177.00
                              MAXIMUM   185.00

          STEM  LEAVES
      1    16   7
      4    17   334
     (2)   17   78
      3    18   33
      1    18   5

   9 CASES INCLUDED     0 MISSING CASES

Males tend to be taller than females.

#3d.

   Class            FEMALES                MALES
   Interval     Freq.   Rel. Freq.     Freq.   Rel. Freq.
   150-159        5       .45            0       0
   160-169        4       .36            1       .11
   170-179        2       .18            5       .56
   180-189        0       0              3       .33
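The #3d table can be checked with a few lines of Python. In this sketch the two height lists are read off the STATISTIX stem and leaf plots above (11 female cases, 9 male cases); the helper name `class_table` and its default class limits (150 to 189 in steps of 10) are assumptions chosen to match the class intervals in the table. Note that each relative frequency is divided by that sex's own count, not by 20.

```python
# Heights in cm, read from the stem and leaf plots for each sex.
female = [151, 153, 154, 156, 159, 160, 163, 166, 167, 170, 171]
male = [167, 173, 173, 174, 177, 178, 183, 183, 185]


def class_table(heights, lower=150, upper=190, width=10):
    """Return rows of (interval, freq, rel freq) using the same
    class intervals for every group (hypothetical helper)."""
    n = len(heights)
    rows = []
    for lo in range(lower, upper, width):
        freq = sum(lo <= h < lo + width for h in heights)
        rows.append((f"{lo}-{lo + width - 1}", freq, round(freq / n, 2)))
    return rows


female_rows = class_table(female)
male_rows = class_table(male)

print("Interval  F freq  F rel  M freq  M rel")
for (label, f_freq, f_rel), (_, m_freq, m_rel) in zip(female_rows, male_rows):
    print(f"{label:>8} {f_freq:>7} {f_rel:>6} {m_freq:>7} {m_rel:>6}")
```

Using the same class limits for both groups is what makes the two histograms requested in #3d directly comparable.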
How To Select A Subset of the Data for Analysis
We wish to construct separate histograms, first for females, then for males. STATISTIX is a bit awkward in this.

To Select Females, STATISTIX requires that you omit Males
1) Click 1x on WINDOW.
2) Click 1x on unit23ex3 spreadsheet.
3) Click 1x on DATA.
4) Click 1x on OMIT/SELECT/RESTORE CASES.
5) Type Omit sex=1
6) Click 1x on GO.
7) Click 1x on CLOSE.
Construct your histogram using the instructions above. Take care to title it clearly.

Before you can select males, you must restore the entire data set
1) Click 1x on WINDOW.
2) Click 1x on unit23ex3 spreadsheet.
3) Click 1x on DATA.
4) Click 1x on OMIT/SELECT/RESTORE CASES.
5) Type restore

To Select Males, STATISTIX requires that you omit Females
1) Type Omit sex=0
2) Click 1x on GO.
3) Click 1x on CLOSE.
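The omit-then-restore idea can be illustrated outside STATISTIX with a short Python sketch. Everything here is illustrative: the record layout and the helper name `omit` are assumptions, the coding 0=female and 1=male follows the value labels used in this handout, and the heights are those shown in the STATISTIX stem and leaf plots for each sex.

```python
# Heights in cm by sex (0 = female, 1 = male).
female = [151, 153, 154, 156, 159, 160, 163, 166, 167, 170, 171]
male = [167, 173, 173, 174, 177, 178, 183, 183, 185]

# One record per student, analogous to one spreadsheet row.
records = ([{"sex": 0, "height": h} for h in female] +
           [{"sex": 1, "height": h} for h in male])


def omit(rows, **conditions):
    """Return the rows that do NOT match every condition (hypothetical
    helper, loosely analogous to typing 'Omit sex=1')."""
    return [r for r in rows
            if not all(r[key] == value for key, value in conditions.items())]


females_only = omit(records, sex=1)   # omit males to select females
males_only = omit(records, sex=0)     # omit females to select males

print(len(records), len(females_only), len(males_only))
```

Because `omit` returns a new list and leaves `records` untouched, no separate restore step is needed in this sketch.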
This is what I get in STATISTIX:

[Figure: "Histogram of Heights - FEMALES". Vertical axis: Frequency, 0 to 5. Horizontal axis: HEIGHT, 150 to 190.]

[Figure: "Histogram of Heights - MALES". Vertical axis: Frequency, 0 to 5. Horizontal axis: HEIGHT, 150 to 190.]