Frequencies. Chapter 2. Descriptive statistics and charts


An analyst usually does not concentrate on each individual data value but would like a whole picture of how the variables are distributed. In this chapter, we introduce some tools to tabulate and summarize the data. In addition, graphs and charts present statistical data in visual form.

2.1. Tabulation of data

Example 1: Frequency table

A frequency table shows the frequency of all the possible responses. The distribution of the variables in the data file survey.sav can be obtained by tabulating the data into a frequency table.

Analyze > Descriptive Statistics > Frequencies
Variables: age

Frequencies

Statistics: Age
N    Valid      40
     Missing     0

Age (frequency table for the variable age)

                Frequency   Percent   Valid Percent   Cumulative Percent
Valid  12-19         3         7.5         7.5                7.5
       20-29        18        45.0        45.0               52.5
       30-39        16        40.0        40.0               92.5
       40-49         3         7.5         7.5              100.0
       Total        40       100.0       100.0

Marjorie Chiu, 2009
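As a cross-check outside SPSS, the percent and cumulative percent columns of the Age table can be recomputed from the four counts. The following is a minimal Python sketch; the function name frequency_table is an illustrative choice, not part of SPSS.

```python
# Counts taken from the Age frequency table in the SPSS output.
counts = {"12-19": 3, "20-29": 18, "30-39": 16, "40-49": 3}


def frequency_table(counts):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for category, freq in counts.items():
        running += freq  # cumulative frequency up to and including this row
        rows.append((category, freq, 100 * freq / total, 100 * running / total))
    return rows


table = frequency_table(counts)
for category, freq, pct, cum in table:
    print(f"{category:<8}{freq:>6}{pct:>8.1f}{cum:>8.1f}")
```

Because there are no missing cases, the Percent and Valid Percent columns coincide, so a single percent column is enough here.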

Department of Applied Mathematics

Counting responses for combinations of variables

Cross tabulation forms a table that contains counts of the number of times various combinations of values of two categorical variables occur.

Analyze > Descriptive Statistics > Crosstabs
Row: sex
Column: age

Percentages within row, within column and of the total can also be obtained by changing the options under "Cells".

Crosstabs

Case Processing Summary

                        Cases
              Valid           Missing          Total
              N    Percent    N    Percent     N    Percent
sex * Age     40   100.0%     0    .0%         40   100.0%

sex * Age Crosstabulation (row variable: sex; column variable: Age)

                                      Age
                          12-19    20-29    30-39    40-49    Total
Male     Count                       12        3                15
         % within sex              80.0%    20.0%             100.0%
         % within Age              66.7%    18.8%              37.5%
         % of Total                30.0%     7.5%              37.5%
Female   Count               3        6       13        3       25
         % within sex     12.0%    24.0%    52.0%    12.0%    100.0%
         % within Age    100.0%    33.3%    81.3%   100.0%     62.5%
         % of Total        7.5%    15.0%    32.5%     7.5%     62.5%
Total    Count               3       18       16        3       40
         % within sex      7.5%    45.0%    40.0%     7.5%    100.0%
         % within Age    100.0%   100.0%   100.0%   100.0%    100.0%
         % of Total        7.5%    45.0%    40.0%     7.5%    100.0%

The Male row has no cases in the 12-19 and 40-49 columns.
Row percentages (% within sex) sum to 100% across each row.
Column percentages (% within Age) sum to 100% down each column.
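The three kinds of cell percentage in the crosstabulation differ only in the total used as the denominator. A minimal Python sketch follows, using the counts from the table above; the two empty Male cells are entered as 0, which is implied by the column totals. The function name crosstab_percentages is illustrative only.

```python
ages = ["12-19", "20-29", "30-39", "40-49"]
# Cell counts from the sex * Age crosstabulation (empty Male cells taken as 0).
counts = {
    "Male": [0, 12, 3, 0],
    "Female": [3, 6, 13, 3],
}


def crosstab_percentages(counts):
    """Return, for every cell, its count and row, column and total percentages."""
    grand_total = sum(sum(row) for row in counts.values())
    col_totals = [sum(col) for col in zip(*counts.values())]
    result = {}
    for label, row in counts.items():
        row_total = sum(row)
        result[label] = [
            {
                "count": n,
                "row_pct": 100 * n / row_total,      # % within sex
                "col_pct": 100 * n / col_total,      # % within Age
                "total_pct": 100 * n / grand_total,  # % of Total
            }
            for n, col_total in zip(row, col_totals)
        ]
    return result


cells = crosstab_percentages(counts)
male_20_29 = cells["Male"][ages.index("20-29")]
print(male_20_29)
```

For example, the Female 30-39 cell has a column percentage of exactly 81.25, which the SPSS table displays as 81.3%.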

2.2. Summarize the data by means of central tendency and dispersion

When we work with numerical data, in most data sets the observed values tend to group themselves about some interior value; some central value seems to be characteristic of the data. This phenomenon is referred to as central tendency. The arithmetic mean, the median and the mode are the three commonly used measures of central tendency.

In addition, it is necessary to have some measure of how the data are scattered. That is, we want to know the dispersion, or variability, in a set of data. Range, deciles, percentiles, fractiles, quartiles, mean absolute deviation, variance, standard deviation and coefficient of variation are used to describe the dispersion of the data.

Formulas:

Mean: \bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i}

Variance: s^2 = \frac{\sum_i (x_i - \bar{x})^2 f_i}{\sum_i f_i - 1}

Standard deviation: s = \sqrt{s^2}

where f_i is the frequency of the i-th item, x_i is the value of the i-th item or class mark, and \bar{x} is the sample mean.

Categorical data (survey.sav)

Analyze > Descriptive Statistics > Frequencies
Variable: age
Statistics: median, mode

Remark: The frequency table is also displayed.

Frequencies

Statistics: Age
N    Valid      40
     Missing     0
Median        2.00
Mode             2

Quantitative data

Example 2: (rings)

Lot-for-lot ordering is the simplest deterministic inventory model. In this model, items are purchased from a supplier (say, a wholesaler) in the exact amounts required for each time period. It is well suited to inventory items of high value or with a discontinuous demand. One hundred consecutive weekly purchases of diamond rings are made by a retail jeweler from a wholesaler to replenish the inventory sold to customers during the preceding week. The numbers of rings are shown below.

44 35 34 25 41 66 50 38 45 41
40 43 49 31 44 52 55 45 51 63
33 68 27 30 58 62 45 52 12 72
49 38 66 64 60 41 30 65 46 35
70 54 43 64 24 25 52 42 53 22
23 35 51 43 11 58 75 50 67 51
32 57 24 43 35 37 42 58 42 59
25 37 40 28 60 31 64 72 48 16
26 57 33 18 46 69 74 39 26 55
78 40 50 46 47 36 29 47 63 55

The data were saved in SPSS format with the file name rings.sav. Open this file and summarize the number of rings sold.

Analyze > Descriptive Statistics > Frequencies
Variable: rings

Statistics of the selected variable such as the mean, median, mode, standard deviation, variance, range and quartiles can be evaluated and are used to describe the characteristics of the data.

Remark: The Charts option can be changed to display a histogram.

Frequencies

Statistics: RINGS (summary of descriptive statistics for the variable)

N    Valid                       100
     Missing                       0
Mean                           45.42
Std. Error of Mean              1.52
Median                         45.00
Mode                           35 (a)
Std. Deviation                 15.20
Variance                      231.18
Skewness                        .000
Std. Error of Skewness          .241
Kurtosis                       -.579
Std. Error of Kurtosis          .478
Range                             67
Minimum                           11
Maximum                           78
Sum                             4542
Percentiles    10              25.00
               20              31.20
               25              35.00
               30              37.00
               40              41.40
               50              45.00
               60              49.60
               70              53.70
               75              57.00
               80              59.80
               90              66.00

a. Multiple modes exist. The smallest value is shown.
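The main figures in this summary can be reproduced from the 100 weekly purchases listed in Example 2. The sketch below uses only Python's standard statistics module (Python 3.8 or later); it is a cross-check, not a description of how SPSS computes its output. The quartiles use the module's default "exclusive" method, which for this data set gives the same values as the SPSS percentiles.

```python
import statistics

# The 100 weekly ring purchases from Example 2, in the order listed.
rings = [
    44, 35, 34, 25, 41, 66, 50, 38, 45, 41,
    40, 43, 49, 31, 44, 52, 55, 45, 51, 63,
    33, 68, 27, 30, 58, 62, 45, 52, 12, 72,
    49, 38, 66, 64, 60, 41, 30, 65, 46, 35,
    70, 54, 43, 64, 24, 25, 52, 42, 53, 22,
    23, 35, 51, 43, 11, 58, 75, 50, 67, 51,
    32, 57, 24, 43, 35, 37, 42, 58, 42, 59,
    25, 37, 40, 28, 60, 31, 64, 72, 48, 16,
    26, 57, 33, 18, 46, 69, 74, 39, 26, 55,
    78, 40, 50, 46, 47, 36, 29, 47, 63, 55,
]

n = len(rings)
mean = statistics.mean(rings)
median = statistics.median(rings)
modes = statistics.multimode(rings)      # every value tied for highest frequency
variance = statistics.variance(rings)    # sample variance, divisor n - 1
std_dev = statistics.stdev(rings)
std_error = std_dev / n ** 0.5           # standard error of the mean
quartiles = statistics.quantiles(rings, n=4)
data_range = max(rings) - min(rings)

print(f"N={n} mean={mean:.2f} median={median:.2f} modes={sorted(modes)}")
print(f"variance={variance:.2f} sd={std_dev:.2f} se={std_error:.2f}")
print(f"range={data_range} quartiles={quartiles}")
```

The two modes returned are 35 and 43, each occurring four times, which agrees with the footnote that multiple modes exist and the smallest is shown.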

RINGS

Valid    Frequency   Percent   Valid Percent   Cumulative Percent
11           1         1.0         1.0                1.0
12           1         1.0         1.0                2.0
16           1         1.0         1.0                3.0
18           1         1.0         1.0                4.0
22           1         1.0         1.0                5.0
23           1         1.0         1.0                6.0
24           2         2.0         2.0                8.0
25           3         3.0         3.0               11.0
26           2         2.0         2.0               13.0
27           1         1.0         1.0               14.0
28           1         1.0         1.0               15.0
29           1         1.0         1.0               16.0
30           2         2.0         2.0               18.0
31           2         2.0         2.0               20.0
32           1         1.0         1.0               21.0
33           2         2.0         2.0               23.0
34           1         1.0         1.0               24.0
35           4         4.0         4.0               28.0
36           1         1.0         1.0               29.0
37           2         2.0         2.0               31.0
38           2         2.0         2.0               33.0
39           1         1.0         1.0               34.0
40           3         3.0         3.0               37.0
41           3         3.0         3.0               40.0
42           3         3.0         3.0               43.0
43           4         4.0         4.0               47.0
44           2         2.0         2.0               49.0
45           3         3.0         3.0               52.0
46           3         3.0         3.0               55.0
47           2         2.0         2.0               57.0
48           1         1.0         1.0               58.0
49           2         2.0         2.0               60.0
50           3         3.0         3.0               63.0
51           3         3.0         3.0               66.0
52           3         3.0         3.0               69.0
53           1         1.0         1.0               70.0
54           1         1.0         1.0               71.0
55           3         3.0         3.0               74.0
57           2         2.0         2.0               76.0
58           3         3.0         3.0               79.0
59           1         1.0         1.0               80.0
60           2         2.0         2.0               82.0
62           1         1.0         1.0               83.0
63           2         2.0         2.0               85.0
64           3         3.0         3.0               88.0
65           1         1.0         1.0               89.0
66           2         2.0         2.0               91.0
67           1         1.0         1.0               92.0
68           1         1.0         1.0               93.0
69           1         1.0         1.0               94.0
70           1         1.0         1.0               95.0
72           2         2.0         2.0               97.0
74           1         1.0         1.0               98.0
75           1         1.0         1.0               99.0
78           1         1.0         1.0              100.0
Total      100       100.0       100.0
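The frequency-weighted formulas for the mean and variance given in section 2.2 can be applied directly to this frequency table. The Python sketch below does so with the (value, frequency) pairs from the RINGS table; the function name grouped_mean_variance is an illustrative choice.

```python
import math

# value -> frequency, copied from the RINGS frequency table.
freq = {
    11: 1, 12: 1, 16: 1, 18: 1, 22: 1, 23: 1, 24: 2, 25: 3, 26: 2, 27: 1,
    28: 1, 29: 1, 30: 2, 31: 2, 32: 1, 33: 2, 34: 1, 35: 4, 36: 1, 37: 2,
    38: 2, 39: 1, 40: 3, 41: 3, 42: 3, 43: 4, 44: 2, 45: 3, 46: 3, 47: 2,
    48: 1, 49: 2, 50: 3, 51: 3, 52: 3, 53: 1, 54: 1, 55: 3, 57: 2, 58: 3,
    59: 1, 60: 2, 62: 1, 63: 2, 64: 3, 65: 1, 66: 2, 67: 1, 68: 1, 69: 1,
    70: 1, 72: 2, 74: 1, 75: 1, 78: 1,
}


def grouped_mean_variance(freq):
    """Apply mean = sum(f*x)/sum(f) and s^2 = sum(f*(x-mean)^2)/(sum(f)-1)."""
    n = sum(freq.values())
    mean = sum(f * x for x, f in freq.items()) / n
    variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / (n - 1)
    return n, mean, variance


n, mean, variance = grouped_mean_variance(freq)
std_dev = math.sqrt(variance)
print(f"n={n} mean={mean:.2f} variance={variance:.2f} sd={std_dev:.2f}")
```

Since every distinct value is its own class here, the grouped formulas give the same mean (45.42), variance (231.18) and standard deviation (15.20) as the SPSS summary of the raw data. With wider classes and class marks the results would only be approximations.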

2.3. Histogram (rings.sav)

A histogram is composed of a number of bars and is used to show the distribution of a variable; the skewness of the distribution can be observed from it. Each bar represents a range of values, and the area of the bar is directly proportional to the frequency of that range.

Graphs > Legacy Dialogs > Histogram
Variable: rings

[Graph: histogram of rings]

The distribution of the variable rings is quite symmetric.

2.4. Stem and leaf display (rings.sav)

A stem and leaf display shows the distribution of a variable like a histogram. In addition, it depicts the actual values of the data points at the same time.

Analyze > Descriptive Statistics > Explore
Dependent List: rings

Remark: Descriptive statistics and a histogram can also be obtained.

RINGS Stem-and-Leaf Plot

 Frequency    Stem &  Leaf

     2.00        1 .  12
     2.00        1 .  68
     4.00        2 .  2344
     8.00        2 .  55566789
     8.00        3 .  00112334
    10.00        3 .  5555677889
    15.00        4 .  000111222333344
    11.00        4 .  55566677899
    11.00        5 .  00011122234
     9.00        5 .  555778889
     8.00        6 .  00233444
     6.00        6 .  566789
     4.00        7 .  0224
     2.00        7 .  58

 Stem width:   10
 Each leaf:    1 case(s)

The stem, scaled by the stem width, combined with each individual digit in the leaf gives one data value. For example, the first row of the stem-and-leaf display consists of two data values, 11 and 12.

2.5. Boxplot (rings.sav)

A boxplot helps us to visualize the distribution of a variable. It simultaneously displays the median, the interquartile range, and the smallest and largest values.
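The quantities a boxplot displays can be computed directly from the data. The following Python sketch derives the quartiles, the interquartile range and the whisker limits for the rings data, assuming the common rule that whiskers reach the most extreme observations within 1.5 box lengths of the box. The function name box_summary is hypothetical, and SPSS may define the box edges slightly differently (for this data set the values agree with the 25th and 75th percentiles reported earlier).

```python
import statistics

# The 100 weekly ring purchases from Example 2.
rings = [
    44, 35, 34, 25, 41, 66, 50, 38, 45, 41,
    40, 43, 49, 31, 44, 52, 55, 45, 51, 63,
    33, 68, 27, 30, 58, 62, 45, 52, 12, 72,
    49, 38, 66, 64, 60, 41, 30, 65, 46, 35,
    70, 54, 43, 64, 24, 25, 52, 42, 53, 22,
    23, 35, 51, 43, 11, 58, 75, 50, 67, 51,
    32, 57, 24, 43, 35, 37, 42, 58, 42, 59,
    25, 37, 40, 28, 60, 31, 64, 72, 48, 16,
    26, 57, 33, 18, 46, 69, 74, 39, 26, 55,
    78, 40, 50, 46, 47, 36, 29, 47, 63, 55,
]


def box_summary(values):
    """Return the box edges, median, whisker ends and any outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1                      # the "box length"
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


summary = box_summary(rings)
print(summary)
```

For the rings data the fences fall at 2 and 90, so no observation is flagged as an outlier and the whiskers end at the minimum (11) and the maximum (78).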

[Figure: boxplot of the variable RINGS (N = 100), vertical axis from 0 to 100. The box extends from the 25th percentile to the 75th percentile, with the median marked inside it. The whiskers extend to the largest and smallest observed values within 1.5 box lengths.]

2.6. Bar chart (survey.sav)

A bar chart shows the number of cases in each category; the length (height) of each bar is directly proportional to the frequency.

Analyze > Descriptive Statistics > Frequencies
Variable: age
Specify the chart type as bar chart.

[Figure: simple bar chart of Age. Vertical axis: Frequency (0 to 20); category axis: Age, with classes 12-19, 20-29, 30-39 and 40-49. The simple bar chart shows the frequency of the different age groups.]

OR
Graphs > Legacy Dialogs > Bar
Summaries for groups of cases
Define simple bar chart
Category axis: age
Bars represent: N of cases

OR
Graphs > Legacy Dialogs > Interactive > Bar
Drag the variable "age" to the x-axis.
The options can be adjusted if necessary, for example, to include empty categories.

[Figure: Interactive Graph (simple bar chart)]

Note: The interactive bar chart also shows the classes in which there are no occurrences.

A multiple bar chart is particularly useful for making quick comparisons between different sets of data.

Graphs > Legacy Dialogs > Bar
Summaries for groups of cases
Define clustered bar chart
Bars represent: N of cases
Category axis: age
Define clusters by: sex

[Figure: Graph (multiple bar chart), composed of 2 clusters, with a legend and a category axis]

A component bar chart shows how different components make up the total, using distinctive shadings or colours.

Graphs > Legacy Dialogs > Bar
Summaries for groups of cases
Define stacked bar chart
Category axis: age
Define stacks by: sex

[Figure: Graph (component bar chart), composed of 2 clusters, with a category axis]

2.7. Pie chart

Pie charts are widely used to show the component parts of a total. They are popular because of their simplicity. In constructing a pie chart, the angle of each slice at the center must be proportional to its percentage of the total.

Analyze > Descriptive Statistics > Frequencies
Variable: age
Charts: pie chart

OR
Graphs > Legacy Dialogs > Pie
Data in chart are: Summaries for groups of cases
Slices represent: N of cases
Define slices by: age

OR
Graphs > Legacy Dialogs > Interactive > Pie > Simple
Slice by: age (in style)
Options: including empty categories
Pies: Slice Labels (count, percent)
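The proportionality rule for pie charts stated in section 2.7 amounts to giving each category the share count / total of the full 360 degrees. A minimal Python sketch using the age counts from section 2.1 follows; the function name slice_angles is illustrative.

```python
# Counts from the Age frequency table in section 2.1.
counts = {"12-19": 3, "20-29": 18, "30-39": 16, "40-49": 3}


def slice_angles(counts):
    """Return the central angle, in degrees, of each pie slice."""
    total = sum(counts.values())
    return {category: 360 * n / total for category, n in counts.items()}


angles = slice_angles(counts)
for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The 20-29 group, with 45% of the cases, takes 162 degrees, and the four angles add up to 360.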

[Figure: Interactive Graph (pie chart)]

2.8. Scatter plot

A two-dimensional scatter plot gives a general picture of how two quantitative variables relate to each other.

Example 3: (car)

An equation is to be developed from which we can predict the gasoline mileage of an automobile based on its weight and the temperature at the time of operation. The ASCII data are available in the file car.dat. The three columns represent miles per gallon (miles; columns 1-4; 1 d.p.), weight in tons (weight; columns 6-9; 2 d.p.) and temperature in Fahrenheit (temperature; columns 11-12). Read in the ASCII data first and then save the file in SPSS format. Give a scatter plot of miles against temperature and then of miles against weight.

Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter
Y axis: miles
X axis: temperature

Then put miles on the y-axis and weight on the x-axis to produce another scatter plot.
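The fixed-column layout described for car.dat can also be read outside SPSS by slicing each line at the stated column positions. The Python sketch below is only an illustration: the function name parse_car_line is hypothetical, the two sample lines are invented for the demonstration and are not taken from car.dat, and the sketch assumes the decimal points are written explicitly in the file.

```python
def parse_car_line(line):
    """Split one fixed-width record into (miles, weight, temperature).

    Columns are 1-based in the description, so columns 1-4 become the
    slice [0:4], columns 6-9 become [5:9] and columns 11-12 become [10:12].
    """
    miles = float(line[0:4])          # miles per gallon, 1 decimal place
    weight = float(line[5:9])         # weight in tons, 2 decimal places
    temperature = int(line[10:12])    # temperature in Fahrenheit
    return miles, weight, temperature


# Hypothetical sample lines in the stated layout (NOT real rows of car.dat).
sample_lines = [
    "17.9 1.35 90",
    "16.5 1.90 30",
]

records = [parse_car_line(line) for line in sample_lines]
for miles, weight, temperature in records:
    print(miles, weight, temperature)
```

If the file instead stores the digits without decimal points, the parsed integers would need to be divided by 10 and 100 respectively.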

[Figure: scatter plot of miles per gallon (15.0 to 19.0) against temperature in Fahrenheit (20 to 100)]

[Figure: scatter plot of miles per gallon (15.0 to 19.0) against weight in tons (1.0 to 2.2)]

The gasoline mileage of an automobile does not seem to be related to the temperature, but mileage and weight appear to have a negative association.

2.9. Time series plot

A time series plot is used to show how data vary as time advances.

Example 4: (miles.sav)

Plot the time series data of the aircraft miles flown by the ABC airlines from 1986 to 1990.

Analyze > Forecasting > Sequence Charts
Variable: miles
Time axis labels: date

Sequence Plot

Model Description

Model Name                                   MOD_1
Series or Sequence    1                      miles
Transformation                               None
Non-Seasonal Differencing                    0
Seasonal Differencing                        0
Length of Seasonal Period                    4
Horizontal Axis Labels                       Date_
Intervention Onsets                          None
Reference Lines                              None
Area Below the Curve                         Not filled

Applying the model specifications from MOD_1

Case Processing Summary

                                                        miles
Series or Sequence Length                               20
Number of Missing Values in the Plot   User-Missing     0
                                       System-Missing   0

[Figure: sequence plot of the variable miles, with the date as the time axis label]

The time series plot shows an upward trend with seasonal variation.

Exercise 2

Question 1.
Use the popular car data 1993 to construct a cross tabulation of the number of cars by car type and cylinder number. Also calculate the cell percentages within subgroups and of the overall total.

Question 2.
Give a time series plot for the cod catch data. Briefly describe the plot.

Question 3. (hotel)
A hotel is concerned about the number of people who book rooms by telephone but do not actually turn up. Over the past few weeks it has kept records of the number of people who do this, as shown below. How can these data be summarized? Describe the distribution briefly.

Day        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
No-shows   4   5   2   3   3   2   1   4   7   2   0   3   1   4   5

Day       16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
No-shows   2   6   2   3   3   4   2   5   5   2   4   3   3   1   4

Day       31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
No-shows   5   3   6   4   3   1   4   5   6   3   3   2   4   3   4