MATH& 146 Lesson 11. Section 1.6 Categorical Data


Frequency
The first step in organizing categorical data is to count the number of data values in each category of interest. We can organize these counts (or frequencies) into a frequency table, which records the category names (called levels) and their totals.

Frequency
A class with 20 students had the following distribution of grades: A, A, A, B, B, B, B, B, C, C, C, D, D, D, D, D, D, F, F, F

GRADE   FREQUENCY
A       3
B       5
C       3
D       6
F       3

Relative Frequency
A relative frequency is the proportion of times a category occurs: its frequency divided by the total number of data values. Relative frequencies can be written as fractions, decimals, or percents.

GRADE   FREQUENCY   RELATIVE FREQUENCY
A       3           0.15
B       5           0.25
C       3           0.15
D       6           0.30
F       3           0.15

Cumulative Relative Frequency
The cumulative relative frequency of a category is the sum of its relative frequency and the relative frequencies of all the categories listed before it.

GRADE   FREQUENCY   RELATIVE FREQUENCY   CUMULATIVE RELATIVE FREQUENCY
A       3           0.15                 0.15
B       5           0.25                 0.40
C       3           0.15                 0.55
D       6           0.30                 0.85
F       3           0.15                 1.00
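
The three columns of the grade table can be reproduced with a short script. This is only a minimal sketch: the function name `frequency_table` and its argument names are made up for illustration, and the cumulative column is computed from running counts rather than by adding rounded decimals.

```python
from collections import Counter

# The 20 grades from the class example.
grades = ["A"] * 3 + ["B"] * 5 + ["C"] * 3 + ["D"] * 6 + ["F"] * 3


def frequency_table(values, levels):
    """Return (level, frequency, relative, cumulative relative) rows.

    `levels` fixes the order of the categories, which matters for the
    cumulative column.
    """
    counts = Counter(values)
    total = len(values)
    rows = []
    running_count = 0
    for level in levels:
        freq = counts[level]
        running_count += freq
        rows.append((level, freq, freq / total, running_count / total))
    return rows


table = frequency_table(grades, ["A", "B", "C", "D", "F"])
for level, freq, rel, cum in table:
    print(f"{level}  {freq:>2}  {rel:.2f}  {cum:.2f}")
```

The last cumulative value is always 1.00, which is a quick check that no category was left out.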

Example 1
Fifty part-time students were asked how many courses they were taking this term. The (incomplete) results are shown below:

# of Courses   Frequency   Relative Frequency   Cumulative Relative Frequency
1              30          0.6                  ____
2              15          ____                 ____
3              ____        ____                 ____

a. Fill in the blanks in the table above.
b. What percent of students take exactly two courses?
c. What percent of students take at most two courses?

Graphs of Categorical Data
There are two simple visual summaries that are used for categorical data.
Circle graphs (pie charts) show the amount of data that belongs to each category as a proportional part of the whole.
Bar graphs consist of bars that are separated from each other. The bars can be rectangles or rectangular boxes, and they can be vertical or horizontal.

Graphs of Categorical Data
To get a better sense of graphing categorical data, consider the following table about the Titanic. The table lists the number and percentage of people in each class on the Titanic's voyage.

CLASS    FREQUENCY   RELATIVE FREQUENCY
First    325         15%
Second   285         13%
Third    706         32%
Crew     885         40%
Total    2201        100%

Pie Charts
When you are interested in relative frequencies, a pie chart might be your display of choice. A pie chart slices the circle into pieces whose sizes are proportional to the fraction of the whole in each category.

Pie Charts
There are two rules to follow when creating a pie chart:
1) The pieces have to add up to 100%.
2) No person can be represented in more than one piece.

[Figure: BAD PIE CHART. The pieces total 271%, even without an "Other" category.]
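
The first rule is easy to check before drawing anything. The sketch below is illustrative only: `pie_slices` and `pieces_sum_to_whole` are hypothetical helper names, the tolerance of one percentage point (to allow for rounding) is an assumption, and the pieces that total 271% are made-up numbers, not the values from the chart on the slide.

```python
def pie_slices(counts):
    """Map each level to (share of the whole, slice angle in degrees)."""
    total = sum(counts.values())
    return {level: (n / total, 360 * n / total) for level, n in counts.items()}


def pieces_sum_to_whole(percents, tolerance=1.0):
    """Rule 1: the pieces must add up to 100% (within a rounding tolerance)."""
    return abs(sum(percents) - 100) <= tolerance


# Class counts from the Titanic table.
titanic_class = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
slices = pie_slices(titanic_class)
for level, (share, angle) in slices.items():
    print(f"{level:<6}  {share:6.1%}  {angle:6.1f} degrees")

print(pieces_sum_to_whole([15, 13, 32, 40]))   # rounded Titanic percentages
print(pieces_sum_to_whole([70, 63, 60, 78]))   # hypothetical pieces totalling 271%
```

Rule 2 cannot be checked from the percentages alone; it depends on how the categories were defined when the data were collected.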

Bar Charts
A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison. Notice that the bars are separated from one another.
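
As a rough illustration, a horizontal bar chart of the Titanic class counts can be drawn with plain text. This is a sketch under stated assumptions: `text_bar_chart` is a made-up helper, the bars are scaled so the largest category is 40 characters wide, and a real course would normally use a graphing tool instead.

```python
def text_bar_chart(counts, width=40):
    """Return one text line per category; bar length is proportional to the count."""
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, n in counts.items():
        bar = "#" * round(width * n / largest)
        lines.append(f"{label:<{label_width}} | {bar} {n}")
    return lines


titanic_class = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
chart = text_bar_chart(titanic_class)
print("\n".join(chart))
```

Because every bar starts at the same baseline, comparing lengths is straightforward even when two counts are close.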

Pie Charts vs. Bar Charts
While pie charts are well known, they are not typically as useful as other charts. It is generally more difficult to compare group sizes in a pie chart than in a bar chart, especially when categories have nearly identical counts or proportions.

Example 2
Which category is largest? Which is smallest?

The Titanic
Here is part of a data matrix about the passengers and crew aboard the Titanic. Each case (row) of the data table represents a person on board the ship.

Survived   Age     Sex      Class
Died       Adult   Male     Third
Survived   Adult   Male     Crew
Died       Child   Male     Third
Survived   Child   Female   First
Died       Adult   Male     Third
Died       Adult   Female   Crew

The Titanic
The problem with data matrices is that you can't see what's going on. And seeing is just what we want to do. We need ways to show the data so that we can see patterns, relationships, trends, and exceptions.

The Titanic
To look at two categorical variables together, we often arrange the counts in a two-way table. Here is a two-way table of those aboard the Titanic, classified according to class of ticket and whether or not they survived.

                      Class
Survival    First   Second   Third   Crew   Total
Survived    203     118      178     212    711
Died        122     167      528     673    1490
Total       325     285      706     885    2201
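
Going from a data matrix to a two-way table is just counting pairs of values. The sketch below uses only the six-row excerpt of the data matrix, so its counts are tiny compared with the full table of 2201 people; the variable names are illustrative.

```python
from collections import Counter

# The six cases shown in the data matrix excerpt: (Survived, Age, Sex, Class).
people = [
    ("Died", "Adult", "Male", "Third"),
    ("Survived", "Adult", "Male", "Crew"),
    ("Died", "Child", "Male", "Third"),
    ("Survived", "Child", "Female", "First"),
    ("Died", "Adult", "Male", "Third"),
    ("Died", "Adult", "Female", "Crew"),
]

# Each cell of the two-way table counts one (Survival, Class) combination.
cells = Counter((survived, ticket_class) for survived, _age, _sex, ticket_class in people)

for (survived, ticket_class), n in sorted(cells.items()):
    print(f"{survived:<8}  {ticket_class:<6}  {n}")
```

Combinations that never occur in the excerpt, such as (Survived, Third), simply have a count of zero.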

The Titanic
Because the table shows how the individuals are distributed along each variable, contingent on the value of the other variable, such a table is called a contingency table.

Contingency Tables
The margins of the table, both on the right and at the bottom, give totals. The bottom line is just the frequency table of the variable Class.

Class    Frequency
First    325
Second   285
Third    706
Crew     885
Total    2201

Contingency Tables
The right column of the table is the frequency table of the variable Survival.

Survival   Frequency
Survived   711
Died       1490
Total      2201
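
The margins can be recomputed from the eight interior cells of the Titanic table, which is a useful check on any contingency table. This is a minimal sketch; the dictionary layout and variable names are choices made here for illustration.

```python
# Interior cells of the Titanic two-way table: counts[survival][class].
counts = {
    "Survived": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Died": {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}
classes = ["First", "Second", "Third", "Crew"]

# Right margin: frequency table of Survival.
row_totals = {survival: sum(row.values()) for survival, row in counts.items()}

# Bottom margin: frequency table of Class.
col_totals = {c: sum(counts[survival][c] for survival in counts) for c in classes}

grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

Both margins must add up to the same grand total, since each person is counted exactly once in each.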

Contingency Tables
Each cell of the table gives the count for a combination of values of the two variables. For example, the highlighted cell shows that 118 second-class passengers survived. So what does the green highlighted cell show?

Row Proportions
The table below shows the row proportions for the Titanic data set. The row proportions are computed as the counts divided by their row totals.

                                               Class
Survival   First             Second            Third             Crew              Total
Survived   203/711 = .286    118/711 = .166    178/711 = .250    212/711 = .298    711/711 = 1.000
Died       122/1490 = .082   167/1490 = .112   528/1490 = .354   673/1490 = .452   1490/1490 = 1.000
Total      325/2201 = .148   285/2201 = .129   706/2201 = .321   885/2201 = .402   2201/2201 = 1.000
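
Dividing each count by its row total can be sketched as follows. The function name `row_proportions` and the nested-dictionary layout are assumptions made for this illustration.

```python
# Interior cells of the Titanic two-way table: counts[survival][class].
counts = {
    "Survived": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Died": {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}


def row_proportions(table):
    """Divide every count by the total of the row it sits in."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {col: n / row_total for col, n in row.items()}
    return result


row_props = row_proportions(counts)
for row_label, row in row_props.items():
    cells = "  ".join(f"{col}: {p:.3f}" for col, p in row.items())
    print(f"{row_label:<8}  {cells}")
```

Each row of proportions adds up to 1, because the row total is the denominator for every cell in that row.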

Example 3
a) What does 167/1490 = .112 (second column, second row) represent in the table of row proportions?
b) What does 885/2201 = .402 (fourth column, third row) represent in the table of row proportions?

Column Proportions
A contingency table of the column proportions is computed in a similar way, where each column proportion is computed as the count divided by the corresponding column total.

                                             Class
Survival   First            Second           Third            Crew             Total
Survived   203/325 = .625   118/285 = .414   178/706 = .252   212/885 = .240   711/2201 = .323
Died       122/325 = .375   167/285 = .586   528/706 = .748   673/885 = .760   1490/2201 = .677
Total      325/325 = 1.000  285/285 = 1.000  706/706 = 1.000  885/885 = 1.000  2201/2201 = 1.000
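
The same idea with column totals as the denominators gives the column proportions. As before, `column_proportions` is a hypothetical helper written for this sketch.

```python
# Interior cells of the Titanic two-way table: counts[survival][class].
counts = {
    "Survived": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Died": {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}


def column_proportions(table):
    """Divide every count by the total of the column it sits in."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        row_label: {c: row[c] / col_totals[c] for c in columns}
        for row_label, row in table.items()
    }


col_props = column_proportions(counts)
for ticket_class in ["First", "Second", "Third", "Crew"]:
    survived = col_props["Survived"][ticket_class]
    print(f"{ticket_class:<6}  survived: {survived:.3f}  died: {1 - survived:.3f}")
```

Here each column adds up to 1, so the Survived entry in a column is the survival rate for that class.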

Example 4
a) What does 167/285 = .586 (second column, second row) represent in the table of column proportions?
b) What does 711/2201 = .323 (fifth column, first row) represent in the table of column proportions?

Column Proportions
In the table of column proportions, the value 0.625 indicates that 62.5% of first-class passengers survived. This survival rate is much higher than that of second-class passengers (41.4%), third-class passengers (25.2%), or crew members (24.0%).

Column Proportions
Because differences this large in survival rates between the classes are unlikely to arise from random chance alone, they provide evidence that the class and survival variables are associated. We say the two variables are dependent.

Mosaic Plots
Mosaic plots are graphical displays of contingency tables. The width of each bar matches the proportion of cases in that level of the column variable, while the heights of the boxes within a bar match the column proportions.
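
The geometry of a mosaic plot can be worked out from the counts alone. The sketch below assumes Class is the column variable of the Titanic table and computes, for each box, a width (the class's share of everyone aboard) and a height (the column proportion); `mosaic_boxes` is a made-up name and nothing is actually drawn.

```python
# Interior cells of the Titanic two-way table: counts[survival][class].
counts = {
    "Survived": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Died": {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}


def mosaic_boxes(table):
    """Return {(row, column): (width, height)} with both values between 0 and 1."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    grand_total = sum(col_totals.values())
    boxes = {}
    for c in columns:
        width = col_totals[c] / grand_total          # share of all cases in this column
        for row_label, row in table.items():
            height = row[c] / col_totals[c]          # column proportion
            boxes[(row_label, c)] = (width, height)
    return boxes


boxes = mosaic_boxes(counts)
for (survival, ticket_class), (width, height) in boxes.items():
    print(f"{survival:<8} {ticket_class:<6}  width {width:.3f}  height {height:.3f}")
```

A consequence of this construction is that the area of each box, width times height, equals that cell's share of the grand total.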

Independent
When the variables are independent, the proportions are the same in every column, so the boxes line up in a grid.

        Column 1   Column 2   Column 3
Row 1   5          10         15
Row 2   8          16         24

Dependent
When the variables are dependent, the proportions differ from column to column, so the boxes do not line up.

        Column 1   Column 2   Column 3
Row 1   5          16         18
Row 2   8          10         18
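
Whether the boxes line up exactly can be checked with integer arithmetic: the column proportions are identical in every column precisely when each count times the grand total equals its row total times its column total. The sketch below applies that check to the two small tables; `boxes_line_up` is a hypothetical name, and the test is for exact agreement only. Real data rarely match exactly, so in practice the question is whether the differences are larger than random chance alone would explain.

```python
def boxes_line_up(table):
    """True when every column has exactly the same column proportions.

    `table` is a list of rows of counts (no totals).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return all(
        table[i][j] * grand_total == row_totals[i] * col_totals[j]
        for i in range(len(table))
        for j in range(len(col_totals))
    )


independent_table = [[5, 10, 15], [8, 16, 24]]
dependent_table = [[5, 16, 18], [8, 10, 18]]

print(boxes_line_up(independent_table))
print(boxes_line_up(dependent_table))
```

Cross-multiplying avoids comparing rounded decimals, which could make nearly equal proportions look identical.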

Example 5
The mosaic plot below compares class and survival on the Titanic. Based on the plot, are the variables independent?

Example 6
A random sample of 100 pet owners was polled to see whether there is an association between gender and preference for a dog or a cat. The results of the survey are below.

         Dog   Cat   Total
Male     40    10    50
Female   20    30    50
Total    60    40    100

Example 6 continued
a) Compute and interpret the column proportions.
b) Does there appear to be an association between gender and type of pet? Explain.

Example 7
The mosaic plot below compares gender and type of pet. Based on the plot, are the variables independent?

Example 8
There are 10 boys and 12 girls in Mr. Fleck's fourth grade class and 15 boys and 18 girls in Mrs. Parker's fourth grade class. One student is randomly selected to be hall monitor.
a) Use this information to complete the contingency table below.

                      Gender
Teacher       Boy    Girl   Total
Mr. Fleck     ____   ____   ____
Mrs. Parker   ____   ____   ____
Total         ____   ____   ____

Example 8 continued
b) Compute and interpret the row proportions.
c) Does there appear to be an association between teacher and student's gender? Explain.

                      Gender
Teacher       Boy    Girl   Total
Mr. Fleck     10     12     22
Mrs. Parker   15     18     33
Total         25     30     55

Example 9
The mosaic plot below compares teacher and student gender. Based on the plot, are the variables independent?