You Can Bet On It, The Missing Rows are Preserved with PRELOADFMT and COMPLETETYPES

Paper 10600-2016

Christopher J. Boniface, U.S. Census Bureau; Janet L. Wysocki, U.S. Census Bureau

ABSTRACT

Do you write reports that sometimes have missing categories across all class variables? Most times the customer or sponsor wants to see those missing categories in the report. Some programmers will write all sorts of additional data step code in order to show the zeroes for the missing rows or columns. Did you ever ponder that there must be an easier way to accomplish this? Well, PROC MEANS and PROC TABULATE in conjunction with PROC FORMAT can handle this situation with a couple of powerful options. With PROC TABULATE, we can use the PRELOADFMT and PRINTMISS options in conjunction with a user-defined format with PROC FORMAT to accomplish this task. With PROC SUMMARY, we can use the COMPLETETYPES option to get all the rows with zeroes.

The Census Bureau produces special tabulations for many sponsors. Often the sponsor will want a report with various counts across various categorical variables. Sometimes the crossing of certain class variables results in no observations. However, the sponsor wants to see these missing rows or columns in the table. By default, PROC TABULATE and PROC MEANS will omit these missing categories from the result. This paper will show two easy examples of how to get those missing rows or columns in the table. We'll present a special tabulation where the sponsor wants to see occupation code by county within each state. However, not all occupations are filled in each state, resulting in missing rows. To solve this, we'll show one example using PROC TABULATE with the PRELOADFMT option in tandem with a user-defined format created in PROC FORMAT. Secondly, we'll show another solution using PROC SUMMARY with COMPLETETYPES to secure the missing categories. The final result is that all the state tables will have the same number of rows and all of the occupations listed.

INTRODUCTION

The U.S. Census Bureau's 2010 Census Special Tabulation Program provides data users with the option to have user-defined tabulations created from decennial census microdata on a cost-reimbursable basis. When requesting a special tabulation, the sponsor should provide a preliminary, general specification of the data needed. We will ask them some specific questions, and then work with them to develop a final, detailed specification that documents their data needs and geographic requirements. For additional information on the U.S. Census Bureau's Special Tabulation Program, see http://www.census.gov/population/www/cen2010/spec-tab/.

We use SAS to tabulate our decennial special tabulations. In general, we use PROC TABULATE or PROC SUMMARY to generate our special tabulations report. Many sponsors of custom special tabulations request crosstabs of various categorical variables. Sometimes the crossing of certain class variables results in no observations. However, the sponsor wants to see these missing rows or columns in the table. This paper will explore two solutions to this problem. One solution will use PROC TABULATE with the PRELOADFMT option in tandem with a user-defined format created in PROC FORMAT. The second solution will use PROC SUMMARY with the COMPLETETYPES option to preserve the missing categories.

Sample Case Study

A sponsor has requested a decennial special tabulation from the Census Bureau. They want to see counts of all occupations by county for all states for a particular Decennial Census. They want one file per state. They indicate that they want an Excel table that displays all occupations in the rows of the table and all counties going across as the columns. Additionally, they request that all occupations be shown in the table even if no one held a particular occupation in a particular state.
Thus, they want to see the same number of rows in each state table. That is, they want to see all occupations in each table.[1]

SOLUTION 1: PROC TABULATE/PRELOADFMT/PRINTMISS

PROC TABULATE is used in our first solution. PROC TABULATE is a powerful tool for tabulations, and we use it here not only to compute the counts but also to output a report with rows and columns. A basic TABLE statement with two class variables, separated by a comma, produces a table with rows and columns. The PROC TABULATE code below produces the counts and outputs such a report. The CLASS statements list the two categorical variables in our study, occupation and state county codes. The occupation values appear in the row dimension of the table, since occupation is listed before the comma in the TABLE statement. The state county values appear in the column dimension, since they are listed after the comma. Note also that we have a PROC FORMAT for the many occupation codes. There are hundreds of codes, but for display purposes, we're just showing six of them.

The output of the PROC TABULATE below is shown in Table 1. The output in Table 1 looks fine until we take a closer look. Where are the Pumping station operators, Shuttle car operators, and Military officer occupations in the table? They are not there because no one held these jobs in any county in the state. By default, PROC TABULATE outputs only the values of the categories that have at least one occurrence in the data. The missing categories/rows are dropped by default. We create an output SAS dataset called sums, and any crossing that does not exist in the data is not written to that dataset. Thus, code 975 for county A displays as a missing value by default, since there are no occurrences for this crossing.

Note: the MLF option on the CLASS statement in tandem with the FORMAT statement allows the labels for the occupation codes to show in the column.
Without either, the actual codes (973, 974) would show instead of the labels in the column.

    /* Job Category Titles with Census 2000 Codes: partial listing for display */
    proc format;
        value $occf (notsorted multilabel)
            '965' = 'Pumping station operators'
            '972' = 'Refuse and recyclable material collectors'
            '973' = 'Shuttle car operators'
            '974' = 'Tank car, truck, and ship loaders'
            '975' = 'Material moving workers, all other'
            '980' = 'Military officer special and tactical operations leaders/managers';
    run;

    proc tabulate data=recodes out=sums;
        class occ / order=data mlf;
        class stcou / order=data mlf;
        var count;                        /* analysis variable summed in the table */
        table occ, stcou * (count*sum);   /* occupations in rows, counties in columns */
        weight pwt;
        format occ $occf.;
    run;

Reported Counts by County
Occupation                                 Code       A     B     C     D
Refuse and recyclable material collectors  972       80    95   130    40
Tank car, truck, and ship loaders          974       15    20    30    25
Material moving workers, all other         975        .    10    45    65

Table 1. Table with the missing rows not showing

[1] All population counts displayed are fictitious.

How can we get the missing row(s) to appear? There are two powerful options in PROC TABULATE that will solve this problem: PRELOADFMT and PRINTMISS. The code below shows the solution. Essentially, you need to use these two options in tandem:

- PRELOADFMT needs to be an option on the CLASS statement of your categorical variables.
- You need to specify a format for the class variable in the FORMAT statement.
- You need to specify the NOTSORTED and MULTILABEL options in PROC FORMAT for the $occf. format.
- Lastly and most important, you need the PRINTMISS option on the TABLE statement.

Effectively, you are telling SAS to output all of the occupation codes in the occupation format ($occf.) regardless of whether there is an occurrence in the data. Thus, all occupation codes will appear in the rows of the table. Furthermore, by specifying the PRINTMISS option, any missing values in a particular cell of the table will be output to the sums dataset and will display as a zero instead of a missing value in the table. Depending on the session settings, empty cells may instead print as a period; MISSTEXT='0' on the TABLE statement, or OPTIONS MISSING=0 as used in Solution 2, forces a zero.

Table 2 shows the output of the following PROC TABULATE. This time, the Pumping station operators, Shuttle car operators, and Military officers appear as rows in the table. Note that all values for these occupations are zero. Also, other cells that previously showed a missing value have changed to a zero.

    proc tabulate data=recodes out=sums;
        class occ / order=data preloadfmt mlf;
        class stcou / order=data preloadfmt mlf;
        var count;
        table occ, stcou * (count*sum) / printmiss;   /* print every crossing, even empty ones */
        weight pwt;
        format occ $occf.;
    run;

Reported Counts by County
Occupation                                                         Code       A     B     C     D
Pumping station operators                                          965        0     0     0     0
Refuse and recyclable material collectors                          972       80    95   130    40
Shuttle car operators                                              973        0     0     0     0
Tank car, truck, and ship loaders                                  974       15    20    30    25
Material moving workers, all other                                 975        0    10    45    65
Military officer special and tactical operations leaders/managers  980        0     0     0     0

Table 2. Table with the missing rows showing

SOLUTION 2: PROC SUMMARY/COMPLETETYPES

PROC SUMMARY is used in our second solution. As with PROC TABULATE, PROC SUMMARY will not output missing categories of the class variables involved in our crosstab of occupation with county. The basic PROC SUMMARY code is shown below.

    proc summary data=recodes nway;
        class occ county;
        var count;
        output out=outrecodes1 sum=;
    run;

Note the NWAY option on the PROC SUMMARY statement. This option restricts the output to the highest level of tabulation, that is, the full crossing of all class variables. For our example, the output has one observation per county for each occupation. The CLASS statement lists the variables occ and county. The VAR statement names the count variable to be summed. The OUTPUT statement writes a summed dataset named outrecodes1 using the option sum=. Table 3 shows the PROC SUMMARY output and the fact that the Shuttle car operators, Pumping station operators, and Military officer occupations are missing from the output.

Reported Counts
Occupation                                 Code  County  Count
Refuse and recyclable material collectors  972   A          80
Refuse and recyclable material collectors  972   B          95
Refuse and recyclable material collectors  972   C         130
Refuse and recyclable material collectors  972   D          40
Tank car, truck, and ship loaders          974   A          15
Tank car, truck, and ship loaders          974   B          20
Tank car, truck, and ship loaders          974   C          30
Tank car, truck, and ship loaders          974   D          25
Material moving workers, all other         975   B          10
Material moving workers, all other         975   C          45
Material moving workers, all other         975   D          65

Table 3. Table with the missing rows not showing

PROC SUMMARY has its own solution for this problem: the COMPLETETYPES option. Similar to the PRELOADFMT and PRINTMISS options with PROC TABULATE, the COMPLETETYPES option will output all values of a categorical variable even if there are no values for a particular crossing. When you use the COMPLETETYPES option on the PROC SUMMARY statement, all combinations of the class variables will appear in the output. In this case, all combinations of the crossings for occupation and county will appear in the output. Setting the MISSING=0 system option makes the missing sums for those combinations print as zeros. The limitation of this approach, however, is that at least one observation for a particular occupation code needs to exist within the dataset.

Thus, the following code will solve part of the problem, but not all of it. Adding the COMPLETETYPES option will add a row for occupation code 975, county A, since there is at least one observation already for occupation code 975. However, we still don't have any zero-filled observations for occupation codes 965, 973 and 980. How can we get those rows in the table?

The PROC SUMMARY code below using COMPLETETYPES will solve part of the problem. Table 4 shows a row for occupation code 975, county A.

    options missing = 0;

    proc summary data=recodes nway completetypes;
        class occ county;
        var count;
        output out=outrecodes2 sum=;
    run;

Reported Counts
Occupation                                 Code  County  Count
Refuse and recyclable material collectors  972   A          80
Refuse and recyclable material collectors  972   B          95
Refuse and recyclable material collectors  972   C         130
Refuse and recyclable material collectors  972   D          40
Tank car, truck, and ship loaders          974   A          15
Tank car, truck, and ship loaders          974   B          20
Tank car, truck, and ship loaders          974   C          30
Tank car, truck, and ship loaders          974   D          25
Material moving workers, all other         975   A           0
Material moving workers, all other         975   B          10
Material moving workers, all other         975   C          45
Material moving workers, all other         975   D          65

Table 4. Table with some of the missing rows showing

To show the occupation codes for those occupations where no counts exist in any of the counties, we need to use PRELOADFMT along with COMPLETETYPES. Just like the example with PROC TABULATE used in Solution 1, we need to set the stage and tell SAS the viable occupation codes using PRELOADFMT. We also need to specify a format for the occ variable on the FORMAT statement.

    /* Job Category Titles with Census 2000 Codes: partial listing for display */
    proc format;
        value $occf (notsorted multilabel)
            '965' = 'Pumping station operators'
            '972' = 'Refuse and recyclable material collectors'
            '973' = 'Shuttle car operators'
            '974' = 'Tank car, truck, and ship loaders'
            '975' = 'Material moving workers, all other'
            '980' = 'Military officer special and tactical operations leaders/managers';
    run;
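
As an optional check before the final step, it can help to see which codes COMPLETETYPES alone cannot recover, namely the values defined in the format that never occur in the data. The following is a minimal sketch and not part of the original program: it assumes the $occf format above is stored in the WORK format catalog and that occ is a character variable in recodes; the dataset name occ_values is hypothetical.

```sas
/* Sketch: list the format values that have no observations in the data. */
/* Assumes $occf lives in WORK.FORMATS and occ is a character variable.  */
proc format library=work cntlout=occ_values;   /* occ_values: hypothetical name */
    select $occf;                              /* export only the $occf format  */
run;

proc sql;
    /* START holds the code and LABEL the title in a CNTLOUT= dataset */
    select strip(f.start) as occ, f.label
    from occ_values as f
    where strip(f.start) not in
          (select distinct strip(occ) from recodes);
quit;
```

With the data described in this paper, the listing would be expected to contain codes 965, 973 and 980, which are exactly the rows that need PRELOADFMT.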

    options missing = 0;

    proc summary data=recodes nway completetypes;
        class occ county / preloadfmt;   /* preload every value defined in the format */
        var count;
        output out=outrecodes2 sum=;
        format occ $occf.;
    run;

Reported Counts
Occupation                                                         Code  County  Count
Pumping station operators                                          965   A           0
Pumping station operators                                          965   B           0
Pumping station operators                                          965   C           0
Pumping station operators                                          965   D           0
Refuse and recyclable material collectors                          972   A          80
Refuse and recyclable material collectors                          972   B          95
Refuse and recyclable material collectors                          972   C         130
Refuse and recyclable material collectors                          972   D          40
Shuttle car operators                                              973   A           0
Shuttle car operators                                              973   B           0
Shuttle car operators                                              973   C           0
Shuttle car operators                                              973   D           0
Tank car, truck, and ship loaders                                  974   A          15
Tank car, truck, and ship loaders                                  974   B          20
Tank car, truck, and ship loaders                                  974   C          30
Tank car, truck, and ship loaders                                  974   D          25
Material moving workers, all other                                 975   A           0
Material moving workers, all other                                 975   B          10
Material moving workers, all other                                 975   C          45
Material moving workers, all other                                 975   D          65
Military officer special and tactical operations leaders/managers  980   A           0
Military officer special and tactical operations leaders/managers  980   B           0
Military officer special and tactical operations leaders/managers  980   C           0
Military officer special and tactical operations leaders/managers  980   D           0

Table 5. Table with all missing rows showing
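
To experiment with the approach end to end, the steps of Solution 2 can be combined into one small program. This is a sketch under stated assumptions rather than the production code: the layout of recodes (one record per occupation and county, with the fictitious counts from Table 3 stored in count) is invented for illustration, and the dataset names demo_out and demo_wide and the county_ prefix are hypothetical. PRELOADFMT is applied only to occ here because occ is the only variable with a user-defined format; that is a choice made for this sketch. The last step reshapes the summary into one row per occupation with one column per county, which is closer to the layout the sponsor asked for.

```sas
/* Hypothetical sample data reproducing the fictitious counts in Table 3 */
data recodes;
    length occ $3 county $1;
    input occ $ county $ count;
    datalines;
972 A 80
972 B 95
972 C 130
972 D 40
974 A 15
974 B 20
974 C 30
974 D 25
975 B 10
975 C 45
975 D 65
;
run;

/* Format listing every occupation code that must appear in the output */
proc format;
    value $occf (notsorted multilabel)
        '965' = 'Pumping station operators'
        '972' = 'Refuse and recyclable material collectors'
        '973' = 'Shuttle car operators'
        '974' = 'Tank car, truck, and ship loaders'
        '975' = 'Material moving workers, all other'
        '980' = 'Military officer special and tactical operations leaders/managers';
run;

options missing = 0;   /* print missing numeric values as 0 */

/* COMPLETETYPES + PRELOADFMT: one row per format value and observed county */
proc summary data=recodes nway completetypes;
    class occ / preloadfmt;
    class county;
    var count;
    output out=demo_out(drop=_type_ _freq_) sum=;
    format occ $occf.;
run;

/* Reshape: occupations in rows, counties in columns */
proc sort data=demo_out;
    by occ;
run;

proc transpose data=demo_out out=demo_wide(drop=_name_) prefix=county_;
    by occ;
    id county;
    var count;
run;

proc print data=demo_wide noobs;
run;
```

Because every state file is summarized against the same format, each demo_wide table should have one row per code in $occf, regardless of which occupations actually occur in that state.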

CONCLUSION

As we have shown, you can display all missing categories of a class variable in your tables. Don't get caught with missing rows or columns in your tables. We presented two solutions to the problem of missing observations using PROC TABULATE and PROC SUMMARY. In the first solution, we use PROC TABULATE with the PRELOADFMT and PRINTMISS options in tandem with a user-defined format created with PROC FORMAT to ensure that all occupations appear in the rows regardless of whether or not they're in the data. In the second solution, we use PROC SUMMARY with the COMPLETETYPES option, together with PRELOADFMT, to preserve the missing occupation codes in the data. The missing rows are always preserved with PRELOADFMT and PRINTMISS in PROC TABULATE and with COMPLETETYPES in PROC SUMMARY. These solutions are available starting with SAS 8. You can bet on it!

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Name: Christopher J. Boniface
U.S. Census Bureau
Washington D.C. 20233
Work Phone: (301)763-5769
E-mail: christopher.j.boniface@census.gov

Name: Janet L. Wysocki
U.S. Census Bureau
Washington D.C. 20233
Work Phone: (301)763-2446
E-mail: janet.l.wysocki@census.gov

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.