Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
Example 1: Creating a Basic Two-Dimensional Table
Procedure features:
PROC TABULATE statement options:
-
FORMAT=
TABLE statement
-
crossing (*) operator
TABLE statement options:
-
RTS=
Other features: FORMAT statement
This example
-
creates a category for each type of user (residential or business) in each division of each region
-
applies the same format to all cells in the table
-
applies a format to each class variable
-
extends the space for row headings.
Program
Create the ENERGY data set. ENERGY contains data on expenditures of energy for business and residential customers in individual states in the Northeast and West regions of the United States. A DATA step on page 1401 creates the data set.
data energy; length State ; input Region Division state $ Type Expenditures; datalines; 1 1 ME 1 708 1 1 ME 2 379 ... more data lines ... 4 4 HI 1 273 4 4 HI 2 298 ;
Create the REGFMT., DIVFMT., and USETYPE. formats. PROC FORMAT creates formats for Region, Division, and Type.
proc format; value regfmt 1='Northeast' 2='South' 3='Midwest' 4='West'; value divfmt 1='New England' 2='Middle Atlantic' 3='Mountain' 4='Pacific'; value usetype 1='Residential Customers' 2='Business Customers'; run;
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell .
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement separates the analysis by values of Region, Division, and Type.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns . The TABLE statement creates a row for each formatted value of Region. Nested within each row are rows for each formatted value of Division. The TABLE statement also creates a column for each formatted value of Type. Each cell that is created by these rows and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell.
table region*division, type*expenditures
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the titles.
title 'Energy Expenditures for Each Region'; title2 '(millions of dollars)'; run;
Output
Energy Expenditures for Each Region 1 (millions of dollars) --------------------------------------------------- Type ------------------------- Residential Business Customers Customers ------------+------------ ExpendituresExpenditures ------------+------------ Sum Sum -----------------------+------------+------------ Region Division -----------+----------- Northeast New England ,477 ,129 -----------+------------+------------ Middle Atlantic ,379 ,078 -----------+-----------+------------+------------ West Mountain ,476 ,729 -----------+------------+------------ Pacific ,959 ,619 --------------------------------------------------
Example 2: Specifying Class Variable Combinations to Appear in a Table
Procedure features:
PROC TABULATE Statement options:
-
CLASSDATA=
-
EXCLUSIVE
Data set: ENERGY ENERGY on page 1401
Formats: REGFMT., DIVFMT., and USETYPE. on page 1247
This example
-
uses the CLASSDATA= option to specify combinations of class variables to appear in a table
-
uses the EXCLUSIVE option to restrict the output to only the combinations specified in the CLASSDATA= data set. Without the EXCLUSIVE option, the output would be the same as in Example 1 on page 1246.
Program
Create the CLASSES data set. CLASSES contains the combinations of class variable values that PROC TABULATE uses to create the table.
data classes; input region division type; datalines; 1 1 1 1 1 2 4 4 1 4 4 2 ;
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. CLASSDATA= and EXCLUSIVE restrict the class level combinations to those that are specified in the CLASSES data set.
proc tabulate data=energy format=dollar12. classdata=classes exclusive;
Specify subgroups for the analysis. The CLASS statement separates the analysis by values of Region, Division, and Type.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns. The TABLE statement creates a row for each formatted value of Region. Nested within each row are rows for each formatted value of Division. The TABLE statement also creates a column for each formatted value of Type. Each cell that is created by these rows and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell.
table region*division, type*expenditures
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the titles.
title 'Energy Expenditures for Each Region'; title2 '(millions of dollars)'; run;
Output
Energy Expenditures for Each Region 1 (millions of dollars) --------------------------------------------------- Type ------------------------- Residential Business Customers Customers ------------+------------ ExpendituresExpenditures ------------+------------ Sum Sum -----------------------+------------+------------ Region Division -----------+----------- Northeast New England ,477 ,129 -----------+-----------+------------+------------ West Pacific ,959 ,619 ---------------------------------------------------
Example 3: Using Preloaded Formats with Class Variables
Procedure features:
PROC TABULATE statement option:
-
OUT=
CLASS statement options:
-
EXCLUSIVE
-
PRELOADFMT
TABLE statement option:
-
PRINTMISS
Other features: PRINT procedure
Data set: ENERGY ENERGY on page 1401
Formats: REGFMT., DIVFMT., and USETYPE. on page 1247
This example
-
creates a table that includes all possible combinations of formatted class variable values (PRELOADFMT with PRINTMISS), even if those combinations have a zero frequency and even if they do not make sense
-
uses only the preloaded range of user-defined formats as the levels of class variables (PRELOADFMT with EXCLUSIVE).
-
writes the output to an output data set, and prints that data set.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement separates the analysis by values of Region, Division, and Type. PRELOADFMT specifies that PROC TABULATE use the preloaded values of the user-defined formats for the class variables.
class region division type / preloadfmt;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns, and specify row and column options. PRINTMISS specifies that all possible combinations of user-defined formats be used as the levels of the class variables.
table region*division, type*expenditures / rts=25 printmiss;
Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the titles.
title 'Energy Expenditures for Each Region'; title2 '(millions of dollars)'; run;
Specify the table options and the output data set. The OUT= option specifies the name of the output data set to which PROC TABULATE writes the data.
proc tabulate data=energy format=dollar12. out=tabdata;
Specify subgroups for the analysis. The EXCLUSIVE option, when used with PRELOADFMT, uses only the preloaded range of user-defined formats as the levels of class variables.
class region division type / preloadfmt exclusive;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns, and specify row and column options. The PRINTMISS option is not specified in this case. If it were, then it would override the EXCLUSIVE option in the CLASS statement.
table region*division, type*expenditures / rts=25;
Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the titles.
title 'Energy Expenditures for Each Region'; title2 '(millions of dollars)'; run;
Print the output data set WORK.TABDATA.
proc print data=tabdata; run;
Output
This output, created with the PRELOADFMT and PRINTMISS options, contains all possible combinations of preloaded user-defined formats for the class variable values. It includes combinations with zero frequencies, and combinations that make no sense, such as Northeast and Pacific .
Energy Expenditures for Each Region 1 (millions of dollars) --------------------------------------------------- Type ------------------------- Residential Business Customers Customers ------------+------------ ExpendituresExpenditures ------------+------------ Sum Sum -----------------------+------------+------------ Region Division -----------+----------- Northeast New England ,477 ,129 -----------+------------+------------ Middle Atlantic ,379 ,078 -----------+------------+------------ Mountain . . -----------+------------+------------ Pacific . . -----------+-----------+------------+------------ South New England . . -----------+------------+------------ Middle Atlantic . . -----------+------------+------------ Mountain . . -----------+------------+------------ Pacific . . -----------+-----------+------------+------------ Midwest New England . . -----------+------------+------------ Middle Atlantic . . -----------+------------+------------ Mountain . . -----------+------------+------------ Pacific . . -----------+-----------+------------+------------ West New England . . -----------+------------+------------ Middle Atlantic . . -----------+------------+------------ Mountain ,476 ,729 -----------+------------+------------ Pacific ,959 ,619 -------------------------------------------------
This output, created with the PRELOADFMT and EXCLUSIVE options, contains only those combinations of preloaded user-defined formats for the class variable values that appear in the input data set. This output is identical to the output from Example 1 on page 1246.
Energy Expenditures for Each Region 1 (millions of dollars) --------------------------------------------------- Type ------------------------- Residential Business Customers Customers ------------+------------ ExpendituresExpenditures ------------+------------ Sum Sum -----------------------+------------+------------ Region Division -----------+----------- Northeast New England ,477 ,129 -----------+------------+------------ Middle Atlantic ,379 ,078 -----------+-----------+------------+------------ West Mountain ,476 ,729 -----------+------------+------------ Pacific ,959 ,619 ---------------------------------------------------
This output is a listing of the output data set TABDATA, which was created by the OUT= option in the PROC TABULATE statement. TABDATA contains the data that is created by having the PRELOADFMT and EXCLUSIVE options specified.
Energy Expenditures for Each Region (millions of dollars) E x p e n d i t D u i _ r R v __ T e e i T P A s g s T Y A B _ O i i y P G L S b o o p E E E u s n n e _ __ m 1 Northeast New England Residential Customers 111 1 1 7477 2 Northeast New England Business Customers 111 1 1 5129 3 Northeast Middle Atlantic Residential Customers 111 1 1 19379 4 Northeast Middle Atlantic Business Customers 111 1 1 15078 5 West Mountain Residential Customers 111 1 1 5476 6 West Mountain Business Customers 111 1 1 4729 7 West Pacific Residential Customers 111 1 1 13959 8 West Pacific Business Customers 111 1 1 12619
Example 4: Using Multilabel Formats
Procedure features:
CLASS statement options:
-
MLF
PROC TABULATE statement options:
-
FORMAT=
TABLE statement
-
ALL class variable
-
concatenation (blank) operator
-
crossing (*) operator
-
grouping elements (parentheses) operator
-
label
-
variable list
Other features:
FORMAT procedure
FORMAT statement
VALUE statement options:
-
MULTILABEL
This example
-
shows how to specify a multilabel format in the VALUE statement of PROC FORMAT
-
shows how to activate multilabel format processing using the MLF option with the CLASS statement
-
demonstrates the behavior of the N statistic when multilabel format processing is activated.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=64;
Create the CARSURVEY data set. CARSURVEY contains data from a survey that was distributed by a car manufacturer to a focus group of potential customers who were brought together to evaluate new car names. Each observation in the data set contains an identification number, the participant s age, and the participant s ratings of four car names . A DATA step creates the data set.
data carsurvey; input Rater Age Progressa Remark Jupiter Dynamo; datalines; 1 38 94 98 84 80 2 49 96 84 80 77 3 16 64 78 76 73 4 27 89 73 90 92 ... more data lines ... 77 61 92 88 77 85 78 24 87 88 88 91 79 18 54 50 62 74 80 62 90 91 90 86 ;
Create the AGEFMT. format. The FORMAT procedure creates a multilabel format for ages by using the MULTILABEL option on page 449. A multilabel format is one in which multiple labels can be assigned to the same value, in this case because of overlapping ranges. Each value is represented in the table for each range in which it occurs. The NOTSORTED option stores the ranges in the order in which they are defined.
proc format; value agefmt (multilabel notsorted) 15 - 29 = 'Below 30 years' 30 - 50 = 'Between 30 and 50' 51 - high = 'Over 50 years' 15 - 19 = '15 to 19' 20 - 25 = '20 to 25' 25 - 39 = '25 to 39' 40 - 55 = '40 to 55' 56 - high = '56 and above'; run;
Specify the table options. The FORMAT= option specifies up to 10 digits as the default format for the value in each table cell.
proc tabulate data=carsurvey format=10.;
Specify subgroups for the analysis. The CLASS statement identifies Age as the class variable and uses the MLF option to activate multilabel format processing.
class age / mlf;
Specify the analysis variables. The VAR statement specifies that PROC TABULATE calculate statistics on the Progressa, Remark, Jupiter, and Dynamo variables.
var progressa remark jupiter dynamo;
Define the table rows and columns. The row dimension of the TABLE statement creates a row for each formatted value of Age. Multilabel formatting allows an observation to be included in multiple rows or age categories. The row dimension uses the ALL class variable to summarize information for all rows. The column dimension uses the N statistic to calculate the number of observations for each age group. Notice that the result of the N statistic crossed with the ALL class variable in the row dimension is the total number of observations instead of the sum of the N statistics for the rows. The column dimension uses the ALL class variable at the beginning of a crossing to assign a label, Potential Car Names . The four nested columns calculate the mean ratings of the car names for each age group.
table age all, n all='Potential Car Names'*(progressa remark jupiter dynamo)*mean;
Specify the titles.
title1 "Rating Four Potential Car Names"; title2 "Rating Scale 0-100 (100 is the highest rating)";
Format the output. The FORMAT statement assigns the user-defined format AGEFMT. to Age for this analysis.
format age agefmt.; run;
Output
Output 48.4
|
Rating Four Potential Car Names 1 Rating Scale 0-100 (100 is the highest rating) --------------------------------------------------------------------------- Potential Car Names ------------------------------------------- Progressa Remark Jupiter Dynamo ----------+----------+----------+---------- N Mean Mean Mean Mean ------------------+----------+----------+----------+----------+---------- Age ------------------ 15 to 19 14 75 78 81 73 ------------------+----------+----------+----------+----------+---------- 20 to 25 11 89 88 84 89 ------------------+----------+----------+----------+----------+---------- 25 to 39 26 84 90 82 72 ------------------+----------+----------+----------+----------+---------- 40 to 55 14 85 87 80 68 ------------------+----------+----------+----------+----------+---------- 56 and above 15 84 82 81 75 ------------------+----------+----------+----------+----------+---------- Below 30 years 36 82 84 82 75 ------------------+----------+----------+----------+----------+---------- Between 30 and 50 25 86 89 81 73 ------------------+----------+----------+----------+----------+---------- Over 50 years 19 82 84 80 76 ------------------+----------+----------+----------+----------+---------- All 80 83 86 81 74 ---------------------------------------------------------------------------
|
Example 5: Customizing Row and Column Headings
Procedure features:
-
TABLE statement
-
labels
-
Data set: ENERGY ENERGY on page 1401
Formats: REGFMT., DIVFMT., and USETYPE. on page 1247
This example shows how to customize row and column headings. A label specifies text for a heading. A blank label creates a blank heading. PROC TABULATE removes the space for blank column headings from the table.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns. The TABLE statement creates a row for each formatted value of Region. Nested within each row are rows for each formatted value of Division. The TABLE statement also creates a column for each formatted value of Type. Each cell that is created by these rows and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell. Text in quotation marks specifies headings for the corresponding variable or statistic. Although Sum is the default statistic, it is specified here so that you can specify a blank for its heading.
table region*division, type='Customer Base'*expenditures=' '*sum=' '
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the titles.
title 'Energy Expenditures for Each Region'; title2 '(millions of dollars)'; run;
Output
The heading for Type contains text that is specified in the TABLE statement. The TABLE statement eliminated the headings for Expenditures and Sum.
Energy Expenditures for Each Region 1 (millions of dollars) --------------------------------------------------- Customer Base ------------------------- Residential Business Customers Customers -----------------------+------------+------------ Region Division -----------+----------- Northeast New England ,477 ,129 -----------+------------+------------ Middle Atlantic ,379 ,078 -----------+-----------+------------+------------ West Mountain ,476 ,729 -----------+------------+------------ Pacific ,959 ,619 ---------------------------------------------------
Example 6: Summarizing Information with the Universal Class Variable ALL
Procedure features:
-
PROC TABULATE statement options:
-
FORMAT=
-
-
TABLE statement:
-
ALL class variable
-
concatenation (blank operator)
-
format modifiers
-
grouping elements (parentheses operator)
-
Data set: ENERGY ENERGY on page 1401
Formats: REGFMT., DIVFMT., and USETYPE. on page 1247
This example shows how to use the universal class variable ALL to summarize information from multiple categories.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=64 pagesize=60;
Specify the table options. The FORMAT= option specifies COMMA12. as the default format for the value in each table cell.
proc tabulate data=energy format=comma12.;
Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows. The row dimension of the TABLE statement creates a row for each formatted value of Region. Nested within each row are rows for each formatted value of Division and a row (labeled Subtotal ) that summarizes all divisions in the region. The last row of the report (labeled Total for All Regions ) summarizes all regions. The format modifier f=DOLLAR12. assigns the DOLLAR12. format to the cells in this row.
table region*(division all='Subtotal') all='Total for All Regions'*f=dollar12.,
Define the table columns. The column dimension of the TABLE statement creates a column for each formatted value of Type and a column that is labeled All customers that shows expenditures for all customers in a row of the table. Each cell that is created by these rows and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell. Text in quotation marks specifies headings for the corresponding variable or statistic. Although Sum is the default statistic, it is specified here so that you can specify a blank for its heading.
type='Customer Base'*expenditures=' '*sum=' ' all='All Customers'*expenditures=' '*sum=' '
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the titles.
title 'Energy Expenditures for Each Region'; title2 '(millions of dollars)'; run;
Output
The universal class variable ALL provides subtotals and totals in this table.
Energy Expenditures for Each Region 1 (millions of dollars) ---------------------------------------------------------------- Customer Base ------------------------- Residential Business All Customers Customers Customers -----------------------+------------+------------+------------ Region Division -----------+----------- Northeast New England 7,477 5,129 12,606 -----------+------------+------------+------------ Middle Atlantic 19,379 15,078 34,457 -----------+------------+------------+------------ Subtotal 26,856 20,207 47,063 -----------+-----------+------------+------------+------------ West Division ----------- Mountain 5,476 4,729 10,205 -----------+------------+------------+------------ Pacific 13,959 12,619 26,578 -----------+------------+------------+------------ Subtotal 19,435 17,348 36,783 -----------------------+------------+------------+------------ Total for All Regions ,291 ,555 ,846 ----------------------------------------------------------------
Example 7: Eliminating Row Headings
Procedure features:
-
TABLE statement:
-
labels
-
ROW=FLOAT
-
Data set: ENERGY ENERGY on page 1401
Formats: REGFMT., DIVFMT., and USETYPE. on page 1247
This example shows how to eliminate blank row headings from a table. To do so, you must both provide blank labels for the row headings and specify ROW=FLOAT in the TABLE statement.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows. The row dimension of the TABLE statement creates a row for each formatted value of Region. Nested within these rows is a row for each formatted value of Division. The analysis variable Expenditures and the Sum statistic are also included in the row dimension, so PROC TABULATE creates row headings for them as well. The text in quotation marks specifies the headings for the corresponding variable or statistic. Although Sum is the default statistic, it is specified here so that you can specify a blank for its heading.
table region*division*expenditures=' '*sum=' ',
Define the table columns. The column dimension of the TABLE statement creates a column for each formatted value of Type.
type='Customer Base'
Specify the row title space and eliminate blank row headings. RTS= provides 25 characters per line for row headings. ROW=FLOAT eliminates blank row headings.
/ rts=25 row=float;
Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the titles.
title 'Energy Expenditures for Each Region'; title2 '(millions of dollars)'; run;
Output
Compare this table with the output in Example 5 on page 1258. The two tables are identical, but the program that creates this table uses Expenditures and Sum in the row dimension. PROC TABULATE automatically eliminates blank headings from the column dimension, whereas you must specify ROW=FLOAT to eliminate blank headings from the row dimension.
Energy Expenditures for Each Region 1 (millions of dollars) --------------------------------------------------- Customer Base ------------------------- Residential Business Customers Customers -----------------------+------------+------------ Region Division -----------+----------- Northeast New England ,477 ,129 -----------+------------+------------ Middle Atlantic ,379 ,078 -----------+-----------+------------+------------ West Mountain ,476 ,729 -----------+------------+------------ Pacific ,959 ,619 ---------------------------------------------------
Example 8: Indenting Row Headings and Eliminating Horizontal Separators
Procedure features:
-
PROC TABULATE statement options:
-
NOSEPS
-
-
TABLE statement options:
-
INDENT=
-
Data set: ENERGY ENERGY on page 1401
Formats: REGFMT., DIVFMT., and USETYPE. on page 1247
This example shows how to condense the structure of a table by
-
removing row headings for class variables
-
indenting nested rows underneath parent rows instead of placing them next to each other
-
eliminating horizontal separator lines from the row titles and the body of the table.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell. NOSEPS eliminates horizontal separator lines from row titles and from the body of the table.
proc tabulate data=energy format=dollar12. noseps;
Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns. The TABLE statement creates a row for each formatted value of Region. Nested within each row are rows for each formatted value of Division. The TABLE statement also creates a column for each formatted value of Type. Each cell that is created by these rows and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell. Text in quotation marks in all dimensions specifies headings for the corresponding variable or statistic. Although Sum is the default statistic, it is specified here so that you can specify a blank for its heading.
table region*division, type='Customer Base'*expenditures=' '*sum=' '
Specify the row title space and indention value. RTS= provides 25 characters per line for row headings. INDENT= removes row headings for class variables, places values for Division beneath values for Region rather than beside them, and indents values for Division four spaces.
/ rts=25 indent=4;
Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the titles.
title 'Energy Expenditures for Each Region'; title2 '(millions of dollars)'; run;
Output
NOSEPS removes the separator lines from the row titles and the body of the table. INDENT= eliminates the row headings for Region and Division and indents values for Division underneath values for Region.
Energy Expenditures for Each Region 1 (millions of dollars) --------------------------------------------------- Customer Base ------------------------- Residential Business Customers Customers -----------------------+------------+------------ Northeast New England ,477 ,129 Middle Atlantic ,379 ,078 West Mountain ,476 ,729 Pacific ,959 ,619 ---------------------------------------------------
Example 9: Creating Multipage Tables
Procedure features:
-
TABLE statement
-
ALL class variable
-
BOX=
-
CONDENSE
-
INDENT=
-
page expression
-
Data set: ENERGY ENERGY on page 1401
Formats: REGFMT., DIVFMT., and USETYPE. on page 1247
This example creates a separate table for each region and one table for all regions. By default, PROC TABULATE creates each table on a separate page, but the CONDENSE option places them all on the same page.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.
var expenditures;
Define the table pages. The page dimension of the TABLE statement creates one table for each formatted value of Region and one table for all regions. Text in quotation marks provides the heading for each page.
table region='Region: ' all='All Regions',
Define the table rows. The row dimension creates a row for each formatted value of Division and a row for all divisions. Text in quotation marks provides the row headings.
division all='All Divisions',
Define the table columns. The column dimension of the TABLE statement creates a column for each formatted value of Type. Each cell that is created by these pages, rows, and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell. Text in quotation marks specifies headings for the corresponding variable or statistic. Although Sum is the default statistic, it is specified here so that you can specify a blank for its heading.
type='Customer Base'*expenditures=' '*sum=' '
Specify additional table options. RTS= provides 25 characters per line for row headings. BOX= places the page heading inside the box above the row headings. CONDENSE places as many tables as possible on one physical page. INDENT= eliminates the row heading for Division. (Because there is no nesting in the row dimension, there is nothing to indent.)
/ rts=25 box=_page_ condense indent=1;
Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the titles.
title 'Energy Expenditures for Each Region and All Regions'; title2 '(millions of dollars)'; run;
Output
Energy Expenditures for Each Region and All Regions 1 (millions of dollars) --------------------------------------------------- Region: Northeast Customer Base ------------------------- Residential Business Customers Customers -----------------------+------------+------------ New England ,477 ,129 -----------------------+------------+------------ Middle Atlantic ,379 ,078 -----------------------+------------+------------ All Divisions ,856 ,2 07 --------------------------------------------------- --------------------------------------------------- Region: West Customer Base ------------------------- Residential Business Customers Customers -----------------------+------------+------------ Mountain ,476 ,729 -----------------------+------------+------------ Pacific ,959 ,619 -----------------------+------------+------------ All Divisions ,435 ,348 --------------------------------------------------- --------------------------------------------------- All Regions Customer Base ------------------------- Residential Business Customers Customers -----------------------+------------+------------ New England ,477 ,129 -----------------------+------------+------------ Middle Atlantic ,379 ,078 -----------------------+------------+------------ Mountain ,476 ,729 -----------------------+------------+------------ Pacific ,959 ,619 -----------------------+------------+------------ All Divisions ,291 ,555 ---------------------------------------------------
Example 10: Reporting on Multiple-Response Survey Data
Procedure features:
-
TABLE statement:
-
denominator definition (angle bracket operators)
-
N statistic
-
PCTN statistic
-
variable list
-
Other features:
-
FORMAT procedure
-
SAS system options:
-
FORMDLIM=
-
NONUMBER
-
-
SYMPUT routine
The two tables in this example show
-
which factors most influenced customers decisions to buy products
-
where customers heard of the company.
The reports appear on one physical page with only one page number. By default, they would appear on separate pages.
In addition to showing how to create these tables, this example shows how to
-
use a DATA step to count the number of observations in a data set
-
store that value in a macro variable
-
access that value later in the SAS session.
Collecting the Data
Figure 48.15 on page 1270 shows the survey form that is used to collect data.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page. The FORMDLIM= option replaces the character that delimits page breaks with a single blank. By default, a new physical page starts whenever a page break occurs.
options nodate pageno=1 linesize=80 pagesize=18 formdlim=' ';
Create the CUSTOMER_RESPONSE data set. CUSTOMER_RESPONSE contains data from a customer survey. Each observation in the data set contains information about factors that influence one respondent s decisions to buy products. A DATA step on page 1394 creates the data set. Using missing values rather than 0 s is crucial for calculating frequency counts in PROC TABULATE.
data customer_response; input Customer Factor1-Factor4 Source1-Source3 Quality1-Quality3; datalines; 1 . . 1 1 1 1 . 1 . . 2 1 1 . 1 1 1 . 1 1 . 3 . . 1 1 1 1 . . . . . . . more data lines . . . 119 . . . 1 . . . 1 . . 120 1 1 . 1 . . . . 1 . ;
Store the number of observations in a macro variable. The SET statement reads the descriptor portion of CUSTOMER_RESPONSE at compile time and stores the number of observations (the number of respondents) in COUNT. The SYMPUT routine stores the value of COUNT in the macro variable NUM. This variable is available for use by other procedures and DATA steps for the remainder of the SAS session. The IF 0 condition, which is always false, ensures that the SET statement, which reads the observations, never executes. (Reading observations is unnecessary.) The STOP statement ensures that the DATA step executes only once.
data _null_; if 0 then set customer_response nobs=count; call symput('num',left(put(count,4.))); stop; run;
Create the PCTFMT. format. The FORMAT procedure creates a format for percentages. The PCTFMT. format writes all values with at least one digit to the left of the decimal point and with one digit to the right of the decimal point. A blank and a percent sign follow the digits.
proc format; picture pctfmt low-high='009.9 %'; run;
Create the report and use the default table options.
proc tabulate data=customer_response;
Specify the analysis variables. The VAR statement specifies that PROC TABULATE calculate statistics on the Factor1, Factor2, Factor3, Factor4, and Customer variables. The variable Customer must be listed because it is used to calculate the Percent column that is defined in the TABLE statement.
var factor1-factor4 customer;
Define the table rows and columns. The TABLE statement creates a row for each factor, a column for frequency counts, and a column for the percentages. Text in quotation marks supplies headers for the corresponding row or column. The format modifiers F=7. and F=PCTFMT9. provide formats for values in the associated cells and extend the column widths to accommodate the column headers.
table factor1='Cost' factor2='Performance' factor3='Reliability' factor4='Sales Staff', (n='Count'*f=7. pctn<customer>='Percent'*f=pctfmt9.) ;
Specify the titles.
title 'Customer Survey Results: Spring 1996'; title3 'Factors Influencing the Decision to Buy'; run;
Suppress page numbers . The SAS system option NONUMBER suppresses page numbers for subsequent pages.
options nonumber;
Create the report and use the default table options.
proc tabulate data=customer_response;
Specify the analysis variables. The VAR statement specifies that PROC TABULATE calculate statistics on the Source1, Source2, Source3, and Customer variables. The variable Customer must be in the variable list because it appears in the denominator definition.
var source1-source3 customer;
Define the table rows and columns. The TABLE statement creates a row for each source of the company name, a column for frequency counts, and a column for the percentages. Text in quotation marks supplies a heading for the corresponding row or column.
table source1='TV/Radio' source2='Newspaper' source3='Word of Mouth', (n='Count'*f=7. pctn<customer>='Percent'*f=pctfmt9.) ;
Specify the title and footnote. The macro variable NUM resolves to the number of respondents. The FOOTNOTE statement uses double rather than single quotation marks so that the macro variable will resolve.
title 'Source of Company Name'; footnote "Number of Respondents: &num"; run;
Reset the SAS system options. The FORMDLIM= option resets the page delimiter to a page eject. The NUMBER option resumes the display of page numbers on subsequent pages.
options formdlim='' number;
Output
Customer Survey Results: Spring 1996 1 Factors Influencing the Decision to Buy -------------------------------------- Count Percent ------------------+-------+--------- Cost 87 72.5 % ------------------+-------+--------- Performance 62 51.6 % ------------------+-------+--------- Reliability 30 25.0 % ------------------+-------+--------- Sales Staff 120 100.0 % -------------------------------------- Source of Company Name -------------------------------------- Count Percent ------------------+-------+--------- TV/Radio 92 76.6 % ------------------+-------+--------- Newspaper 69 57.5 % ------------------+-------+--------- Word of Mouth 26 21.6 % -------------------------------------- Number of Respondents: 120
Example 11: Reporting on Multiple-Choice Survey Data
Procedure features:
-
TABLE statement:
-
N statistic
-
Other features:
-
FORMAT procedure
-
TRANSPOSE procedure
-
Data set options:
-
RENAME=
-
This report of listener preferences shows how many listeners select each type of programming during each of seven time periods on a typical weekday. The data was collected by a survey, and the results were stored in a SAS data set. Although this data set contains all the information needed for this report, the information is not arranged in a way that PROC TABULATE can use.
To make this crosstabulation of time of day and choice of radio programming, you must have a data set that contains a variable for time of day and a variable for programming preference. PROC TRANSPOSE reshapes the data into a new data set that contains these variables. Once the data are in the appropriate form, PROC TABULATE creates the report.
Collecting the Data
Figure 48.16 on page 1275 shows the survey form that is used to collect data.
An external file on page 1419 contains the raw data for the survey. Several lines from that file appear here.
967 32 f 5 3 5 7 5 5 5 7 0 0 0 8 7 0 0 8 0 781 30 f 2 3 5 5 0 0 0 5 0 0 0 4 7 5 0 0 0
859 39 f 1 0 5 1 0 0 0 1 0 0 0 0 0 0 0 0 0 . . . more data lines . . . 859 32 m .25 .25 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=132 pagesize=40;
Create the RADIO data set and specify the input file. RADIO contains data from a survey of 336 listeners. The data set contains information about listeners and their preferences in radio programming. The INFILE statement specifies the external file that contains the data. MISSOVER prevents the input pointer from going to the next record if it fails to find values in the current line for all variables that are listed in the INPUT statement.
data radio; infile ' input-file ' missover;
Read the appropriate data line, assign a unique number to each respondent, and write an observation to RADIO. Each raw-data record contains two lines of information about each listener. The INPUT statement reads only the information that this example needs. The / line control skips the first line of information in each record. The rest of the INPUT statement reads Time1-Time7 from the beginning of the second line. These variables represent the listener s radio programming preference for each of seven time periods on weekdays (see Figure 48.16 on page 1275). The listener=_N_ statement assigns a unique identifier to each listener. An observation is automatically written to RADIO at the end of each iteration.
input /(Time1-Time7) (. +1); listener=_n_; run;
Create the $TIMEFMT. and $PGMFMT. formats. PROC FORMAT creates formats for the time of day and the choice of programming.
proc format; value $timefmt 'Time1='6-9 a.m.' 'Time2'='9 a.m. to noon' 'Time3'='noon to 1 p.m.' 'Time4'='1-4 p.m.' 'Time5'='4-6 p.m.' 'Time6'='6-10 p.m.' 'Time7'='10 p.m. to 2 a.m.' other='*** Data Entry Error ***'; value $pgmfmt '0'="Don't Listen" '1','2'='Rock and Top 40' '3'='Country' '4','5','6'='Jazz, Classical, and Easy Listening' '7'='News/ Information /Talk' '8'='Other' other='*** Data Entry Error ***'; run;
Reshape the data by transposing the RADIO data set. PROC TRANSPOSE creates RADIO_TRANSPOSED. This data set contains the variable Listener from the original data set. It also contains two transposed variables: Timespan and Choice. Timespan contains the names of the variables (Time1-Time7) from the input data set that are transposed to form observations in the output data set. Choice contains the values of these variables. (See A Closer Look on page 1278 for a complete explanation of the PROC TRANSPOSE step.)
proc transpose data=radio out=radio_transposed(rename=(col1=Choice)) name=Timespan; by listener; var time1-time7;
Format the transposed variables. The FORMAT statement permanently associates these formats with the variables in the output data set.
format timespan $timefmt. choice $pgmfmt.; run;
Create the report and specify the table options. The FORMAT= option specifies the default format for the values in each table cell.
proc tabulate data=radio_transposed format=12.;
Specify subgroups for the analysis. The CLASS statement identifies Timespan and Choice as class variables.
class timespan choice;
Define the table rows and columns. The TABLE statement creates a row for each formatted value of Timespan and a column for each formatted value of Choice. In each column are values for the N statistic. Text in quotation marks supplies headings for the corresponding rows or columns.
table timespan='Time of Day', choice='Choice of Radio Program'*n='Number of Listeners';
Specify the title.
title 'Listening Preferences on Weekdays'; run;
Output
Listening Preferences on Weekdays 1 --------------------------------------------------------------------------------------------------------------- Choice of Radio Program ----------------------------------------------------------------------------- Jazz, Classical, News/ Rock and Top and Easy Information Don't Listen 40 Country Listening /Talk Other ------------+------------+------------+------------+------------+------------ Number of Number of Number of Number of Number of Number of Listeners Listeners Listeners Listeners Listeners Listeners -------------------------------+------------+------------+------------+------------+------------+------------ Time of Day ------------------------------- 6-9 a.m. 34 143 7 39 96 17 -------------------------------+------------+------------+------------+------------+------------+------------ 9 a.m. to noon 214 59 5 51 3 4 -------------------------------+------------+------------+------------+------------+------------+------------ noon to 1 p.m. 238 55 3 27 9 4 -------------------------------+------------+------------+------------+------------+------------+------------ 1-4 p.m. 216 60 5 50 2 3 -------------------------------+------------+------------+------------+------------+------------+------------ 4-6 p.m. 56 130 6 57 69 18 -------------------------------+------------+------------+------------+------------+------------+------------ 6-10 p.m. 202 54 9 44 20 7 -------------------------------+------------+------------+------------+------------+------------+------------ 10 p.m. to 2 a.m. 264 29 3 36 2 2 ---------------------------------------------------------------------------------------------------------------
A Closer Look
Reshape the data
The original input data set has all the information that you need to make the crosstabular report, but PROC TABULATE cannot use the information in that form. PROC TRANSPOSE rearranges the data so that each observation in the new data set contains the variable Listener, a variable for time of day, and a variable for programming preference. Figure 48.17 on page 1279 illustrates the transposition. PROC TABULATE uses this new data set to create the crosstabular report.
PROC TRANSPOSE restructures data so that values that were stored in one observation are written to one variable. You can specify which variables you want to transpose. This section illustrates how PROC TRANSPOSE reshapes the data. The following section explains the PROC TRANSPOSE step in this example.
When you transpose with BY processing, as this example does, you create from each BY group one observation for each variable that you transpose. In this example, Listener is the BY variable. Each observation in the input data set is a BY group because the value of Listener is unique for each observation.
This example transposes seven variables, Time1 through Time7. Therefore, the output data set has seven observations from each BY group (each observation) in the input data set.
Understanding the PROC TRANSPOSE Step
Here is a detailed explanation of the PROC TRANSPOSE step that reshapes the data:
proc transpose data=radio [1] out=radio_transposed(rename=(col1=Choice)) [2] name=Timespan; [3] by listener; [4] var time1-time7; [5] format timespan $timefmt. choice $pgmfmt.; [6] run;
[1] | The DATA= option specifies the input data set. |
[2] | The OUT= option specifies the output data set. The RENAME= data set option renames the transposed variable from COL1 (the default name) to Choice. |
[3] | The NAME= option specifies the name for the variable in the output data set that contains the name of the variable that is being transposed to create the current observation. By default, the name of this variable is _NAME_. |
[4] | The BY statement identifies Listener as the BY variable. |
[5] | The VAR statement identifies Time1 through Time7 as the variables to transpose. |
[6] | The FORMAT statement assigns formats to Timespan and Choice. The PROC TABULATE step that creates the report does not need to format Timespan and Choice because the formats are stored with these variables. |
Example 12: Calculating Various Percentage Statistics
Procedure features:
-
PROC TABULATE statement options:
-
FORMAT=
-
-
TABLE statement:
-
-
ALL class variable
-
COLPCTSUM statistic
-
concatenation (blank) operator
-
crossing (*) operator
-
format modifiers
-
grouping elements (parentheses) operator
-
labels
-
REPPCTSUM statistic
-
ROWPCTSUM statistic
-
variable list
-
-
-
TABLE statement options:
-
-
ROW=FLOAT
-
RTS=
-
-
Other features: FORMAT procedure
This example shows how to use three percentage sum statistics: COLPCTSUM, REPPCTSUM, and ROWPCTSUM.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=105 pagesize=60;
Create the FUNDRAIS data set. FUNDRAIS contains data on student sales during a school fund-raiser. A DATA step creates the data set.
data fundrais; length name $ 8 classrm $ 1; input @1 team $ @8 classrm $ @10 name $ @19 pencils @23 tablets; sales=pencils + tablets; datalines; BLUE A ANN 4 8 RED A MARY 5 10 GREEN A JOHN 6 4 RED A BOB 2 3 BLUE B FRED 6 8 GREEN B LOUISE 12 2 BLUE B ANNETTE . 9 RED B HENRY 8 10 GREEN A ANDREW 3 5 RED A SAMUEL 12 10 BLUE A LINDA 7 12 GREEN A SARA 4 . BLUE B MARTIN 9 13 RED B MATTHEW 7 6 GREEN B BETH 15 10 RED B LAURA 4 3 ;
Create the PCTFMT. format. The FORMAT procedure creates a format for percentages. The PCTFMT. format writes all values with at least one digit, a blank, and a percent sign.
proc format; picture pctfmt low-high='009 %'; run;
Specify the title.
title "Fundraiser Sales";
Create the report and specify the table options. The FORMAT= option specifies up to seven digits as the default format for the value in each table cell.
proc tabulate format=7.;
Specify subgroups for the analysis. The CLASS statement identifies Team and Classrm as class variables.
class team classrm;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Sales variable.
var sales;
Define the table rows. The row dimension of the TABLE statement creates a row for each formatted value of Team. The last row of the report summarizes sales for all teams .
table (team all),
Define the table columns. The column dimension of the TABLE statement creates a column for each formatted value of Classrm. Crossed within each value of Classrm is the analysis variable ( sales ) with a blank label. Nested within each column are columns that summarize sales for the class.
-
The first nested column, labeled sum , is the sum of sales for the row for the classroom.
-
The second nested column, labeled ColPctSum , is the percentage of the sum of sales for the row for the classroom in relation to the sum of sales for all teams in the classroom.
-
The third nested column, labeled RowPctSum , is the percentage of the sum of sales for the row for the classroom in relation to the sum of sales for the row for all classrooms.
-
The fourth nested column, labeled RepPctSum , is the percentage of the sum of sales for the row for the classroom in relation to the sum of sales for all teams for all classrooms.
The last column of the report summarizes sales for the row for all classrooms.
classrm='Classroom'*sales=' '*(sum colpctsum*f=pctfmt9. rowpctsum*f=pctfmt9. reppctsum*f=pctfmt9.) all*sales*sum=' '
Specify the row title space and eliminate blank row headings. RTS= provides 20 characters per line for row headings.
/rts=20; run;
Output
Fundraiser Sales 1 -------------------------------------------------------------------------------------------------------- Classroom --------------------------------------------------------------------------- A B All -------------------------------------+-------------------------------------+------- Sum ColPctSumRowPctSumRepPctSum Sum ColPctSumRowPctSumRepPctSum Sum ------------------+-------+---------+---------+---------+-------+---------+---------+---------+------- team ------------------ BLUE 31 34 % 46 % 15 % 36 31 % 53 % 17 % 67 ------------------+-------+---------+---------+---------+-------+---------+---------+---------+------- GREEN 18 19 % 31 % 8 % 39 34 % 68 % 19 % 57 ------------------+-------+---------+---------+---------+-------+---------+---------+---------+------- RED 42 46 % 52 % 20 % 38 33 % 47 % 18 % 80 ------------------+-------+---------+---------+---------+-------+---------+---------+---------+------- All 91 100 % 44 % 44 % 113 100 % 55 % 55 % 204 --------------------------------------------------------------------------------------------------------
A Closer Look
Here are the percentage sum statistic calculations used to produce the output for the Blue Team in Classroom A:
-
COLPCTSUM=31/91*100=34%
-
ROWPCTSUM=31/67*100=46%
-
REPPCTSUM=31/204*100=15%
Similar calculations were used to produce the output for the remaining teams and classrooms.
Example 13: Using Denominator Definitions to Display Basic Frequency Counts and Percentages
Procedure features:
-
TABLE statement:
-
ALL class variable
-
denominator definitions (angle bracket operators)
-
N statistic
-
PCTN statistic
-
Other features:
-
FORMAT procedure
Crosstabulation tables (also called contingency tables and stub-and-banner reports ) show combined frequency distributions for two or more variables. This table shows frequency counts for females and males within each of four job classes. The table also shows the percentage that each frequency count represents of
-
the total women and men in that job class (row percentage)
-
the total for that gender in all job classes (column percentage)
-
the total for all employees .
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the JOBCLASS data set. JOBCLASS contains encoded information about the gender and job class of employees at a fictitious company.
data jobclass; input Gender Occupation @@; datalines; 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 3 1 3 1 3 1 3 1 3 1 3 1 3 1 1 1 1 1 1 1 2 1 2 1 2 1 2 1 2 1 2 1 3 1 3 1 4 1 4 1 4 1 4 1 4 1 4 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 3 1 3 1 3 1 3 1 4 1 4 1 4 1 4 1 4 1 1 1 3 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 3 2 3 2 3 2 4 2 4 2 4 2 4 2 4 2 4 2 1 2 3 2 3 2 3 2 3 2 3 2 4 2 4 2 4 2 4 2 4 2 1 2 1 2 1 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 3 2 4 2 4 2 4 2 1 2 1 2 1 2 1 2 1 2 2 2 2 2 2 2 3 2 3 2 3 2 3 2 4 ;
Create the GENDFMT. and OCCUPFMT. formats. PROC FORMAT creates formats for the variables Gender and Occupation.
proc format; value gendfmt 1='Female' 2='Male' other='*** Data Entry Error ***'; value occupfmt 1='Technical' 2='Manager/Supervisor' 3='Clerical' 4='Administrative' other='*** Data Entry Error ***'; run;
Create the report and specify the table options. The FORMAT= option specifies the 8.2 format as the default format for the value in each table cell.
proc tabulate data=jobclass format=8.2;
Specify subgroups for the analysis. The CLASS statement identifies Gender and Occupation as class variables.
class gender occupation;
Define the table rows. The TABLE statement creates a set of rows for each formatted value of Occupation and for all jobs together. Text in quotation marks supplies a header for the corresponding row.
The asterisk in the row dimension indicates that the statistics that follow in parentheses are nested within the values of Occupation and All to form sets of rows. Each set of rows includes four statistics:
-
N, the frequency count. The format modifier (F=9.) writes the values of N without the decimal places that the default format would use. It also extends the column width to nine characters so that the word Employees fits on one line.
-
the percentage of the row total (row percent).
-
the percentage of the column total (column percent).
-
the overall percent. Text in quotation marks supplies the heading for the corresponding row. A comma separates the row definition from the column definition.
For detailed explanations of the structure of this table and of the use of denominator definitions, see A Closer Look on page 1286.
table (occupation='Job Class' all='All Jobs') *(n='Number of employees'*f=9. pctn<gender all>='Percent of row total' pctn<occupation all>='Percent of column total' pctn='Percent of total'),
Define the table columns and specify the amount of space for row headings. The column dimension creates a column for each formatted value of Gender and for all employees. Text in quotation marks supplies the heading for the corresponding column. The RTS= option provides 50 characters per line for row headings.
gender='Gender' all='All Employees'/ rts=50;
Format the output. The FORMAT statement assigns formats to the variables Gender and Occupation.
format gender gendfmt. occupation occupfmt.;
Specify the titles.
title 'Gender Distribution'; title2 'within Job Classes'; run;
Output
Gender Distribution 1 within Job Classes -------------------------------------------------------------------------------- Gender ------------------- All Female Male Employees ------------------------------------------------+---------+---------+--------- Job Class -----------------------+------------------------ Technical Number of employees 16 18 34 ------------------------+---------+---------+--------- Percent of row total 47.06 52.94 100.00 ------------------------+---------+---------+--------- Percent of column total 26.23 29.03 27.64 ------------------------+---------+---------+--------- Percent of total 13.01 14.63 27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor Number of employees 20 15 35 ------------------------+---------+---------+--------- Percent of row total 57.14 42.86 100.00 ------------------------+---------+---------+--------- Percent of column total 32.79 24.19 28.46 ------------------------+---------+---------+--------- Percent of total 16.26 12.20 28.46 -----------------------+------------------------+---------+---------+--------- Clerical Number of employees 14 14 28 ------------------------+---------+---------+--------- Percent of row total 50.00 50.00 100.00 ------------------------+---------+---------+--------- Percent of column total 22.95 22.58 22.76 ------------------------+---------+---------+--------- Percent of total 11.38 11.38 22.76 -----------------------+------------------------+---------+---------+--------- Administrative Number of employees 11 15 26 ------------------------+---------+---------+--------- Percent of row total 42.31 57.69 100.00 ------------------------+---------+---------+--------- Percent of column total 18.03 24.19 21.14 ------------------------+---------+---------+--------- Percent of total 8.94 12.20 21.14 -----------------------+------------------------+---------+---------+--------- All Jobs Number of employees 61 62 123 ------------------------+---------+---------+--------- Percent of row total 49.59 50.41 100.00 ------------------------+---------+---------+--------- Percent of column total 100.00 100.00 100.00 ------------------------+---------+---------+--------- Percent of total 49.59 50.41 100.00 --------------------------------------------------------------------------------
A Closer Look
The part of the TABLE statement that defines the rows of the table uses the PCTN statistic to calculate three different percentages.
In all calculations of PCTN, the numerator is N, the frequency count for one cell of the table. The denominator for each occurrence of PCTN is determined by the denominator definition . The denominator definition appears in angle brackets after the keyword PCTN. It is a list of one or more expressions. The list tells PROC TABULATE which frequency counts to sum for the denominator.
Analyzing the Structure of the Table
Taking a close look at the structure of the table helps you understand how PROC TABULATE uses the denominator definitions. The following simplified version of the TABLE statement clarifies the basic structure of the table:
table occupation='Job Class' all='All Jobs', gender='Gender' all='All Employees';
The table is a concatenation of four subtables. In this report, each subtable is a crossing of one class variable in the row dimension and one class variable in the column dimension. Each crossing establishes one or more categories. A category is a combination of unique values of class variables, such as female, technical or all, clerical . Table 48.8 on page 1287 describes each subtable.
Class variables contributing to the subtable | Description of frequency counts | Number of categories |
---|---|---|
Occupation and Gender | number of females in each job or number of males in each job | 8 |
All and Gender | number of females or number of males | 2 |
Occupation and All | number of people in each job | 4 |
All and All | number of people in all jobs | 1 |
Figure 48.18 on page 1288 highlights these subtables and the frequency counts for each category.
Interpreting Denominator Definitions
The following fragment of the TABLE statement defines the denominator definitions for this report. The PCTN keyword and the denominator definitions are highlighted.
table (occupation='Job Class' all='All Jobs') *(n='Number of employees'*f=5. pctn<gender all> ='Row percent' pctn<occupation all> ='Column percent' pctn ='Percent of total'),
Each use of PCTN nests a row of statistics within each value of Occupation and All. Each denominator definition tells PROC TABULATE which frequency counts to sum for the denominators in that row. This section explains how PROC TABULATE interprets these denominator definitions.
Row Percentages
The part of the TABLE statement that calculates the row percentages and that labels the row is
pctn<gender all>='Row percent'
Consider how PROC TABULATE interprets this denominator definition for each subtable.
|
Gender Distribution within Job Classes -------------------------------------------------------------------------------- Gender ------------------- All Female Male Employees ------------------------------------------------+---------+---------+--------- Job Class -----------------------+------------------------ Technical Number of employees 16 18 34 ------------------------+---------+---------+--------- Percent of row total 47.06 52.94 100.00 ------------------------+---------+---------+--------- Percent of column total 26.23 29.03 27.64 ------------------------+---------+---------+--------- Percent of total 13.01 14.63 27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor Number of employees 20 15 35 ------------------------+---------+---------+--------- Percent of row total 57.14 42.86 100.00 ------------------------+---------+---------+--------- Percent of column total 32.79 24.19 28.46 ------------------------+---------+---------+--------- Percent of total 16.26 12.20 28.46 -----------------------+------------------------+---------+---------+--------- Clerical Number of employees 14 14 28 ------------------------+---------+---------+--------- Percent of row total 50.00 50.50 100.00 ------------------------+---------+---------+--------- Percent of column total 22.95 22.58 22.76 ------------------------+---------+---------+--------- Percent of total 11.38 11.38 22.76 -----------------------+------------------------+---------+---------+--------- Administrative Number of employees 11 15 26 ------------------------+---------+---------+--------- Percent of row total 42.31 57.69 100.00 ------------------------+---------+---------+--------- Percent of column total 18.03 24.19 21.14 ------------------------+---------+---------+--------- Percent of total 8.94 12.20 21.14 -----------------------+------------------------+---------+---------+--------- All Jobs Number of employees 61 62 123 ------------------------+---------+---------+--------- Percent of row total 49.59 50.41 100.00 ------------------------+---------+---------+--------- Percent of column total 100.00 100.00 100.00 ------------------------+---------+---------+--------- Percent of total 49.59 50.41 100.00 --------------------------------------------------------------------------------
|
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Gender within the same value of Occupation.
For example, the denominator for the category female, technical is the sum of all frequency counts for all categories in this subtable for which the value of Occupation is technical . There are two such categories: female, technical and male, technical . The corresponding frequency counts are 16 and 18. Therefore, the denominator for this category is 16+18, or 34.
|
Gender Distribution within Job Classes -------------------------------------------------------------------------------- Gender ------------------- All Female Male Employees ------------------------------------------------+---------+---------+--------- Job Class -----------------------+------------------------ Technical Number of employees 16 18 34 ------------------------+---------+---------+--------- Percent of row total 47.06 52.94 100.00 ------------------------+---------+---------+--------- Percent of column total 26.23 29.03 27.64 ------------------------+---------+---------+--------- Percent of total 13.01 14.63 27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor Number of employees 20 15 35 ------------------------+---------+---------+--------- Percent of row total 57.14 42.86 100.00 ------------------------+---------+---------+--------- Percent of column total 32.79 24.19 28.46 ------------------------+---------+---------+--------- Percent of total 16.26 12.20 28.46 -----------------------+------------------------+---------+---------+--------- Clerical Number of employees 14 14 28 ------------------------+---------+---------+--------- Percent of row total 50.00 50.50 100.00 ------------------------+---------+---------+--------- Percent of column total 22.95 22.58 22.76 ------------------------+---------+---------+--------- Percent of total 11.38 11.38 22.76 -----------------------+------------------------+---------+---------+--------- Administrative Number of employees 11 15 26 ------------------------+---------+---------+--------- Percent of row total 42.31 57.69 100.00 ------------------------+---------+---------+--------- Percent of column total 18.03 24.19 21.14 ------------------------+---------+---------+--------- Percent of total 8.94 12.20 21.14 -----------------------+------------------------+---------+---------+--------- All Jobs Number of employees 61 62 123 ------------------------+---------+---------+--------- Percent of row total 49.59 50.41 100.00 ------------------------+---------+---------+--------- Percent of column total 100.00 100.00 100.00 ------------------------+---------+---------+--------- Percent of total 49.59 50.41 100.00 --------------------------------------------------------------------------------
|
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Gender in the subtable.
For example, the denominator for the category all, female is the sum of the frequency counts for all, female and all, male . The corresponding frequency counts are 61 and 62. Therefore, the denominator for cells in this subtable is 61+62, or 123.
|
Gender Distribution within Job Classes -------------------------------------------------------------------------------- Gender ------------------- All Female Male Employees ------------------------------------------------+---------+---------+--------- Job Class -----------------------+------------------------ Technical Number of employees 16 18 34 ------------------------+---------+---------+--------- Percent of row total 47.06 52.94 100.00 ------------------------+---------+---------+--------- Percent of column total 26.23 29.03 27.64 ------------------------+---------+---------+--------- Percent of total 13.01 14.63 27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor Number of employees 20 15 35 ------------------------+---------+---------+--------- Percent of row total 57.14 42.86 100.00 ------------------------+---------+---------+--------- Percent of column total 32.79 24.19 28.46 ------------------------+---------+---------+--------- Percent of total 16.26 12.20 28.46 -----------------------+------------------------+---------+---------+--------- Clerical Number of employees 14 14 28 ------------------------+---------+---------+--------- Percent of row total 50.00 50.50 100.00 ------------------------+---------+---------+--------- Percent of column total 22.95 22.58 22.76 ------------------------+---------+---------+--------- Percent of total 11.38 11.38 22.76 -----------------------+------------------------+---------+---------+--------- Administrative Number of employees 11 15 26 ------------------------+---------+---------+--------- Percent of row total 42.31 57.69 100.00 ------------------------+---------+---------+--------- Percent of column total 18.03 24.19 21.14 ------------------------+---------+---------+--------- Percent of total 8.94 12.20 21.14 -----------------------+------------------------+---------+---------+--------- All Jobs Number of employees 61 62 123 ------------------------+---------+---------+--------- Percent of row total 49.59 50.41 100.00 ------------------------+---------+---------+--------- Percent of column total 100.00 100.00 100.00 ------------------------+---------+---------+--------- Percent of total 49.59 50.41 100.00 --------------------------------------------------------------------------------
|
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. The variable All does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.
For example, the denominator for the category clerical, all is the frequency count for that category, 28.
Note: In these table cells, because the numerator and the denominator are the same, the row percentages in this subtable are all 100.
|
Gender Distribution within Job Classes -------------------------------------------------------------------------------- Gender ------------------- All Female Male Employees ------------------------------------------------+---------+---------+--------- Job Class -----------------------+------------------------ Technical Number of employees 16 18 34 ------------------------+---------+---------+--------- Percent of row total 47.06 52.94 100.00 ------------------------+---------+---------+--------- Percent of column total 26.23 29.03 27.64 ------------------------+---------+---------+--------- Percent of total 13.01 14.63 27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor Number of employees 20 15 35 ------------------------+---------+---------+--------- Percent of row total 57.14 42.86 100.00 ------------------------+---------+---------+--------- Percent of column total 32.79 24.19 28.46 ------------------------+---------+---------+--------- Percent of total 16.26 12.20 28.46 -----------------------+------------------------+---------+---------+--------- Clerical Number of employees 14 14 28 ------------------------+---------+---------+--------- Percent of row total 50.00 50.50 100.00 ------------------------+---------+---------+--------- Percent of column total 22.95 22.58 22.76 ------------------------+---------+---------+--------- Percent of total 11.38 11.38 22.76 -----------------------+------------------------+---------+---------+--------- Administrative Number of employees 11 15 26 ------------------------+---------+---------+--------- Percent of row total 42.31 57.69 100.00 ------------------------+---------+---------+--------- Percent of column total 18.03 24.19 21.14 ------------------------+---------+---------+--------- Percent of total 8.94 12.20 21.14 -----------------------+------------------------+---------+---------+--------- All Jobs Number of employees 61 62 123 ------------------------+---------+---------+--------- Percent of row total 49.59 50.41 100.00 ------------------------+---------+---------+--------- Percent of column total 100.00 100.00 100.00 ------------------------+---------+---------+--------- Percent of total 49.59 50.41 100.00 --------------------------------------------------------------------------------
|
PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. The variable All does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.
There is only one category in this subtable: all, all . The denominator for this category is 123.
Note: In this table cell, because the numerator and denominator are the same, the row percentage in this subtable is 100.
Column Percentages
The part of the TABLE statement that calculates the column percentages and labels the row is
pctn<occupation all>='Column percent'
Consider how PROC TABULATE interprets this denominator definition for each subtable.
|
Gender Distribution within Job Classes -------------------------------------------------------------------------------- Gender ------------------- All Female Male Employees ------------------------------------------------+---------+---------+--------- Job Class -----------------------+------------------------ Technical Number of employees 16 18 34 ------------------------+---------+---------+--------- Percent of row total 47.06 52.94 100.00 ------------------------+---------+---------+--------- Percent of column total 26.23 29.03 27.64 ------------------------+---------+---------+--------- Percent of total 13.01 14.63 27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor Number of employees 20 15 35 ------------------------+---------+---------+--------- Percent of row total 57.14 42.86 100.00 ------------------------+---------+---------+--------- Percent of column total 32.79 24.19 28.46 ------------------------+---------+---------+--------- Percent of total 16.26 12.20 28.46 -----------------------+------------------------+---------+---------+--------- Clerical Number of employees 14 14 28 ------------------------+---------+---------+--------- Percent of row total 50.00 50.50 100.00 ------------------------+---------+---------+--------- Percent of column total 22.95 22.58 22.76 ------------------------+---------+---------+--------- Percent of total 11.38 11.38 22.76 -----------------------+------------------------+---------+---------+--------- Administrative Number of employees 11 15 26 ------------------------+---------+---------+--------- Percent of row total 42.31 57.69 100.00 ------------------------+---------+---------+--------- Percent of column total 18.03 24.19 21.14 ------------------------+---------+---------+--------- Percent of total 8.94 12.20 21.14 -----------------------+------------------------+---------+---------+--------- All Jobs Number of employees 61 62 123 ------------------------+---------+---------+--------- Percent of row total 49.59 50.41 100.00 ------------------------+---------+---------+--------- Percent of column total 100.00 100.00 100.00 ------------------------+---------+---------+--------- Percent of total 49.59 50.41 100.00 --------------------------------------------------------------------------------
|
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Occupation within the same value of Gender.
For example, the denominator for the category manager/supervisor, male is the sum of all frequency counts for all categories in this subtable for which the value of Gender is male . There are four such categories: technical, male ; manager/ supervisor, male ; clerical, male ; and administrative, male . The corresponding frequency counts are 18, 15, 14, and 15. Therefore, the denominator for this category is 18+15+14+15, or 62.
|
Gender Distribution within Job Classes -------------------------------------------------------------------------------- Gender ------------------- All Female Male Employees ------------------------------------------------+---------+---------+--------- Job Class -----------------------+------------------------ Technical Number of employees 16 18 34 ------------------------+---------+---------+--------- Percent of row total 47.06 52.94 100.00 ------------------------+---------+---------+--------- Percent of column total 26.23 29.03 27.64 ------------------------+---------+---------+--------- Percent of total 13.01 14.63 27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor Number of employees 20 15 35 ------------------------+---------+---------+--------- Percent of row total 57.14 42.86 100.00 ------------------------+---------+---------+--------- Percent of column total 32.79 24.19 28.46 ------------------------+---------+---------+--------- Percent of total 16.26 12.20 28.46 -----------------------+------------------------+---------+---------+--------- Clerical Number of employees 14 14 28 ------------------------+---------+---------+--------- Percent of row total 50.00 50.50 100.00 ------------------------+---------+---------+--------- Percent of column total 22.95 22.58 22.76 ------------------------+---------+---------+--------- Percent of total 11.38 11.38 22.76 -----------------------+------------------------+---------+---------+--------- Administrative Number of employees 11 15 26 ------------------------+---------+---------+--------- Percent of row total 42.31 57.69 100.00 ------------------------+---------+---------+--------- Percent of column total 18.03 24.19 21.14 ------------------------+---------+---------+--------- Percent of total 8.94 12.20 21.14 -----------------------+------------------------+---------+---------+--------- All Jobs Number of employees 61 62 123 ------------------------+---------+---------+--------- Percent of row total 49.59 50.41 100.00 ------------------------+---------+---------+--------- Percent of column total 100.00 100.00 100.00 ------------------------+---------+---------+--------- Percent of total 49.59 50.41 100.00 --------------------------------------------------------------------------------
|
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. Because the variable All does contribute to this subtable, PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count for All as the denominator.
For example, the denominator for the category all, female is the frequency count for that category, 61.
Note: In these table cells, because the numerator and denominator are the same, the column percentages in this subtable are all 100.
|
Gender Distribution within Job Classes -------------------------------------------------------------------------------- Gender ------------------- All Female Male Employees ------------------------------------------------+---------+---------+--------- Job Class -----------------------+------------------------ Technical Number of employees 16 18 34 ------------------------+---------+---------+--------- Percent of row total 47.06 52.94 100.00 ------------------------+---------+---------+--------- Percent of column total 26.23 29.03 27.64 ------------------------+---------+---------+--------- Percent of total 13.01 14.63 27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor Number of employees 20 15 35 ------------------------+---------+---------+--------- Percent of row total 57.14 42.86 100.00 ------------------------+---------+---------+--------- Percent of column total 32.79 24.19 28.46 ------------------------+---------+---------+--------- Percent of total 16.26 12.20 28.46 -----------------------+------------------------+---------+---------+--------- Clerical Number of employees 14 14 28 ------------------------+---------+---------+--------- Percent of row total 50.00 50.50 100.00 ------------------------+---------+---------+--------- Percent of column total 22.95 22.58 22.76 ------------------------+---------+---------+--------- Percent of total 11.38 11.38 22.76 -----------------------+------------------------+---------+---------+--------- Administrative Number of employees 11 15 26 ------------------------+---------+---------+--------- Percent of row total 42.31 57.69 100.00 ------------------------+---------+---------+--------- Percent of column total 18.03 24.19 21.14 ------------------------+---------+---------+--------- Percent of total 8.94 12.20 21.14 -----------------------+------------------------+---------+---------+--------- All Jobs Number of employees 61 62 123 ------------------------+---------+---------+--------- Percent of row total 49.59 50.41 100.00 ------------------------+---------+---------+--------- Percent of column total 100.00 100.00 100.00 ------------------------+---------+---------+--------- Percent of total 49.59 50.41 100.00 --------------------------------------------------------------------------------
|
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Occupation in the subtable.
For example, the denominator for the category technical, all is the sum of the frequency counts for technical, all ; manager/supervisor, all ; clerical, all ; and administrative, all . The corresponding frequency counts are 34, 35, 28, and 26. Therefore, the denominator for this category is 34+35+28+26, or 123.
|
Gender Distribution within Job Classes -------------------------------------------------------------------------------- Gender ------------------- All Female Male Employees ------------------------------------------------+---------+---------+--------- Job Class -----------------------+------------------------ Technical Number of employees 16 18 34 ------------------------+---------+---------+--------- Percent of row total 47.06 52.94 100.00 ------------------------+---------+---------+--------- Percent of column total 26.23 29.03 27.64 ------------------------+---------+---------+--------- Percent of total 13.01 14.63 27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor Number of employees 20 15 35 ------------------------+---------+---------+--------- Percent of row total 57.14 42.86 100.00 ------------------------+---------+---------+--------- Percent of column total 32.79 24.19 28.46 ------------------------+---------+---------+--------- Percent of total 16.26 12.20 28.46 -----------------------+------------------------+---------+---------+--------- Clerical Number of employees 14 14 28 ------------------------+---------+---------+--------- Percent of row total 50.00 50.50 100.00 ------------------------+---------+---------+--------- Percent of column total 22.95 22.58 22.76 ------------------------+---------+---------+--------- Percent of total 11.38 11.38 22.76 -----------------------+------------------------+---------+---------+--------- Administrative Number of employees 11 15 26 ------------------------+---------+---------+--------- Percent of row total 42.31 57.69 100.00 ------------------------+---------+---------+--------- Percent of column total 18.03 24.19 21.14 ------------------------+---------+---------+--------- Percent of total 8.94 12.20 21.14 -----------------------+------------------------+---------+---------+--------- All Jobs Number of employees 61 62 123 ------------------------+---------+---------+--------- Percent of row total 49.59 50.41 100.00 ------------------------+---------+---------+--------- Percent of column total 100.00 100.00 100.00 ------------------------+---------+---------+--------- Percent of total 49.59 50.41 100.00 --------------------------------------------------------------------------------
|
PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. Because the variable All does contribute to this subtable, PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.
There is only one category in this subtable: all, all . The frequency count for this category is 123.
Note: In this calculation, because the numerator and denominator are the same, the column percentage in this subtable is 100.
Total Percentages
The part of the TABLE statement that calculates the total percentages and labels the row is
pctn='Total percent'
If you do not specify a denominator definition, then PROC TABULATE obtains the denominator for a cell by totaling all the frequency counts in the subtable. Table 48.9 on page 1293 summarizes the process for all subtables in this example.
Class variables contributing to the subtable | Frequency counts | Total |
---|---|---|
Occupat and Gender | 16, 18, 20, 15 14, 14, 11, 15 | 123 |
Occupat and All | 34, 35, 28, 26 | 123 |
Gender and All | 61, 62 | 123 |
All and All | 123 | 123 |
Consequently, the denominator for total percentages is always 123.
Example 14: Specifying Style Elements for ODS Output
Procedure features:
-
STYLE= option in
-
PROC TABULATE statement
-
CLASSLEV statement
-
KEYWORD statement
-
TABLE statement
-
VAR statement
-
Other features:
-
ODS HTML statement
-
ODS PDF statement
-
ODS RTF statement
Data set: ENERGY ENERGY on page 1401
Formats: REGFMT, DIVFMT, and USETYPE. on page 1247
This example creates HTML, RTF, and PDF files and specifies style elements for various table regions.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= and PAGESIZE= are not set for this example because they have no effect on HTML, RTF, and Printer output.
options nodate pageno=1;
Specify the ODS output filenames. By opening multiple ODS destinations, you can produce multiple output files in a single execution. The ODS HTML statement produces output that is written in HTML. The ODS PDF statement produces output in Portable Document Format (PDF). The ODS RTF statement produces output in Rich Text Format (RTF). The output from PROC TABULATE goes to each of these files.
ods html body=' external-HTML-file '; ods pdf file=' external-PDF-file '; ods rtf file=' external-RTF-file ';
Specify the table options. The STYLE= option in the PROC TABULATE statement specifies the style element for the data cells of the table.
proc tabulate data=energy style=[font_weight=bold];
Specify subgroups for the analysis. The STYLE= option in the CLASS statement specifies the style element for the class variable name headings.
class region division type / style=[just=center];
Specify the style attributes for the class variable value headings. The STYLE= option in the CLASSLEV statement specifies the style element for the class variable level value headings.
classlev region division type / style=[just=left];
Specify the analysis variable and its style attributes. The STYLE= option in the VAR statement specifies a style element for the variable name headings.
var expenditures / style=[font_size=3];
Specify the style attributes for keywords, and label the all keyword. The STYLE= option in the KEYWORD statement specifies a style element for keywords. The KEYLABEL statement assigns a label to the keyword.
keyword all sum / style=[font_width=wide]; keylabel all="Total";
Define the table rows and columns and their style attributes. The STYLE= option in the dimension expression overrides any other STYLE= specifications in PROC TABULATE that specify attributes for table cells. The STYLE= option after the slash (/) specifies attributes for parts of the table other than table cells.
table (region all)*(division all*[style=[background=yellow]]), (type all)*(expenditures*f=dollar10.) / style=[bordercolor=blue]
Specify the style attributes for cells with missing values. The STYLE= option in the MISSTEXT option of the TABLE statement specifies a style element to use for the text in table cells that contain missing values.
misstext=[label="Missing" style=[font_weight=light]]
Specify the style attributes for the box above the row titles. The STYLE= option in the BOX option of the TABLE statement specifies a style element to use for text in the box above the row titles.
box=[label="Region by Division by Type" style=[font_style=italic]];
Format the class variable values. The FORMAT statement assigns formats to Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the titles.
title 'Energy Expenditures'; title2 '(millions of dollars)'; run;
Close the ODS destinations.
ods html close; ods pdf close; ods rtf close;
HTML Output
PDF Output
RTF Output