Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4

Example 1: Standardizing to a Given Mean and Standard Deviation

Procedure features:

Other features:

This example

Program

Set the SAS system options. The NODATE option specifies to omit the date and time when the SAS job began. The PAGENO= option specifies the page number for the next page of output that SAS produces. The LINESIZE= option specifies the line size. The PAGESIZE= option specifies the number of lines for a page of SAS output.

options nodate pageno=1 linesize=80 pagesize=60;

Create the SCORE data set. This data set contains test scores for students who took two tests and a final exam. The FORMAT statement assigns the Zw.d format to StudentNumber. This format pads right-justified output with 0s instead of blanks. The LENGTH statement specifies the number of bytes to use to store values of Student.

data score;
   length Student $ 9;
   input Student $ StudentNumber Section $
         Test1 Test2 Final @@;
   format studentnumber z4.;
   datalines;
Capalleti 0545 1 94 91 87  Dubose    1252 2 51 65 91
Engles    1167 1 95 97 97  Grant     1230 2 63 75 80
Krupski   2527 2 80 69 71  Lundsford 4860 1 92 40 86
McBane    0674 1 75 78 72  Mullen    6445 2 89 82 93
Nguyen    0886 1 79 76 80  Patel     9164 2 71 77 83
Si        4915 1 75 71 73  Tanaka    8534 2 87 73 76
;
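As a minimal sketch of what the Zw.d format does on its own, the following DATA step writes one value to the SAS log with and without zero padding. The variable name N is hypothetical and is not part of the example program.

```sas
data _null_;
   n = 545;
   put n z4.;   /* Zw.d format: writes 0545, padded with a leading zero */
   put n 4.;    /* w.d format: writes 545 preceded by a blank instead   */
run;
```

This is why the student number that is entered as 0545 is stored as the number 545 but is displayed as 0545 in the output.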

Generate the standardized data and create the output data set STNDTEST. PROC STANDARD uses a mean of 75 and a standard deviation of 5 to standardize the values. OUT= identifies STNDTEST as the data set to contain the standardized values.

proc standard data=score mean=75 std=5 out=stndtest;

Specify the variables to standardize. The VAR statement specifies the variables to standardize and their order in the output.

   var test1 test2;
run;
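The computation behind this step can be sketched as a worked equation. This sketch assumes that the standard deviation is computed with the default divisor of n - 1. The symbols M (the MEAN= value), S (the STD= value), the sample mean, and the sample standard deviation s are notation introduced here for illustration.

```latex
% Standardizing a value x_i to mean M and standard deviation S
x_i' = M + S \cdot \frac{x_i - \bar{x}}{s}

% Test1 in the SCORE data set (12 students):
\bar{x} = \frac{951}{12} = 79.25, \qquad
s = \sqrt{\frac{1950.25}{11}} \approx 13.315

% Capalleti, Test1 = 94, with M = 75 and S = 5:
x' = 75 + 5 \cdot \frac{94 - 79.25}{13.315} \approx 80.54
```

The result agrees with the StdTest1 value that is printed for Capalleti in the output of this example.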

Create a data set that combines the original values with the standardized values. PROC SQL joins SCORE and STNDTEST to create the COMBINED data set (table) that contains standardized and original test scores for each student. Using AS to rename the standardized variables NEW.TEST1 to StdTest1 and NEW.TEST2 to StdTest2 makes the variable names unique.

proc sql;
   create table combined as
   select old.student, old.studentnumber,
          old.section,
          old.test1, new.test1 as StdTest1,
          old.test2, new.test2 as StdTest2,
          old.final
      from score as old, stndtest as new
      where old.student=new.student;
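The same combination can also be sketched with a one-to-one merge in a DATA step. This is an alternative illustration, not part of the example program. It assumes that STNDTEST contains the same observations in the same order as SCORE, and the data set name COMBINED2 is hypothetical.

```sas
/* One-to-one merge: observations are paired by position, not by key. */
/* KEEP= is applied before RENAME=, so the original names are listed. */
data combined2;
   merge score
         stndtest(keep=test1 test2
                  rename=(test1=StdTest1 test2=StdTest2));
run;
```

In this sketch the standardized variables follow Final in the data set, so the column order differs from the order that the SELECT clause produces. The PROC SQL join matches rows on Student and therefore does not depend on observation order.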

Print the data set. PROC PRINT prints the COMBINED data set. ROUND rounds the standardized values to two decimal places. The TITLE statement specifies a title.

proc print data=combined noobs round;
   title 'Standardized Test Scores for a College Course';
run;

Output

The data set contains variables with both standardized and original values. StdTest1 and StdTest2 store the standardized test scores that PROC STANDARD computes.

                 Standardized Test Scores for a College Course                 1

            Student                          Std               Std
Student      Number    Section   Test1     Test1   Test2     Test2   Final

Capalleti      0545       1         94     80.54      91     80.86      87
Dubose         1252       2         51     64.39      65     71.63      91
Engles         1167       1         95     80.91      97     82.99      97
Grant          1230       2         63     68.90      75     75.18      80
Krupski        2527       2         80     75.28      69     73.05      71
Lundsford      4860       1         92     79.79      40     62.75      86
McBane         0674       1         75     73.40      78     76.24      72
Mullen         6445       2         89     78.66      82     77.66      93
Nguyen         0886       1         79     74.91      76     75.53      80
Patel          9164       2         71     71.90      77     75.89      83
Si             4915       1         75     73.40      71     73.76      73
Tanaka         8534       2         87     77.91      73     74.47      76
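One way to check the result is to summarize the standardized variables. The following step is a sketch that is not part of the example program. It assumes that the STNDTEST data set from the PROC STANDARD step still exists.

```sas
/* Summarize the standardized variables in STNDTEST.           */
/* MAXDEC=2 limits the printed statistics to two decimals.     */
proc means data=stndtest mean std maxdec=2;
   var test1 test2;
run;
```

If the standardization worked as described, both variables should show a mean of 75.00 and a standard deviation of 5.00.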

Example 2: Standardizing BY Groups and Replacing Missing Values

Procedure features:

Other features:

This example

Program

Set the SAS system options. The NODATE option specifies to omit the date and time when the SAS job began. The PAGENO= option specifies the page number for the next page of output that SAS produces. The LINESIZE= option specifies the line size. The PAGESIZE= option specifies the number of lines for a page of SAS output.

options nodate pageno=1 linesize=80 pagesize=60;

Assign a character string format to a numeric value. PROC FORMAT creates the format POPFMT to identify birth rates with a character value.

proc format;
   value popfmt 1='Stable'
                2='Rapid';
run;

Create the LIFEXP data set. Each observation in this data set contains the 1950 and 1993 life expectancies at birth for one of 16 nations. [*] The birth rate for each nation is classified as stable (1) or rapid (2). The nations with missing data obtained independent status after 1950.

data lifexp;
   input PopulationRate Country $char14. Life50 Life93 @@;
   label life50='1950 life expectancy'
         life93='1993 life expectancy';
   datalines;
2 Bangladesh     .  53 2 Brazil         51 67
2 China          41 70 2 Egypt          42 60
2 Ethiopia       33 46 1 France         67 77
1 Germany        68 75 2 India          39 59
2 Indonesia      38 59 1 Japan          64 79
2 Mozambique     .  47 2 Philippines    48 64
1 Russia         .  65 2 Turkey         44 66
1 United Kingdom 69 76 1 United States  69 75
;

Sort the LIFEXP data set. PROC SORT sorts the observations by the birth rate.

proc sort data=lifexp;
   by populationrate;
run;

Generate the standardized data for all numeric variables and create the output data set ZSCORE. PROC STANDARD standardizes all numeric variables to a mean of 0 and a standard deviation of 1. REPLACE replaces missing values with the value of MEAN=. PRINT prints statistics.

proc standard data=lifexp mean=0 std=1 replace print out=zscore;

Create the standardized values for each BY group. The BY statement standardizes the values separately by birth rate.

by populationrate;

Assign a format to a variable and specify a title for the report. The FORMAT statement assigns a format to PopulationRate. The output data set contains formatted values. The TITLE statement specifies a title.

   format populationrate popfmt.;
   title1 'Life Expectancies by Birth Rate';
run;

Print the data set. PROC PRINT prints the ZSCORE data set with the standardized values. The TITLE statements specify two titles to print.

proc print data=zscore noobs;
   title 'Standardized Life Expectancies at Birth';
   title2 'by a Country''s Birth Rate';
run;

Output

PROC STANDARD prints the variable name, mean, standard deviation, input frequency, and label of each variable to standardize for each BY group.

Life expectancies for Bangladesh, Mozambique, and Russia are no longer missing. The missing values are replaced with the given mean (0).

                        Life Expectancies by Birth Rate                        1

---------------------------- PopulationRate=Stable ----------------------------

                           Standard
Name            Mean      Deviation       N    Label

Life50     67.400000       1.854724       5    1950 life expectancy
Life93     74.500000       4.888763       6    1993 life expectancy

----------------------------- PopulationRate=Rapid ----------------------------

                           Standard
Name            Mean      Deviation       N    Label

Life50     42.000000       5.033223       8    1950 life expectancy
Life93     59.100000       8.225300      10    1993 life expectancy

                    Standardized Life Expectancies at Birth                    2
                           by a Country's Birth Rate

Population
   Rate       Country             Life50      Life93

  Stable      France            -0.21567     0.51138
  Stable      Germany            0.32350     0.10228
  Stable      Japan             -1.83316     0.92048
  Stable      Russia             0.00000    -1.94323
  Stable      United Kingdom     0.86266     0.30683
  Stable      United States      0.86266     0.10228
  Rapid       Bangladesh         0.00000    -0.74161
  Rapid       Brazil             1.78812     0.96045
  Rapid       China             -0.19868     1.32518
  Rapid       Egypt              0.00000     0.10942
  Rapid       Ethiopia          -1.78812    -1.59265
  Rapid       India             -0.59604    -0.01216
  Rapid       Indonesia         -0.79472    -0.01216
  Rapid       Mozambique         0.00000    -1.47107
  Rapid       Philippines        1.19208     0.59572
  Rapid       Turkey             0.39736     0.83888
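The values in the second page of output can be traced back to the statistics on the first page. The following worked equations are a sketch that uses the mean and standard deviation exactly as PROC STANDARD prints them for the Stable group. The subscript g for the BY group is notation introduced here for illustration.

```latex
% Standardizing within BY group g to mean 0 and standard deviation 1
z_i = \frac{x_i - \bar{x}_g}{s_g}

% Japan, Life50 = 64, Stable group (mean 67.4, standard deviation 1.854724):
z = \frac{64 - 67.4}{1.854724} \approx -1.83316

% France, Life93 = 77, Stable group (mean 74.5, standard deviation 4.888763):
z = \frac{77 - 74.5}{4.888763} \approx 0.51138

% Russia, Life50 missing: REPLACE substitutes the MEAN= value
z = 0
```

Note that a value of 0.00000 does not always indicate a replaced missing value. Egypt has a Life50 value of 42, which equals the mean of the Rapid group, so its standardized value is also 0.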

[*] Data are from Vital Signs 1994: The Trends That Are Shaping Our Future, Lester R. Brown, Hal Kane, and David Malin Roodman, eds. Copyright 1994 by Worldwatch Institute. Reprinted by permission of W.W. Norton & Company, Inc.
