Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
Example 1: Performing a Simple Transposition
Procedure features:
PROC TRANSPOSE statement option:
-
OUT=
This example performs a default transposition and uses no subordinate statements.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the SCORE data set. set SCORE contains students names , their identification numbers , and their grades on two tests and a final exam.
data score; input Student . +1 StudentID $ Section $ Test1 Test2 Final; datalines; Capalleti 0545 1 94 91 87 Dubose 1252 2 51 65 91 Engles 1167 1 95 97 97 Grant 1230 2 63 75 80 Krupski 2527 2 80 76 71 Lundsford 4860 1 92 40 86 McBane 0674 1 75 78 72 ;
Transpose the data set. PROC TRANSPOSE transposes only the numeric variables, Test1, Test2, and Final, because no VAR statement appears and none of the numeric variables appear in another statement. OUT= puts the result of the transposition in the data set SCORE_TRANSPOSED.
proc transpose data=score out=score_transposed; run;
Print the SCORE_TRANSPOSED data set. The NOOBS option suppresses the printing of observation numbers.
proc print data=score_transposed noobs; title 'Student Test Scores in Variables'; run;
Output
In the output data set SCORE_TRANSPOSED, the variables COL1 through COL7 contain the individual scores for the students. Each observation contains all the scores for one test. The variable _NAME_ contains the names of the variables from the input data set that were transposed.
Student Test Scores in Variables 1 _NAME_ COL1 COL2 COL3 COL4 COL5 COL6 COL7 Test1 94 51 95 63 80 92 75 Test2 91 65 97 75 76 40 78 Final 87 91 97 80 71 86 72
Example 2: Naming Transposed Variables
Procedure features:
-
PROC TRANSPOSE statement options:
-
NAME =
-
PREFIX=
-
-
ID statement
Data set: SCORE on page 1336
This example uses the values of a variable and a user -supplied value to name transposed variables.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Transpose the data set. PROC TRANSPOSE transposes only the numeric variables, Test1, Test2, and Final, because no VAR statement appears. OUT= puts the result of the transposition in the IDNUMBER data set. NAME= specifies Test as the name for the variable that contains the names of the variables in the input data set that the procedure transposes. The procedure names the transposed variables by using the value from PREFIX=, sn, and the value of the ID variable StudentID.
proc transpose data=score out=idnumber name=Test prefix=sn; id studentid; run;
Print the IDNUMBER data set. The NOOBS option suppresses the printing of observation numbers.
proc print data=idnumber noobs; title 'Student Test Scores'; run;
Output
This is the output data set, IDNUMBER.
Student Test Scores 1 Test sn0545 sn1252 sn1167 sn1230 sn2527 sn4860 sn0674 Test1 94 51 95 63 80 92 75 Test2 91 65 97 75 76 40 78 Final 87 91 97 80 71 86 72
Example 3: Labeling Transposed Variables
Procedure features:
-
PROC TRANSPOSE statement option:
-
PREFIX=
-
-
IDLABEL statement
Data set: SCORE on page 1336
This example uses the values of the variable in the IDLABEL statement to label transposed variables.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Transpose the data set. PROC TRANSPOSE transposes only the numeric variables, Test1, Test2, and Final, because no VAR statement appears. OUT= puts the result of the transposition in the IDLABEL data set. NAME= specifies Test as the name for the variable that contains the names of the variables in the input data set that the procedure transposes. The procedure names the transposed variables by using the value from PREFIX=, sn, and the value of the ID variable StudentID.
proc transpose data=score out=idlabel name=Test prefix=sn; id studentid;
Assign labels to the output variables. PROC TRANSPOSE uses the values of the variable Student to label the transposed variables. The procedure provides the following as the label for the _NAME_ variable: NAME OF FORMER VARIABLE
idlabel student; run;
Print the IDLABEL data set. The LABEL option causes PROC PRINT to print variable labels for column headers. The NOOBS option suppresses the printing of observation numbers.
proc print data=idlabel label noobs; title 'Student Test Scores'; run;
Output
This is the output data set, IDLABEL.
Student Test Scores 1 NAME OF FORMER VARIABLE Capalleti Dubose Engles Grant Krupski Lundsford McBane Test1 94 51 95 63 80 92 75 Test2 91 65 97 75 76 40 78 Final 87 91 97 80 71 86 72
Example 4: Transposing BY Groups
Procedure features:
-
BY statement
-
VAR statement
Other features: Data set option:
-
RENAME=
This example illustrates transposing BY groups and selecting variables to transpose.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the FISHDATA data set. The data in FISHDATA represents length and weight measurements of fish that were caught at two ponds on three separate days. The raw data is sorted by Location and Date.
data fishdata; infile datalines missover; input Location & . Date date7. Length1 Weight1 Length2 Weight2 Length3 Weight3 Length4 Weight4; format date date7.; datalines; Cole Pond 2JUN95 31 .25 32 .3 32 .25 33 .3 Cole Pond 3JUL95 33 .32 34 .41 37 .48 32 .28 Cole Pond 4AUG95 29 .23 30 .25 34 .47 32 .3 Eagle Lake 2JUN95 32 .35 32 .25 33 .30 Eagle Lake 3JUL95 30 .20 36 .45 Eagle Lake 4AUG95 33 .30 33 .28 34 .42 ;
Transpose the data set. OUT= puts the result of the transposition in the FISHLENGTH data set. RENAME= renames COL1 in the output data set to Measurement.
proc transpose data=fishdata out=fishlength(rename=(col1=Measurement));
Specify the variables to transpose. The VAR statement limits the variables that PROC TRANSPOSE transposes.
var length1-length4;
Organize the output data set into BY groups. The BY statement creates BY groups for each unique combination of values of Location and Date. The procedure does not transpose the BY variables.
by location date; run;
Print the FISHLENGTH data set. The NOOBS option suppresses the printing of observation numbers.
proc print data=fishlength noobs; title 'Fish Length Data for Each Location and Date'; run;
Output
This is the output data set, FISHLENGTH. For each BY group in the original data set, PROC TRANSPOSE creates four observations, one for each variable that it is transposing. Missing values appear for the variable Measurement ( renamed from COL1) when the variables that are being transposed have no value in the input data set for that BY group. Several observations have a missing value for Measurement. For example, in the last observation, a missing value appears because the input data contained no value for Length4 on 04AUG95 at Eagle Lake.
Fish Length Data for Each Location and Date 1 Location Date _NAME_ Measurement Cole Pond 02JUN95 Length1 31 Cole Pond 02JUN95 Length2 32 Cole Pond 02JUN95 Length3 32 Cole Pond 02JUN95 Length4 33 Cole Pond 03JUL95 Length1 33 Cole Pond 03JUL95 Length2 34 Cole Pond 03JUL95 Length3 37 Cole Pond 03JUL95 Length4 32 Cole Pond 04AUG95 Length1 29 Cole Pond 04AUG95 Length2 30 Cole Pond 04AUG95 Length3 34 Cole Pond 04AUG95 Length4 32 Eagle Lake 02JUN95 Length1 32 Eagle Lake 02JUN95 Length2 32 Eagle Lake 02JUN95 Length3 33 Eagle Lake 02JUN95 Length4 . Eagle Lake 03JUL95 Length1 30 Eagle Lake 03JUL95 Length2 36 Eagle Lake 03JUL95 Length3 . Eagle Lake 03JUL95 Length4 . Eagle Lake 04AUG95 Length1 33 Eagle Lake 04AUG95 Length2 33 Eagle Lake 04AUG95 Length3 34 Eagle Lake 04AUG95 Length4 .
Example 5: Naming Transposed Variables When the ID Variable Has Duplicate Values
Procedure features:
-
PROC TRANSPOSE statement option:
-
LET
-
This example shows how to use values of a variable (ID) to name transposed variables even when the ID variable has duplicate values.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=64 pagesize=40;
Create the STOCKS data set. STOCKS contains stock prices for two competing kite manufacturers. The prices are recorded for two days, three times a day: at opening, at noon, and at closing. Notice that the input data set contains duplicate values for the Date variable.
data stocks; input Company . Date $ Time $ Price; datalines; Horizon Kites jun11 opening 29 Horizon Kites jun11 noon 27 Horizon Kites jun11 closing 27 Horizon Kites jun12 opening 27 Horizon Kites jun12 noon 28 Horizon Kites jun12 closing 30 SkyHi Kites jun11 opening 43 SkyHi Kites jun11 noon 43 SkyHi Kites jun11 closing 44 SkyHi Kites jun12 opening 44 SkyHi Kites jun12 noon 45 SkyHi Kites jun12 closing 45 ;
Transpose the data set. LET transposes only the last observation for each BY group. PROC TRANSPOSE transposes only the Price variable. OUT= puts the result of the transposition in the CLOSE data set.
proc transpose data=stocks out=close let;
Organize the output data set into BY groups. The BY statement creates two BY groups, one for each company.
by company;
Name the transposed variables. The values of Date are used as names for the transposed variables.
id date; run;
Print the CLOSE data set. The NOOBS option suppresses the printing of observation numbers..
proc print data=close noobs; title 'Closing Prices for Horizon Kites and SkyHi Kites'; run;
Output
This is the output data set, CLOSE.
Closing Prices for Horizon Kites and SkyHi Kites Company _NAME_ jun11 jun12 Horizon Kites Price 27 30 SkyHi Kites Price 44 45
Example 6: Transposing Data for Statistical Analysis
Procedure features:
-
COPY statement
VAR statement
This example arranges data to make it suitable for either a multivariate or a univariate repeated-measures analysis.
The data is from Chapter 8, Repeated-Measures Analysis of Variance, in SAS System for Linear Models, Third Edition.
Program 1
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the WEIGHTS data set. The data in WEIGHTS represents the results of an exercise therapy study of three weight- lifting programs: CONT is a control group, RI is a program in which the number of repetitions is increased, and WI is a program in which the weight is increased.
data weights; input Program $ s1-s7; datalines; CONT 85 85 86 85 87 86 87 CONT 80 79 79 78 78 79 78 CONT 78 77 77 77 76 76 77 CONT 84 84 85 84 83 84 85 CONT 80 81 80 80 79 79 80 RI 79 79 79 80 80 78 80 RI 83 83 85 85 86 87 87 RI 81 83 82 82 83 83 82 RI 81 81 81 82 82 83 81 RI 80 81 82 82 82 84 86 WI 84 85 84 83 83 83 84 WI 74 75 75 76 75 76 76 WI 83 84 82 81 83 83 82 WI 86 87 87 87 87 87 86 WI 82 83 84 85 84 85 86 ;
Create the SPLIT data set. This DATA step rearranges WEIGHTS to create the data set SPLIT. The DATA step transposes the strength values and creates two new variables: Time and Subject. SPLIT contains one observation for each repeated measure. SPLIT can be used in a PROC GLM step for a univariate repeated-measures analysis.
data split; set weights; array s{7} s1-s7; Subject + 1; do Time=1 to 7; Strength=s{time}; output; end; drop s1-s7; run;
Print the SPLIT data set. The NOOBS options suppresses the printing of observation numbers. The OBS= data set option limits the printing to the first 15 observations. SPLIT has 105 observations.
proc print data=split(obs=15) noobs; title 'SPLIT Data Set'; title2 'First 15 Observations Only'; run;
Output 1
SPLIT Data Set 1 First 15 Observations Only Program Subject Time Strength CONT 1 1 85 CONT 1 2 85 CONT 1 3 86 CONT 1 4 85 CONT 1 5 87 CONT 1 6 86 CONT 1 7 87 CONT 2 1 80 CONT 2 2 79 CONT 2 3 79 CONT 2 4 78 CONT 2 5 78 CONT 2 6 79 CONT 2 7 78 CONT 3 1 78
Program 2
Set the SAS system options.
options nodate pageno=1 linesize=80 pagesize=40;
Transpose the SPLIT data set. PROC TRANSPOSE transposes SPLIT to create TOTSPLIT. The TOTSPLIT data set contains the same variables as SPLIT and a variable for each strength measurement (Str1-Str7). TOTSPLIT can be used for either a multivariate repeated-measures analysis or a univariate repeated-measures analysis.
proc transpose data=split out=totsplit prefix=Str;
Organize the output data set into BY groups, and populate each BY group with untransposed values. The variables in the BY and COPY statements are not transposed. TOTSPLIT contains the variables Program, Subject, Time, and Strength with the same values that are in SPLIT. The BY statement creates the first observation in each BY group, which contains the transposed values of Strength. The COPY statement creates the other observations in each BY group by copying the values of Time and Strength without transposing them.
by program subject; copy time strength;
Specify the variable to transpose. The VAR statement specifies the Strength variable as the only variable to be transposed.
var strength; run;
Print the TOTSPLIT data set. The NOOBS options suppresses the printing of observation numbers. The OBS= data set option limits the printing to the first 15 observations. SPLIT has 105 observations.
proc print data=totsplit(obs=15) noobs; title 'TOTSPLIT Data Set'; title2 'First 15 Observations Only'; run;
Output 2
The variables in TOTSPLIT with missing values are used only in a multivariate repeated-measures analysis. The missing values do not preclude this data set from being used in a repeated-measures analysis because the MODEL statement in PROC GLM ignores observations with missing values.
TOTSPLIT Data Set 1 First 15 Observations Only Program Subject Time Strength _NAME_ Str1 Str2 Str3 Str4 Str5 Str6 Str7 CONT 1 1 85 Strength 85 85 86 85 87 86 87 CONT 1 2 85 . . . . . . . CONT 1 3 86 . . . . . . . CONT 1 4 85 . . . . . . . CONT 1 5 87 . . . . . . . CONT 1 6 86 . . . . . . . CONT 1 7 87 . . . . . . . CONT 2 1 80 Strength 80 79 79 78 78 79 78 CONT 2 2 79 . . . . . . . CONT 2 3 79 . . . . . . . CONT 2 4 78 . . . . . . . CONT 2 5 78 . . . . . . . CONT 2 6 79 . . . . . . . CONT 2 7 78 . . . . . . . CONT 3 1 78 Strength 78 77 77 77 76 76 77