SAS 9.1.3 Language Reference: Concepts, Third Edition, Volumes 1 and 2
Formats
Definition of a Format
A format is an instruction that SAS uses to write data values. You use formats to control the written appearance of data values, or, in some cases, to group data values together for analysis. For example, the WORDS22. format, which converts numeric values to their equivalent in words, writes the numeric value 692 as six hundred ninety-two.
Syntax of a Format
SAS formats have the following form:
<$>format<w>.<d>
Here is an explanation of the syntax:
$
indicates a character format; its absence indicates a numeric format.
format
names the format. The format is a SAS format or a user-defined format that was previously defined with the VALUE statement in PROC FORMAT. For more information on user-defined formats, see "The FORMAT Procedure" in Base SAS Procedures Guide.
w
specifies the format width, which for most formats is the number of columns in the output data.
d
specifies an optional decimal scaling factor in the numeric formats.
Formats always contain a period (.) as a part of the name. If you omit the w and the d values from the format, SAS uses default values. The d value that you specify with a format tells SAS to display that many decimal places, regardless of how many decimal places are in the data. Formats never change or truncate the internally stored data values.
For example, in DOLLAR10.2, the w value of 10 specifies a maximum of 10 columns for the value. The d value of 2 specifies that two of these columns are for the decimal part of the value, which leaves eight columns for all the remaining characters in the value. This includes the decimal point, the remaining numeric value, a minus sign if the value is negative, the dollar sign, and commas, if any.
If the format width is too narrow to represent a value, then SAS tries to squeeze the value into the space available. Character formats truncate values on the right. Numeric formats sometimes revert to the BEST w.d format. SAS prints asterisks if you do not specify an adequate width. In the following example, the result is x=**.
x=123; put x=2.;
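The following Python sketch models the column accounting for a numeric w.d or DOLLARw.d value. The function `sas_like_format` is hypothetical and is not part of SAS: it only rounds, adds commas and a dollar sign, and right-aligns the result, and it writes asterisks whenever the text does not fit instead of first trying to squeeze the value the way SAS does.

```python
def sas_like_format(value: float, w: int, d: int = 0, dollar: bool = False) -> str:
    """Hypothetical, simplified model of SAS w.d and DOLLARw.d output.

    Not SAS code: it only illustrates that every character (digits, the
    decimal point, commas, a dollar sign, a minus sign) must fit in w columns.
    """
    # Round to d decimal places; commas are inserted only in the dollar style.
    digits = f"{abs(value):,.{d}f}" if dollar else f"{abs(value):.{d}f}"
    text = ("-" if value < 0 else "") + ("$" if dollar else "") + digits
    if len(text) > w:
        # Real SAS first tries to squeeze the value; this sketch simply
        # fills the field with asterisks, as in the x=** example.
        return "*" * w
    # Numeric output is right-aligned within the w columns.
    return text.rjust(w)


print(repr(sas_like_format(1145.32, 10, 2, dollar=True)))  # ' $1,145.32'
print(repr(sas_like_format(123, 2)))                       # '**'
```

In the DOLLAR10.2 case, the nine characters of $1,145.32 fit in ten columns, so one leading blank remains.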
If you use an incompatible format, such as a numeric format to write character values, SAS first attempts to use an analogous format of the other type. If this is not feasible, an error message that describes the problem appears in the SAS log.
Ways to Specify Formats
You can use formats in the following ways:
-
in a PUT statement
The PUT statement with a format after the variable name uses a format to write data values in a DATA step. For example, this PUT statement uses the DOLLAR. format to write the numeric value for AMOUNT as a dollar amount:
amount=1145.32; put amount dollar10.2;
The DOLLARw.d format in the PUT statement produces this result:
$1,145.32
See the "PUT Statement" in SAS Language Reference: Dictionary for more information.
-
with the PUT, PUTC, or PUTN functions
The PUT function writes a numeric variable, a character variable, or a constant with any valid format and returns the resulting character value. For example, the following statement converts the value of a numeric variable into a two-character hexadecimal representation:
num=15; char=put(num,hex2.);
The PUT function creates a character variable named CHAR that has a value of 0F.
The PUT function is useful for converting a numeric value to a character value. See the "PUT Function" in SAS Language Reference: Dictionary for more information.
-
with the %SYSFUNC macro function
The %SYSFUNC (or %QSYSFUNC) macro function executes SAS functions or user-defined functions and applies an optional format to the result of the function outside a DATA step. For example, the following program writes a numeric value in a macro variable as a dollar amount.
%macro tst(amount); %put %sysfunc(putn(&amount,dollar10.2)); %mend tst; %tst (1154.23);
For more information, see SAS Macro Language: Reference.
-
in a FORMAT statement in a DATA step or a PROC step
The FORMAT statement permanently associates a format with a variable. SAS uses the format to write the values of the variable that you specify. For example, the following statement in a DATA step associates the COMMAw.d numeric format with the variables SALES1 through SALES3:
format sales1-sales3 comma10.2;
Because the FORMAT statement permanently associates a format with a variable, any subsequent DATA step or PROC step uses COMMA10.2 to write the values of SALES1, SALES2, and SALES3. See the "FORMAT Statement" in SAS Language Reference: Dictionary for more information.
Note: Formats that you specify in a PUT statement behave differently from those that you associate with a variable in a FORMAT statement. The major difference is that formats that are specified in the PUT statement preserve leading blanks. If you assign formats with a FORMAT statement prior to a PUT statement, all leading blanks are trimmed. The result is the same as if you used the colon (:) format modifier. For details about using the colon (:) format modifier, see the "PUT Statement, List" in SAS Language Reference: Dictionary.
-
in an ATTRIB statement in a DATA step or a PROC step.
The ATTRIB statement can also associate a format, as well as other attributes, with one or more variables. For example, in the following statement the ATTRIB statement permanently associates the COMMAw.d format with the variables SALES1 through SALES3:
attrib sales1-sales3 format=comma10.2;
Because the ATTRIB statement permanently associates a format with a variable, any subsequent DATA step or PROC step uses COMMA10.2 to write the values of SALES1, SALES2, and SALES3. For more information, see the "ATTRIB Statement" in SAS Language Reference: Dictionary.
Permanent versus Temporary Association
When you specify a format in a PUT statement, SAS uses the format to write data values during the DATA step but does not permanently associate the format with a variable. To permanently associate a format with a variable, use a FORMAT statement or an ATTRIB statement in a DATA step. SAS permanently associates a format with the variable by modifying the descriptor information in the SAS data set.
Using a FORMAT statement or an ATTRIB statement in a PROC step associates a format with a variable for that PROC step, as well as for any output data sets that the procedure creates that contain formatted variables. For more information on using formats in SAS procedures, see Base SAS Procedures Guide.
Informats
Definition of an Informat
An informat is an instruction that SAS uses to read data values into a variable. For example, the following value contains a dollar sign and commas:
$1,000,000
To remove the dollar sign ($) and commas (,) before storing the numeric value 1000000 in a variable, read this value with the COMMA11. informat.
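A minimal Python sketch of what reading this value involves, assuming only dollar signs, commas, and surrounding blanks need to be removed. The helper `comma_like_informat` is hypothetical; the real COMMAw.d informat handles additional characters that are not modeled here.

```python
def comma_like_informat(field: str) -> float:
    """Hypothetical model of reading '$1,000,000' as the text describes:
    drop dollar signs, commas, and surrounding blanks, then read the
    remaining digits as a number. Not SAS code."""
    cleaned = field.replace("$", "").replace(",", "").strip()
    return float(cleaned)


print(comma_like_informat("$1,000,000"))  # 1000000.0
```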
Unless you explicitly define a variable first, SAS uses the informat to determine whether the variable is numeric or character. SAS also uses the informat to determine the length of character variables.
Syntax of an Informat
SAS informats have the following form:
<$>informat<w>.<d>
Here is an explanation of the syntax:
$
indicates a character informat; its absence indicates a numeric informat.
informat
names the informat. The informat is a SAS informat or a user-defined informat that was previously defined with the INVALUE statement in PROC FORMAT. For more information on user-defined informats, see "The FORMAT Procedure" in Base SAS Procedures Guide.
w
specifies the informat width, which for most informats is the number of columns in the input data.
d
specifies an optional decimal scaling factor in the numeric informats. SAS divides the input data by 10 to the power of d .
Note: Even though SAS can read up to 31 decimal places when you specify some numeric informats, floating-point numbers with more than 12 decimal places might lose precision due to the limitations of the eight-byte floating-point representation used by most computers.
Informats always contain a period (.) as a part of the name. If you omit the w and the d values from the informat, SAS uses default values. If the data contains decimal points, SAS ignores the d value and reads the number of decimal places that are actually in the input data.
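As a rough illustration of the d rule, here is a Python sketch of a numeric w.d informat. The helper `read_wd` is hypothetical: it assumes the w columns have already been cut out of the record and models only the decimal scaling, not SAS's handling of invalid data.

```python
def read_wd(field: str, d: int = 0) -> float:
    """Hypothetical model of how a numeric w.d informat treats d.

    If the field has no decimal point, the value is divided by 10**d.
    If the field already contains a decimal point, d is ignored.
    """
    text = field.strip()
    if "." in text:
        return float(text)
    return int(text) / 10 ** d


record = " " * 20 + "12345"       # hypothetical record: columns 21-25 hold 12345
print(read_wd(record[20:25], 2))  # like @21 price 5.2 -> 123.45
print(read_wd("98.6", 2))         # decimal point present, d ignored -> 98.6
```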
If the informat width is too narrow to read all the columns in the input data, you may get unexpected results. The problem frequently occurs with the date and time informats. You must adjust the width of the informat to include blanks or special characters between the day, month, year, or time. For more information about date and time values, see Chapter 8, "Dates, Times, and Intervals," on page 127.
When a problem occurs with an informat, SAS writes a note to the SAS log and assigns a missing value to the variable. Problems occur if you use an incompatible informat, such as a numeric informat to read character data, or if you specify the width of a date and time informat that causes SAS to read a special character in the last column.
Ways to Specify Informats
You can specify informats in the following ways:
-
in an INPUT statement
The INPUT statement with an informat after a variable name is the simplest way to read values into a variable. For example, the following INPUT statement uses two informats:
input @15 style $3. @21 price 5.2;
The $w. character informat reads values into the variable STYLE. The w.d numeric informat reads values into the variable PRICE.
For a complete discussion, see the "INPUT Statement" in SAS Language Reference: Dictionary.
-
with the INPUT, INPUTC, and INPUTN functions
The INPUT function reads a SAS character expression using a specified informat. The informat determines whether the resulting value is numeric or character. Thus, the INPUT function is useful for converting data. For example,
TempCharacter='98.6'; TemperatureNumber=input(TempCharacter,4.);
Here, the INPUT function in combination with the w.d informat reads the character value of TempCharacter as a numeric value and assigns the numeric value 98.6 to TemperatureNumber.
Use the PUT function with a SAS format to convert numeric values to character values. See the "PUT Function" in SAS Language Reference: Dictionary for an example of a numeric-to-character conversion. For a complete discussion, see the "INPUT Function" in SAS Language Reference: Dictionary.
-
in an INFORMAT statement in a DATA or a PROC step
The INFORMAT statement associates an informat with a variable. SAS uses the informat in any subsequent INPUT statement to read values into the variable. For example, in the following statements the INFORMAT statement associates the DATEw. informat with the variables Birthdate and Interview:
informat Birthdate Interview date9.; input @63 Birthdate Interview;
An informat that is associated with an INFORMAT statement behaves like an informat that you specify with a colon (:) format modifier in an INPUT statement. (For details about using the colon (:) modifier, see the "INPUT Statement, List" in SAS Language Reference: Dictionary.) Therefore, SAS uses modified list input to read the variable so that
-
the w value in an informat does not determine column positions or input field widths in an external file
-
the blanks that are embedded in input data are treated as delimiters unless you change the DELIMITER= option in an INFILE statement
-
for character informats, the w value in an informat specifies the length of character variables
-
for numeric informats, the w value is ignored
-
for numeric informats, the d value in an informat behaves in the usual way for numeric informats
If you have coded the INPUT statement to use another style of input, such as formatted input or column input, that style of input is not used when you use the INFORMAT statement.
See the "INPUT Statement, List" in SAS Language Reference: Dictionary for more information about how to use modified list input to read data.
-
in an ATTRIB statement in a DATA or a PROC step.
The ATTRIB statement can also associate an informat, as well as other attributes, with one or more variables. For example, in the following statements, the ATTRIB statement associates the DATEw. informat with the variables Birthdate and Interview:
attrib Birthdate Interview informat=date9.; input @63 Birthdate Interview;
An informat that is associated by using the INFORMAT= option in the ATTRIB statement behaves like an informat that you specify with a colon (:) format modifier in an INPUT statement. (For details about using the colon (:) modifier, see the "INPUT Statement, List" in SAS Language Reference: Dictionary.) Therefore, SAS uses modified list input to read the variable in the same way as it does for the INFORMAT statement.
See the "ATTRIB Statement" in SAS Language Reference: Dictionary for more information.
Permanent versus Temporary Association
When you specify an informat in an INPUT statement, SAS uses the informat to read input data values during that DATA step. SAS, however, does not permanently associate the informat with the variable. To permanently associate an informat with a variable, use an INFORMAT statement or an ATTRIB statement. SAS permanently associates an informat with the variable by modifying the descriptor information in the SAS data set.
User-Defined Formats and Informats
In addition to the formats and informats that are supplied with Base SAS software, you can create your own formats and informats. In Base SAS software, PROC FORMAT allows you to create your own formats and informats for both character and numeric variables.
When you execute a SAS program that uses user-defined formats or informats, these formats and informats must be available. The two ways to make these formats and informats available are:
-
to create permanent, not temporary, formats or informats with PROC FORMAT
-
to store the source code that creates the formats or informats (the PROC FORMAT step) with the SAS program that uses them.
For information about creating permanent SAS formats and informats, see "The FORMAT Procedure" in Base SAS Procedures Guide.
If you execute a program that cannot locate a user-defined format or informat, the result depends on the setting of the FMTERR system option. If the user-defined format or informat is not found, then these system options produce these results:
System option | Results |
---|---|
FMTERR | SAS produces an error that causes the current DATA or PROC step to stop. |
NOFMTERR | SAS continues processing and substitutes a default format, usually the BESTw. or $w. format. |
Although using NOFMTERR enables SAS to process a variable, you lose the information that the user-defined format or informat supplies.
To avoid problems, make sure that your program has access to all of the user-defined formats and informats that are used in the program.
Byte Ordering for Integer Binary Data on Big Endian and Little Endian Platforms
Definitions
Integer values for binary integer data are typically stored in one of three sizes: one-byte, two-byte, or four-byte. The ordering of the bytes for the integer varies depending on the platform (operating environment) on which the integers were produced.
The ordering of bytes differs between the "big endian" and "little endian" platforms. These colloquial terms are used to describe byte ordering for IBM mainframes (big endian) and for Intel-based platforms (little endian). In SAS, the following platforms are considered big endian: AIX, HP-UX, IBM mainframe, Macintosh, and Solaris. The following platforms are considered little endian: OpenVMS Alpha, Digital UNIX, Intel ABI, and Windows.
How Bytes Are Ordered
On big endian platforms, the value 1 is stored in binary and is represented here in hexadecimal notation. One byte is stored as 01, two bytes as 00 01, and four bytes as 00 00 00 01. On little endian platforms, the value 1 is stored in one byte as 01 (the same as big endian), in two bytes as 01 00, and in four bytes as 01 00 00 00.
If an integer is negative, the "two's complement" representation is used. The high-order bit of the most significant byte of the integer is set on. For example, -2 would be represented in one, two, and four bytes on big endian platforms as FE, FF FE, and FF FF FF FE respectively. On little endian platforms, the representation would be FE, FE FF, and FE FF FF FF. These representations result from the output of the integer binary value -2 expressed in hexadecimal representation.
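The byte patterns above can be reproduced with Python's built-in `int.to_bytes`, which writes two's-complement integers in either byte order. This is a sketch for checking the hexadecimal values, not SAS code; the helper name `int_to_hex_bytes` is ours.

```python
def int_to_hex_bytes(value: int, size: int, byteorder: str) -> str:
    """Return the two's-complement bytes of value as spaced hex digits."""
    data = value.to_bytes(size, byteorder, signed=True)
    return " ".join(f"{b:02X}" for b in data)


for value in (1, -2):
    for size in (1, 2, 4):
        big = int_to_hex_bytes(value, size, "big")
        little = int_to_hex_bytes(value, size, "little")
        print(f"{value:>2} in {size} byte(s): big endian {big:<11}  little endian {little}")
```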
Writing Data Generated on Big Endian or Little Endian Platforms
SAS can read signed and unsigned integers regardless of whether they were generated on a big endian or a little endian system. Likewise, SAS can write signed and unsigned integers in both big endian and little endian format. The length of these integers can be up to eight bytes.
The following table shows which format to use for various combinations of platforms. In the Sign? column, "no" indicates that the number is unsigned and cannot be negative. "Yes" indicates that the number can be either negative or positive.
Data created for | Data written by | Sign? | Format/Informat |
---|---|---|---|
big endian | big endian | yes | IB or S370FIB |
big endian | big endian | no | PIB, S370FPIB, S370FIBU |
big endian | little endian | yes | S370FIB |
big endian | little endian | no | S370FPIB |
little endian | big endian | yes | IBR |
little endian | big endian | no | PIBR |
little endian | little endian | yes | IB or IBR |
little endian | little endian | no | PIB or PIBR |
big endian | either | yes | S370FIB |
big endian | either | no | S370FPIB |
little endian | either | yes | IBR |
little endian | either | no | PIBR |
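To see why the choice in this table matters, the following Python sketch reads the same two bytes under each fixed-order convention by using `int.from_bytes`. The mapping from informat names to byte order and sign follows the last four rows of the table; `read_integer` and `CONVENTIONS` are hypothetical names, and IBw. and PIBw. are left out because they follow the byte order of the host.

```python
# Hypothetical mapping that mirrors the "either" rows of the table above.
CONVENTIONS = {
    "S370FIB": ("big", True),      # big endian, signed
    "S370FPIB": ("big", False),    # big endian, unsigned
    "IBR": ("little", True),       # little endian, signed
    "PIBR": ("little", False),     # little endian, unsigned
}


def read_integer(data: bytes, informat: str) -> int:
    """Interpret raw bytes with the byte order and signedness of `informat`."""
    byteorder, signed = CONVENTIONS[informat]
    return int.from_bytes(data, byteorder, signed=signed)


raw = bytes([0xFF, 0xFE])  # -2 written as two bytes on a big endian platform
for name in CONVENTIONS:
    print(name, read_integer(raw, name))
```

Only the signed big endian reading recovers -2; the other conventions yield 65534, -257, and 65279.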
Integer Binary Notation and Different Programming Languages
The following table compares integer binary notation according to programming language.
Language | 2 Bytes | 4 Bytes |
---|---|---|
SAS | IB2., IBR2., PIB2., PIBR2., S370FIB2., S370FIBU2., S370FPIB2. | IB4., IBR4., PIB4., PIBR4., S370FIB4., S370FIBU4., S370FPIB4. |
PL/I | FIXED BIN(15) | FIXED BIN(31) |
FORTRAN | INTEGER*2 | INTEGER*4 |
COBOL | COMP PIC 9(4) | COMP PIC 9(8) |
IBM assembler | H | F |
C | short | long |
Working with Packed Decimal and Zoned Decimal Data
Definitions
Packed decimal
specifies a method of encoding decimal numbers by using each byte to represent two decimal digits. Packed decimal representation stores decimal data with exact precision. The fractional part of the number is determined by the informat or format because there is no separate mantissa and exponent. An advantage of using packed decimal data is that exact precision can be maintained. However, computations involving decimal data might become inexact due to the lack of native instructions.
Zoned decimal
specifies a method of encoding decimal numbers in which each digit requires one byte of storage. The last byte contains the number's sign as well as the last digit. Zoned decimal data produces a printable representation.
Nibble
specifies 1/2 of a byte.
Packed Decimal Data
A packed decimal representation stores decimal digits in each "nibble" of a byte. Each byte has two nibbles, and each nibble is indicated by a hexadecimal digit. For example, the value 15 is stored in two nibbles, using the hexadecimal digits 1 and 5.
The sign indication is dependent on your operating environment. On IBM mainframes, the sign is indicated by the last nibble. With formats, C indicates a positive value, and D indicates a negative value. With informats, A, C, E, and F indicate positive values, and B and D indicate negative values. Any other nibble is invalid for signed packed decimal data. In all other operating environments, the sign is indicated in its own byte. If the high-order bit is 1, then the number is negative. Otherwise, it is positive.
The following applies to packed decimal data representation:
-
You can use the S370FPD format on all platforms to obtain the IBM mainframe configuration.
-
You can have unsigned packed data with no sign indicator. The packed decimal format and informat handle this representation, which is consistent between ASCII and EBCDIC platforms.
-
Note that the S370FPDU format and informat expect an F in the last nibble, while packed decimal expects no sign nibble.
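The IBM mainframe packed layout described above can be sketched in Python. The functions `pack_s370fpd` and `unpack_s370fpd` are hypothetical illustrations that handle integers only (no d scaling); the encoder writes C or D as the sign nibble, as the formats do, and the decoder accepts A, C, E, F and B, D, as the informats do. With a width of three bytes they produce 00001C for 1 and 00001D for -1.

```python
def pack_s370fpd(value: int, width: int) -> bytes:
    """Hypothetical encoder for IBM-style signed packed decimal (S370FPD).

    Each byte holds two decimal digits, one per nibble; the last nibble
    holds the sign: C for positive, D for negative.
    """
    digits = str(abs(value)).rjust(2 * width - 1, "0")
    if len(digits) > 2 * width - 1:
        raise ValueError("value does not fit in the requested width")
    nibbles = digits + ("D" if value < 0 else "C")
    return bytes.fromhex(nibbles)


def unpack_s370fpd(data: bytes) -> int:
    """Decode IBM-style packed decimal, accepting the informat sign nibbles."""
    text = data.hex().upper()
    digits, sign = text[:-1], text[-1]
    if sign in "ACEF":
        return int(digits)
    if sign in "BD":
        return -int(digits)
    raise ValueError(f"invalid sign nibble {sign}")


print(pack_s370fpd(1, 3).hex().upper())   # 00001C
print(pack_s370fpd(-1, 3).hex().upper())  # 00001D
```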
Zoned Decimal Data
The following applies to zoned decimal data representation:
-
A zoned decimal representation stores a decimal digit in the low order nibble of each byte. For all but the byte containing the sign, the high-order nibble is the numeric zone nibble (F on EBCDIC and 3 on ASCII).
-
The sign can be merged into a byte with a digit, or it can be separate, depending on the representation. But the standard zoned decimal format and informat expects the sign to be merged into the last byte.
-
The EBCDIC and ASCII zoned decimal formats produce the same printable representation of numbers. There are two nibbles per byte, each indicated by a hexadecimal digit. For example, in the IBM mainframe (EBCDIC) representation, the value 15 is stored in two bytes: the first byte contains the hexadecimal value F1, and the second byte contains the hexadecimal value C5.
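A similar Python sketch for the IBM mainframe zoned layout (S370FZD), assuming the EBCDIC zone F in every byte except the last, whose high-order nibble carries the sign. `zone_s370fzd` is a hypothetical helper that handles integers only.

```python
def zone_s370fzd(value: int, width: int) -> bytes:
    """Hypothetical encoder for IBM-style zoned decimal (S370FZD).

    One digit per byte: the low nibble holds the digit and the high nibble
    is the EBCDIC zone F, except in the last byte, whose high nibble holds
    the sign (C positive, D negative).
    """
    digits = str(abs(value)).rjust(width, "0")
    if len(digits) > width:
        raise ValueError("value does not fit in the requested width")
    sign = "D" if value < 0 else "C"
    hex_text = "".join("F" + digit for digit in digits[:-1]) + sign + digits[-1]
    return bytes.fromhex(hex_text)


print(zone_s370fzd(15, 2).hex().upper())  # F1C5
print(zone_s370fzd(-1, 3).hex().upper())  # F0F0D1
```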
Packed Julian Dates
The following applies to packed Julian dates:
-
The two formats and informats that handle Julian dates in packed decimal representation are PDJULI and PDJULG. PDJULI uses the IBM mainframe year computation, while PDJULG uses the Gregorian computation.
-
The IBM mainframe computation considers 1900 to be the base year, and the year values in the data indicate the offset from 1900. For example, 98 means 1998, 100 means 2000, and 102 means 2002. 1998 would mean 3898.
-
The Gregorian computation allows for 2-digit or 4-digit years. If you use 2-digit years, SAS uses the setting of the YEARCUTOFF= system option to determine the true year.
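The year rules above can be expressed as small Python functions. These are hypothetical helpers that model only the year arithmetic and the day-of-year conversion; they do not decode the packed bytes, and the YEARCUTOFF value is passed in explicitly (1920 in the checks is only an example value, so check your session's setting).

```python
from datetime import date, timedelta


def ibm_julian_year(offset: int) -> int:
    """IBM mainframe computation: the stored year is an offset from 1900."""
    return 1900 + offset


def two_digit_year(yy: int, yearcutoff: int) -> int:
    """Place a 2-digit year in the 100-year span that starts at yearcutoff,
    which is how the YEARCUTOFF= system option is interpreted."""
    year = (yearcutoff // 100) * 100 + yy
    return year if year >= yearcutoff else year + 100


def julian_to_date(year: int, day_of_year: int) -> date:
    """Convert a year and a day of the year (1-366) to a calendar date."""
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


print([ibm_julian_year(y) for y in (98, 100, 102, 1998)])  # [1998, 2000, 2002, 3898]
```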
Platforms Supporting Packed Decimal and Zoned Decimal Data
Some platforms have native instructions to support packed and zoned decimal data, while others must use software to emulate the computations. For example, the IBM mainframe has an Add Pack instruction to add packed decimal data, but the Intel-based platforms have no such instruction and must convert the decimal data into some other format.
Languages Supporting Packed Decimal and Zoned Decimal Data
Several different languages support packed decimal and zoned decimal data. The following table shows how COBOL picture clauses correspond to SAS formats and informats.
IBM VS COBOL II clauses | Corresponding S370Fxxx formats/informats |
---|---|
PIC S9(X) PACKED-DECIMAL | S370FPDw. |
PIC 9(X) PACKED-DECIMAL | S370FPDUw. |
PIC S9(W) DISPLAY | S370FZDw. |
PIC 9(W) DISPLAY | S370FZDUw. |
PIC S9(W) DISPLAY SIGN LEADING | S370FZDLw. |
PIC S9(W) DISPLAY SIGN LEADING SEPARATE | S370FZDSw. |
PIC S9(W) DISPLAY SIGN TRAILING SEPARATE | S370FZDTw. |
For the representations listed above, X indicates the number of digits represented, and W is the number of bytes. For PIC S9(X) PACKED-DECIMAL, W = ceil((X+1)/2). For PIC 9(X) PACKED-DECIMAL, W = ceil(X/2). For example, PIC S9(5) PACKED-DECIMAL represents five digits; with the sign nibble, six nibbles are needed, so W = ceil((5+1)/2) = 3 bytes.
Note that you can substitute COMP-3 for PACKED-DECIMAL.
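The byte-width rule for packed fields can be checked with a short Python function. `packed_width` is a hypothetical helper that applies the ceil formulas above using integer arithmetic.

```python
def packed_width(digits: int, signed: bool) -> int:
    """Bytes needed for a COBOL PACKED-DECIMAL (COMP-3) field of `digits` digits.

    Two digits fit in each byte; a signed field also needs one sign nibble.
    """
    nibbles = digits + 1 if signed else digits
    return (nibbles + 1) // 2  # integer ceiling of nibbles / 2


print(packed_width(5, signed=True))  # PIC S9(5) PACKED-DECIMAL -> 3
```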
In IBM assembly language, the P directive indicates packed decimal, and the Z directive indicates zoned decimal. The following shows an excerpt from an assembly language listing, showing the offset, the value, and the DC statement:
offset   value (in hex)  inst  label  directive
+000000  00001C          2     PEX1   DC PL3'1'
+000003  00001D          3     PEX2   DC PL3'-1'
+000006  F0F0C1          4     ZEX1   DC ZL3'1'
+000009  F0F0D1          5     ZEX2   DC ZL3'-1'
In PL/I, the FIXED DECIMAL attribute is used in conjunction with packed decimal data. You must use the PICTURE specification to represent zoned decimal data. There is no standardized representation of decimal data for the FORTRAN or the C languages.
Summary of Packed Decimal and Zoned Decimal Formats and Informats
SAS uses a group of formats and informats to handle packed and zoned decimal data. The following table lists the type of data representation for these formats and informats. Note that the formats and informats that begin with S370 refer to IBM mainframe representation.
Format | Type of data representation | Corresponding informat | Comments |
---|---|---|---|
PD | Packed decimal | PD | Local signed packed decimal |
PK | Packed decimal | PK | Unsigned packed decimal; not specific to your operating environment |
ZD | Zoned decimal | ZD | Local zoned decimal |
none | Zoned decimal | ZDB | Translates EBCDIC blank (hex 40) to EBCDIC zero (hex F0), then corresponds to the informat as zoned decimal |
none | Zoned decimal | ZDV | Non-IBM zoned decimal representation |
S370FPD | Packed decimal | S370FPD | Last nibble C (positive) or D (negative) |
S370FPDU | Packed decimal | S370FPDU | Last nibble always F (positive) |
S370FZD | Zoned decimal | S370FZD | Last byte contains sign in upper nibble: C (positive) or D (negative) |
S370FZDU | Zoned decimal | S370FZDU | Unsigned; sign nibble always F |
S370FZDL | Zoned decimal | S370FZDL | Sign nibble in first byte in informat; separate leading sign byte of hex C0 (positive) or D0 (negative) in format |
S370FZDS | Zoned decimal | S370FZDS | Leading sign of - (hex 60) or + (hex 4E) |
S370FZDT | Zoned decimal | S370FZDT | Trailing sign of - (hex 60) or + (hex 4E) |
PDJULI | Packed decimal | PDJULI | Julian date in packed representation - IBM computation |
PDJULG | Packed decimal | PDJULG | Julian date in packed representation - Gregorian computation |
none | Packed decimal | RMFDUR | Input layout is: mmsstttF |
none | Packed decimal | SHRSTAMP | Input layout is: yyyydddFhhmmssth, where yyyydddF is the packed Julian date; yyyy is a 0-based year from 1900 |
none | Packed decimal | SMFSTAMP | Input layout is: xxxxxxxxyyyydddF, where yyyydddF is the packed Julian date; yyyy is a 0-based year from 1900 |
none | Packed decimal | PDTIME | Input layout is: 0hhmmssF |
none | Packed decimal | RMFSTAMP | Input layout is: 0hhmmssFyyyydddF, where yyyydddF is the packed Julian date; yyyy is a 0-based year from 1900 |
Data Conversions and Encodings
An encoding maps each character in a character set to a unique numeric representation, resulting in a table of all code points. A single character can have different numeric representations in different encodings. For example, the ASCII encoding for the dollar symbol $ is 24hex. The Danish EBCDIC encoding for the dollar symbol $ is 67hex. In order for a version of SAS that normally uses ASCII to properly interpret a data set that is encoded in Danish EBCDIC, the data must be transcoded.
Transcoding is the process of moving data from one encoding to another. When the ASCII dollar sign is transcoded to the Danish EBCDIC dollar sign, the hexadecimal representation of the character changes from 24 to 67.
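Transcoding can be demonstrated with Python's codec machinery: decode bytes from the source encoding, then encode the characters in the target encoding. Python's standard codecs include US EBCDIC (code page 037), in which the dollar sign is 5B hex, but not, to our knowledge, the Danish EBCDIC code page mentioned above, so this sketch uses cp037 as a stand-in.

```python
# cp037 (US EBCDIC) stands in for the Danish EBCDIC code page in the text.
ascii_bytes = "$".encode("ascii")      # 24 hex in ASCII
ebcdic_bytes = "$".encode("cp037")     # 5B hex in US EBCDIC

# Transcoding: decode from the source encoding, re-encode in the target one.
transcoded = ascii_bytes.decode("ascii").encode("cp037")
print(ascii_bytes.hex(), ebcdic_bytes.hex(), transcoded.hex())  # 24 5b 5b
```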
To find the encoding of a particular SAS data set in SAS 9 and later, follow these steps:
-
Locate the data set with SAS Explorer.
-
Right-click the data set.
-
Select Properties from the menu.
-
Click the Details tab.
-
The encoding of the data set is listed, along with other information.
Some situations where data might commonly be transcoded are:
-
when you share data between two different SAS sessions that are running in different locales or in different operating environments,
-
when you perform text-string operations, such as converting to uppercase or lowercase,
-
when you display or print characters from another language,
-
when you copy and paste data between SAS sessions running in different locales.
For more information on SAS features designed to handle data conversions from different encodings or operating environments, see SAS National Language Support (NLS): User's Guide.