SAS/STAT 9.1 User's Guide (Vol. 6)

The following statements are available in PROC SURVEYSELECT:

   PROC SURVEYSELECT options ;
      STRATA variables ;
      CONTROL variables ;
      SIZE variable ;
      ID variables ;

The PROC SURVEYSELECT statement invokes the procedure and optionally identifies input and output data sets. It also specifies the selection method, the sample size, and other sample design parameters. The PROC SURVEYSELECT statement is required.

The SIZE statement identifies the variable that contains the size measures. It is required for any selection method that is probability proportional to size (PPS).

The remaining statements are optional. The STRATA statement identifies a variable or set of variables that stratify the input data set. When you specify a STRATA statement, PROC SURVEYSELECT selects samples independently from the strata formed by the STRATA variables. The CONTROL statement identifies variables for ordering units within strata. It can be used for systematic and sequential sampling methods. The ID statement identifies variables to copy from the input data set to the output data set of selected units.

The rest of this section gives detailed syntax information for the CONTROL, ID, SIZE, and STRATA statements in alphabetical order after the description of the PROC SURVEYSELECT statement.

PROC SURVEYSELECT Statement

The PROC SURVEYSELECT statement invokes the procedure and optionally identifies input and output data sets. If you do not name a DATA= input data set, the procedure selects the sample from the most recently created SAS data set. If you do not name an OUT= output data set to contain the sample of selected units, the procedure still creates an output data set and names it according to the DATAn convention.

The PROC SURVEYSELECT statement also specifies the sample selection method, the sample size, and other sample design parameters. If you do not specify a selection method, PROC SURVEYSELECT uses simple random sampling (METHOD=SRS) if there is no SIZE statement. If you specify a SIZE statement but do not specify a selection method, PROC SURVEYSELECT uses probability proportional to size selection without replacement (METHOD=PPS). You must specify the sample size or sampling rate unless you request a method that selects two units from each stratum (METHOD=PPS_BREWER or METHOD=PPS_MURTHY).
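The default behavior described above can be illustrated with a minimal sketch. The data set names (work.Customers, work.SampleSRS), the variable CustomerID, and the seed value are hypothetical and chosen only for illustration.

```sas
/* Hypothetical sampling frame: 100 customers identified by CustomerID. */
data work.Customers;
   do CustomerID = 1 to 100;
      output;
   end;
run;

/* There is no SIZE statement, so simple random sampling is the default. */
/* METHOD=SRS is written out here only to make the choice explicit.      */
/* SAMPSIZE=10 requests ten units; SEED= makes the selection repeatable. */
proc surveyselect data=work.Customers
                  method=srs
                  sampsize=10
                  seed=12345
                  out=work.SampleSRS;
run;
```

Omitting METHOD=SRS from this call should give the same design, because the procedure falls back to simple random sampling when no SIZE statement is present.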

You can use the SAMPSIZE= n option to specify the sample size, or you can use the SAMPSIZE= SAS-data-set option to name a secondary input data set that contains stratum sample sizes. You can also specify stratum sampling rates, minimum size measures, maximum size measures, and certainty size measures in the secondary input data set. See the descriptions of the SAMPSIZE=, SAMPRATE=, MINSIZE=, MAXSIZE=, and CERTSIZE= options. You can name only one secondary input data set in each invocation of the procedure.
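As a sketch of the secondary input data set form, the following example supplies a different sample size for each stratum. The data set and variable names (work.Frame, work.StratumSizes, Region, UnitID) are hypothetical. The example assumes that the stratum sample sizes are read from a variable named _NSIZE_ and that the strata appear in the same order in both data sets.

```sas
/* Hypothetical frame: two regions with 50 units each, already in  */
/* ascending order of the STRATA variable Region.                  */
data work.Frame;
   length Region $4;
   do Region = 'East', 'West';
      do UnitID = 1 to 50;
         output;
      end;
   end;
run;

/* Secondary input data set: one row per stratum, with the stratum */
/* sample size in _NSIZE_ (assumed variable name, see lead-in).    */
data work.StratumSizes;
   length Region $4;
   input Region $ _NSIZE_;
   datalines;
East 5
West 8
;
run;

/* SAMPSIZE= names the secondary data set instead of a single n.   */
proc surveyselect data=work.Frame
                  method=srs
                  sampsize=work.StratumSizes
                  seed=2024
                  out=work.StratSample;
   strata Region;
run;
```

Because only one secondary input data set is allowed per invocation, any stratum-level rates or size bounds would have to be supplied in that same data set.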

The following table lists the options available with the PROC SURVEYSELECT statement. Descriptions follow in alphabetical order.

Table 72.1: PROC SURVEYSELECT Statement Options

Task                               Options

Specify the input data set         DATA=
Specify output data sets           OUT=, OUTSORT=
Suppress displayed output          NOPRINT
Specify selection method           METHOD=
Specify sample size                SAMPSIZE=, SELECTALL
Specify sampling rate              SAMPRATE=, NMIN=, NMAX=
Specify number of replicates       REP=
Adjust size measures               MINSIZE=, MAXSIZE=
Specify certainty size measures    CERTSIZE=
Specify sorting type               SORT=
Specify random number seed         SEED=
Control OUT= contents              JTPROBS, OUTALL, OUTHITS, OUTSEED, OUTSIZE, STATS

You can specify the following options in the PROC SURVEYSELECT statement:

CERTSIZE

CERTSIZE= certain

CERTSIZE= SAS-data-set

CERTSIZE=P= p

DATA= SAS-data-set

JTPROBS

MAXSIZE

MAXSIZE= max

MAXSIZE= SAS-data-set

METHOD= name

M= name

MINSIZE

MINSIZE= min

MINSIZE= SAS-data-set

NMAX= n

NMIN= n

NOPRINT

OUT= SAS-data-set

OUTALL

OUTHITS

OUTSEED

OUTSIZE

OUTSORT= SAS-data-set

REP= nrep

SAMPRATE= r

RATE= r

SAMPRATE=( values )

RATE=( values )

SAMPRATE= SAS-data-set

RATE= SAS-data-set

SAMPSIZE= n

N= n

SAMPSIZE=( values )

N=( values )

SAMPSIZE= SAS-data-set

N= SAS-data-set

SEED= number

SEED= SAS-data-set

SELECTALL

SORT=NEST | SERP

STATS

CONTROL Statement

The CONTROL statement names variables for sorting the input data set. The CONTROL variables can be character or numeric.

PROC SURVEYSELECT sorts the input data set by the CONTROL variables before selecting the sample. If you also specify a STRATA statement, PROC SURVEYSELECT sorts by CONTROL variables within strata. Control sorting is available for systematic and sequential selection methods (METHOD=SYS, METHOD=PPS_SYS, METHOD=SEQ, and METHOD=PPS_SEQ).

By default, PROC SURVEYSELECT uses hierarchic serpentine sorting by the CONTROL variables. If you specify the SORT=NEST option, the procedure uses nested sorting. See the description for the SORT= option. For more information on serpentine and nested sorting, see the section 'Sorting by CONTROL Variables' on page 4445.

You can use the OUTSORT= option to name an output data set that contains the sorted input data set. If you do not specify the OUTSORT= option when you use the CONTROL statement, then the sorted data set replaces the input data set.
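The following sketch shows control sorting with a systematic sample. The data set work.Stores and its variables (State, StoreID, Sales) are hypothetical. SORT=NEST is specified only to show the non-default choice, and OUTSORT= is used so that the sorted copy is written to a separate data set rather than replacing the input.

```sas
/* Hypothetical frame of nine stores in three states. */
data work.Stores;
   length State $2;
   input State $ StoreID Sales;
   datalines;
NC 101 250
VA 102 310
NC 103 180
SC 104 400
VA 105 220
SC 106 150
NC 107 330
VA 108 275
SC 109 290
;
run;

/* Sort by State, then by Sales within State (nested sorting), and */
/* draw a systematic sample of three stores from the sorted list.  */
/* OUTSORT= keeps work.Stores itself in its original order.        */
proc surveyselect data=work.Stores
                  method=sys
                  sampsize=3
                  seed=77
                  sort=nest
                  outsort=work.StoresSorted
                  out=work.SysSample;
   control State Sales;
run;
```

Dropping SORT=NEST would leave the procedure on its default hierarchic serpentine sorting.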

ID Statement

The ID statement names variables from the DATA= input data set to be included in the OUT= data set of selected units. If there is no ID statement, PROC SURVEYSELECT includes all variables from the DATA= data set in the OUT= data set. The ID variables can be character or numeric.
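A minimal sketch of the ID statement follows. The data set work.Employees and its variables are hypothetical. Only the variables named in the ID statement are copied from the input data set to the sample.

```sas
/* Hypothetical frame with several variables per employee. */
data work.Employees;
   length Dept $5;
   input EmployeeID Dept $ Salary Tenure;
   datalines;
1 Sales 52000 3
2 Sales 61000 7
3 Ops   48000 2
4 Ops   57000 9
5 IT    73000 5
6 IT    68000 4
7 Sales 55000 6
8 Ops   50000 1
;
run;

/* Copy only EmployeeID and Dept to the output data set. Salary    */
/* and Tenure are not carried over because they are not ID         */
/* variables.                                                      */
proc surveyselect data=work.Employees
                  method=srs
                  sampsize=4
                  seed=101
                  out=work.EmpSample;
   id EmployeeID Dept;
run;
```

Without the ID statement, all four input variables would appear in work.EmpSample.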

SIZE Statement

The SIZE statement names one and only one size measure variable, which contains the size measures to be used when sampling with probability proportional to size. The SIZE variable must be numeric. When the value of an observation's SIZE variable is missing or nonpositive, that observation has no chance of being selected for the sample.

The SIZE statement is required for all PPS selection methods, which include METHOD=PPS, METHOD=PPS_BREWER, METHOD=PPS_MURTHY, METHOD=PPS_SAMPFORD, METHOD=PPS_SEQ, METHOD=PPS_SYS, and METHOD=PPS_WR. For details on how size measures are used, see the descriptions of PPS methods in the section 'Sample Selection Methods' on page 4446.

Note that a unit's size measure, specified in the SIZE statement and used for PPS selection, is not the same as the sample size. The sample size is the number of units selected for the sample, and you can specify this with the SAMPSIZE= option.
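The distinction between the size measure and the sample size can be seen in a short sketch. The data set work.Schools and the variables SchoolID and Enrollment are hypothetical. Enrollment is the size measure named in the SIZE statement, and SAMPSIZE=3 is the number of schools to select.

```sas
/* Hypothetical frame: enrollment is each school's size measure.   */
/* School 9 has a missing size measure, so it has no chance of     */
/* being selected.                                                 */
data work.Schools;
   input SchoolID Enrollment;
   datalines;
1 300
2 450
3 500
4 380
5 420
6 350
7 480
8 410
9 .
;
run;

/* METHOD=PPS selects with probability proportional to size,       */
/* without replacement. The SIZE statement is required for it.     */
proc surveyselect data=work.Schools
                  method=pps
                  sampsize=3
                  seed=555
                  out=work.PPSSample;
   size Enrollment;
run;
```

Because a SIZE statement is present, leaving out METHOD=PPS should produce the same design by default.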

STRATA Statement

You can specify a STRATA statement with PROC SURVEYSELECT to partition the input data set into nonoverlapping groups defined by the STRATA variables. PROC SURVEYSELECT then selects independent samples from these strata, according to the selection method and design parameters specified in the PROC SURVEYSELECT statement. For information on the use of stratification in sample design, refer to Lohr (1999), Kalton (1983), Kish (1965, 1987), and Cochran (1977).

The STRATA variables are one or more variables in the input data set. They function much like BY variables, and PROC SURVEYSELECT expects the input data set to be sorted in order of the STRATA variables.

If you specify a CONTROL statement, or if you specify METHOD=PPS, the input data set must be sorted in ascending order of the STRATA variables. This means you cannot use the STRATA option NOTSORTED or DESCENDING when you specify a CONTROL statement or METHOD=PPS.

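If your input data set is not sorted by the STRATA variables in ascending order, use one of the following alternatives:

- Sort the data by using the SORT procedure with a BY statement that names the STRATA variables.
- Specify the NOTSORTED or DESCENDING option in the STRATA statement. This alternative is available only when you do not specify a CONTROL statement or METHOD=PPS.
- Create an index on the STRATA variables by using the DATASETS procedure.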

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
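One way to meet the sorting requirement is sketched here: sort the frame with PROC SORT before running the procedure. The data set work.Households and the variables County and HouseholdID are hypothetical.

```sas
/* Hypothetical frame whose rows are not in County order. */
data work.Households;
   length County $8;
   input County $ HouseholdID;
   datalines;
Wake 1
Durham 2
Orange 3
Wake 4
Durham 5
Orange 6
Wake 7
Durham 8
Orange 9
Wake 10
Durham 11
Orange 12
;
run;

/* Sort in ascending order of the STRATA variable. */
proc sort data=work.Households;
   by County;
run;

/* Select an independent simple random sample of half the units    */
/* in each county.                                                 */
proc surveyselect data=work.Households
                  method=srs
                  samprate=0.5
                  seed=9
                  out=work.HHSample;
   strata County;
run;
```

Sorting in ascending order also keeps the data usable with a CONTROL statement or METHOD=PPS, which do not allow the NOTSORTED or DESCENDING options.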
