Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4

This section describes the components that are used in SQL procedure statements. Components are the items in PROC SQL syntax that appear in roman type.

Most components are contained in clauses within the statements. For example, the basic SELECT statement is composed of the SELECT and FROM clauses, where each clause contains one or more components. Components can also contain other components.

For easy reference, components appear in alphabetical order, and some terms are referred to before they are defined. Use the index or the See Also references to refer to other statement or component descriptions that may be helpful.

BETWEEN condition

Selects rows where column values are within a range of values.

sql-expression <NOT> BETWEEN sql-expression

Argument

sql-expression

Details

BTRIM function

Removes blanks or specified characters from the beginning, the end, or both the beginning and end of a character string.

BTRIM (<< btrim-specification >< btrim-character FROM>> sql-expression)

Arguments

btrim-specification

btrim-character

sql-expression

Details

The BTRIM function operates on character strings. BTRIM removes one or more instances of a single character (the value of btrim-character ) from the beginning, the end, or both the beginning and end of a string, depending whether LEADING, TRAILING, or BOTH is specified. If btrim-specification is not specified, then BOTH is used. If btrim-character is omitted, then blanks are removed.

Note: SAS adds trailing blanks to character values that are shorter than the length of the variable. Suppose you have a character variable Z, with length 10, and a value xxabcxx . SAS stores the value with three blanks after the last x (for a total length of 10). If you attempt to remove all the x characters with

btrim(both 'x' from z)

then the result is abcxx because PROC SQL sees the trailing characters as blanks, not the x character. In order to remove all the x characters, use

btrim(both 'x' from btrim(z))

The inner BTRIM function removes the trailing blanks before passing the value to the outer BTRIM function.

CALCULATED

Refers to columns already calculated in the SELECT clause.

CALCULATED column-alias

Argument

column-alias

Referencing a CALCULATED Column

CALCULATED enables you to use the results of an expression in the same SELECT clause or in the WHERE clause. It is valid only when used to refer to columns that are calculated in the immediate query expression.

CASE expression

Selects result values that satisfy specified conditions.

Featured in: Example 3 on page 1143 and Example 13 on page 1168

CASE < case-operand >

Arguments

case-operand

when-condition

result-expression

Details

The CASE expression selects values if certain conditions are met. A CASE expression returns a single value that is conditionally evaluated for each row of a table (or view). Use the WHEN-THEN clauses when you want to execute a CASE expression for some but not all of the rows in the table that is being queried or created. An optional ELSE expression gives an alternative action if no THEN expression is executed.

When you omit case-operand , when-condition is evaluated as a Boolean (true or false) value. If when-condition returns a nonzero, nonmissing result, then the WHEN clause is true. If case-operand is specified, then it is compared with when-condition for equality. If case-operand equals when-condition , then the WHEN clause is true.

If the when-condition is true for the row that is being executed, then the result-expression that follows THEN is executed. If when-condition is false, then PROC SQL evaluates the next when-condition until they are all evaluated. If every when-condition is false, then PROC SQL executes the ELSE expression, and its result becomes the CASE expression s result. If no ELSE expression is present and every when-condition is false, then the result of the CASE expression is a missing value.

You can use a CASE expression as an item in the SELECT clause and as either operand in an sql-expression.

Example

The following two PROC SQL steps show two equivalent CASE expressions that create a character column with the strings in the THEN clause. The CASE expression in the second PROC SQL step is a shorthand method that is useful when all the comparisons are with the same column.

proc sql; select Name, case when Continent = 'North America' then 'Continental U.S.' when Continent = 'Oceania' then 'Pacific Islands' else 'None' end as Region from states; proc sql; select Name, case Continent when 'North America' then 'Continental U.S.' when 'Oceania' then 'Pacific Islands' else 'None' end as Region from states;

Note: When you use the shorthand method, the conditions must all be equality tests. That is, they cannot use comparison operators or other types of operators.

COALESCE Function

Returns the first nonmissing value from a list of columns.

Featured in: Example 7 on page 1152

COALESCE (column-name <, column-name>)

Arguments

column-name

Details

COALESCE accepts one or more column names of the same data type. The COALESCE function checks the value of each column in the order in which they are listed and returns the first nonmissing value. If only one column is listed, the COALESCE function returns the value of that column. If all the values of all arguments are missing, the COALESCE function returns a missing value.

In some SQL DBMSs, the COALESCE function is called the IFNULL function. See PROC SQL and the ANSI Standard on page 1136 for more information.

Note: If your query contains a large number of COALESCE function calls, it might be more efficient to use a natural join instead. See Natural Joins on page 1102.

column-definition

Defines PROC SQL s data types and dates.

See also: column-modifier on page 1090

Featured in: Example 1 on page 1139

column data-type <column-modifier < column-modifier>>

Arguments

column

column-modifier

data-type

Details

column-modifier

Sets column attributes.

See also: column-definition on page 1089 and SELECT Clause on page 1072

Featured in: Example 1 on page 1139 and Example 2 on page 1141

column-modifier

Arguments

column-modifier

Details

If you refer to a labeled column in the ORDER BY or GROUP BY clause, then you must use either the column name (not its label), the column s alias, or its ordering integer (for example, ORDER BY 2 ). See the section on SAS statements in SAS Language Reference: Dictionary for more information about labels.

column-name

Specifies the column to select.

See also: column-modifier on page 1090 and SELECT Clause on page 1072

column-name

column-name

Details

A column can be referred to by its name alone if it is the only column by that name in all the tables or views listed in the current query-expression. If the same column name exists in more than one table or view in the query-expression, then you must qualify each use of the column name by prefixing a reference to the table that contains it. Consider the following examples:

SALARY /* name of the column */ EMP.SALARY /* EMP is the table or view name */ E.SALARY /* E is an alias for the table or view that contains the SALARY column */

CONNECTION TO

Retrieves and uses DBMS data in a PROC SQL query or view.

Tip: You can use CONNECTION TO in the SELECT statement s FROM clause as part of the from-list.

See also: Connecting to a DBMS Using the SQL Procedure Pass-Through Facility on page 1129 and your SAS/ACCESS documentation.

CONNECTION TO dbms-name (dbms-query)

CONNECTION TO alias (dbms-query)

Arguments

alias

dbms-name

dbms-query

CONTAINS condition

Tests whether a string is part of a column s value.

Alias: ?

Restriction: The CONTAINS condition is used only with character operands.

Featured in: Example 7 on page 1152

sql-expression <NOT> CONTAINS sql-expression

Argument

sql-expression

EXISTS condition

Tests if a subquery returns one or more rows.

See also: Query Expressions (Subqueries) on page 1116

<NOT> EXISTS (query-expression)

Argument

query-expression

Details

The EXISTS condition is an operator whose right operand is a subquery. The result of an EXISTS condition is true if the subquery resolves to at least one row. The result of a NOT EXISTS condition is true if the subquery evaluates to zero rows. For example, the following query subsets PROCLIB.PAYROLL (which is shown in Example 2 on page 1141) based on the criteria in the subquery. If the value for STAFF.IDNUM is on the same row as the value CT in PROCLIB.STAFF (which is shown in Example 4 on page 1145), then the matching IDNUM in PROCLIB.PAYROLL is included in the output. Thus, the query returns all the employees from PROCLIB.PAYROLL who live in CT .

proc sql; select * from proclib.payroll p where exists (select * from proclib.staff s where p.idnumber=s.idnum and state='CT');

IN condition

Tests set membership.

Featured in: Example 4 on page 1145

sql-expression <NOT> IN (query-expression constant <, constant >)

Arguments

constant

query-expression

sql-expression

Details

An IN condition tests if the column value that is returned by the sql-expression on the left is a member of the set (of constants or values returned by the query-expression) on the right. The IN condition is true if the value of the left-hand operand is in the set of values that are defined by the right-hand operand.

IS condition

Tests for a missing value.

Featured in: Example 5 on page 1148

sql-expression IS <NOT> NULL MISSING

Argument

sql-expression

Details

IS NULL and IS MISSING are predicates that test for a missing value. IS NULL and IS MISSING are used in the WHERE, ON, and HAVING expressions. Each predicate resolves to true if the sql-expression s result is missing and false if it is not missing.

SAS stores a numeric missing value as a period (.) and a character missing value as a blank space. Unlike missing values in some versions of SQL, missing values in SAS always appear first in the collating sequence. Therefore, in Boolean and comparison operations, the following expressions resolve to true in a predicate:

3>null -3>null 0>null

The SAS way of evaluating missing values differs from that of the ANSI Standard for SQL. According to the Standard, these expressions are NULL. See sql-expression on page 1113 for more information on predicates and operators. See PROC SQL and the ANSI Standard on page 1136 for more information on the ANSI Standard.

joined-table

Joins a table with itself or with other tables or views.

Restrictions: Joins are limited to 32 tables.

See also: FROM Clause on page 1077 and query-expression on page 1107

Featured in: Example 4 on page 1145, Example 7 on page 1152, Example 9 on page 1159, Example 13 on page 1168, and Example 14 on page 1172

  1. table-name << AS > alias >, table-name << AS > alias > <, table-name << AS > alias >>

  2. <(> table-name < INNER > JOIN table-name ON sql-expression<)>

  3. <(> table-name LEFT JOIN RIGHT JOIN FULL JOIN table-name ON sql-expression<)>

  4. <(> table-name CROSS JOIN table-name <)>

  5. <(> table-name UNION JOIN table-name <)>

  6. <(> table-name NATURAL < INNER FULL < OUTER > LEFT < OUTER > RIGHT < OUTER >> JOIN table-name <)>

Arguments

alias

sql-expression

table-name

Note: If you include parentheses, then be sure to include them in pairs. Parentheses are not valid around comma joins (type 1).

Types of Joins

  1. 2 Inner join. See Inner Joins on page 1097.

  1. Outer join. See Outer Joins on page 1099.

  2. Cross join. See Cross Joins on page 1101.

  3. Union join. See Union Joins on page 1102.

  4. Natural join. See Natural Joins on page 1102.

Joining Tables

When multiple tables, views, or query-expressions are listed in the FROM clause, they are processed to form one table. The resulting table contains data from each contributing table. These queries are referred to as joins .

Conceptually, when two tables are specified, each row of table A is matched with all the rows of table B to produce an internal or intermediate table. The number of rows in the intermediate table ( Cartesian product ) is equal to the product of the number of rows in each of the source tables. The intermediate table becomes the input to the rest of the query in which some of its rows may be eliminated by the WHERE clause or summarized by a summary function.

A common type of join is an equijoin , in which the values from a column in the first table must equal the values of a column in the second table.

Table Limit

PROC SQL can process a maximum of 32 tables for a join. If you are using views in a join, then the number of tables on which the views are based count toward the 32-table limit. Each CONNECTION TO component in the Pass-Through Facility counts as one table.

Specifying the Rows to Be Returned

The WHERE clause or ON clause contains the conditions (sql-expression) under which the rows in the Cartesian product are kept or eliminated in the result table. WHERE is used to select rows from inner joins. ON is used to select rows from inner or outer joins.

The expression is evaluated for each row from each table in the intermediate table described earlier in Joining Tables on page 1096. The row is considered to be matching if the result of the expression is true (a nonzero, nonmissing value) for that row.

Note: You can follow the ON clause with a WHERE clause to further subset the query result. See Example 7 on page 1152 for an example.

Table Aliases

Table aliases are used in joins to distinguish the columns of one table from those in the other table(s). A table name or alias must be prefixed to a column name when you are joining tables that have matching column names. See FROM Clause on page 1077 for more information on table aliases.

Joining a Table with Itself

A single table can be joined with itself to produce more information. These joins are sometimes called reflexive joins . In these joins, the same table is listed twice in the FROM clause. Each instance of the table must have a table alias or you will not be able to distinguish between references to columns in either instance of the table. See Example 13 on page 1168 and Example 14 on page 1172 for examples.

Inner Joins

An inner join returns a result table for all the rows in a table that have one or more matching rows in the other table(s), as specified by the sql-expression. Inner joins can be performed on up to 32 tables in the same query-expression.

You can perform an inner join by using a list of table-names separated by commas or by using the INNER, JOIN, and ON keywords.

The LEFTTAB and RIGHTTAB tables are used to illustrate this type of join:

Left Table - LEFTTAB Continent Export Country ----------------------------- NA wheat Canada EUR corn France EUR rice Italy AFR oil Egypt

Right Table - RIGHTTAB Continent Export Country ----------------------------- NA sugar USA EUR corn Spain EUR beets Belgium ASIA rice Vietnam

The following example joins the LEFTTAB and RIGHTTAB tables to get the Cartesian product of the two tables. The Cartesian product is the result of combining every row from one table with every row from another table. You get the Cartesian product when you join two tables and do not subset them with a WHERE clause or ON clause.

proc sql; title 'The Cartesian Product of'; title2 'LEFTTAB and RIGHTTAB'; select * from lefttab, righttab;

The Cartesian Product of LEFTTAB and RIGHTTAB Continent Export Country Continent Export Country ---------------------------------------------------------------- NA wheat Canada NA sugar USA NA wheat Canada EUR corn Spain NA wheat Canada EUR beets Belgium NA wheat Canada ASIA rice Vietnam EUR corn France NA sugar USA EUR corn France EUR corn Spain EUR corn France EUR beets Belgium EUR corn France ASIA rice Vietnam EUR rice Italy NA sugar USA EUR rice Italy EUR corn Spain EUR rice Italy EUR beets Belgium EUR rice Italy ASIA rice Vietnam AFR oil Egypt NA sugar USA AFR oil Egypt EUR corn Spain AFR oil Egypt EUR beets Belgium AFR oil Egypt ASIA rice Vietnam

The LEFTTAB and RIGHTTAB tables can be joined by listing the table names in the FROM clause. The following query represents an equijoin because the values of Continent from each table are matched. The column names are prefixed with the table aliases so that the correct columns can be selected.

proc sql; title 'Inner Join'; select * from lefttab as l, righttab as r where l.continent=r.continent;

Inner Join Continent Export Country Continent Export Country ---------------------------------------------------------------- NA wheat Canada NA sugar USA EUR corn France EUR corn Spain EUR corn France EUR beets Belgium EUR rice Italy EUR corn Spain EUR rice Italy EUR beets Belgium

The following PROC SQL step is equivalent to the previous one and shows how to write an equijoin using the INNER JOIN and ON keywords.

proc sql; title 'Inner Join'; select * from lefttab as l inner join righttab as r on l.continent=r.continent;

See Example 4 on page 1145, Example 13 on page 1168, and Example 14 on page 1172 for more examples.

Outer Joins

Outer joins are inner joins that have been augmented with rows that did not match with any row from the other table in the join. The three types of outer joins are left, right, and full.

A left outer join, specified with the keywords LEFT JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from the first (LEFTTAB) table that do not match any row in the second (RIGHTTAB) table.

proc sql; title 'Left Outer Join'; select * from lefttab as l left join righttab as r on l.continent=r.continent;

Left Outer Join Continent Export Country Continent Export Country ---------------------------------------------------------------- AFR oil Egypt EUR rice Italy EUR beets Belgium EUR corn France EUR beets Belgium EUR rice Italy EUR corn Spain EUR corn France EUR corn Spain NA wheat Canada NA sugar USA

A right outer join, specified with the keywords RIGHT JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from the second (RIGHTTAB) table that do not match any row in the first (LEFTTAB) table.

proc sql; title 'Right Outer Join'; select * from lefttab as l right join righttab as r on l.continent=r.continent;

Right Outer Join Continent Export Country Continent Export Country ---------------------------------------------------------------- ASIA rice Vietnam EUR rice Italy EUR beets Belgium EUR rice Italy EUR corn Spain EUR corn France EUR beets Belgium EUR corn France EUR corn Spain NA wheat Canada NA sugar USA

A full outer join, specified with the keywords FULL JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from each table that do not match any row in the other table.

proc sql; title 'Full Outer Join'; select * from lefttab as l full join righttab as r on l.continent=r.continent;

Full Outer Join Continent Export Country Continent Export Country ---------------------------------------------------------------- AFR oil Egypt ASIA rice Vietnam EUR rice Italy EUR beets Belgium EUR rice Italy EUR corn Spain EUR corn France EUR beets Belgium EUR corn France EUR corn Spain NA wheat Canada NA sugar USA

See Example 7 on page 1152 for another example.

Cross Joins

A cross join returns as its result table the product of the two tables.

Using the LEFTTAB and RIGHTTAB example tables, the following program demonstrates the cross join:

proc sql; title 'Cross Join'; select * from lefttab as l cross join righttab as r;

Cross Join Continent Export Country Continent Export Country ---------------------------------------------------------------- NA wheat Canada NA sugar USA NA wheat Canada EUR corn Spain NA wheat Canada EUR beets Belgium NA wheat Canada ASIA rice Vietnam EUR corn France NA sugar USA EUR corn France EUR corn Spain EUR corn France EUR beets Belgium EUR corn France ASIA rice Vietnam EUR rice Italy NA sugar USA EUR rice Italy EUR corn Spain EUR rice Italy EUR beets Belgium EUR rice Italy ASIA rice Vietnam AFR oil Egypt NA sugar USA AFR oil Egypt EUR corn Spain AFR oil Egypt EUR beets Belgium AFR oil Egypt ASIA rice Vietnam

The cross join is not functionally different from a Cartesian product join. You would get the same result by submitting the following program:

proc sql; select * from lefttab, righttab;

Do not use an ON clause with a cross join. An ON clause will cause a cross join to fail. However, you can use a WHERE clause to subset the output.

Union Joins

A union join returns a union of the columns of both tables. The union join places in the results all rows with their respective column values from each input table. Columns that do not exist in one table will have null (missing) values for those rows in the result table. The following example demonstrates a union join.

proc sql; title 'Union Join'; select * from lefttab union join righttab;

Union Join Continent Export Country Continent Export Country ------------------------------------------------------------- NA sugar USA EUR corn Spain EUR beets Belgium ASIA rice Vietnam NA wheat Canada EUR corn France EUR rice Italy AFR oil Egypt

Using a union join is similar to concatenating tables with the OUTER UNION set operator. See query-expression on page 1107 for more information.

Do not use an ON clause with a union join. An ON clause will cause a union join to fail.

Natural Joins

A natural join selects rows from two tables that have equal values in columns that share the same name and the same type. An error results if two columns have the same name but different types. If join-specification is omitted when specifying a natural join, then INNER is implied . If no like columns are found, then a cross join is performed.

The following examples use these two tables:

table1 x y z ----------------------------- 1 2 3 2 1 8 6 5 4 2 5 6

table2 x b z ----------------------------- 1 5 3 3 5 4 2 7 8 6 0 4

The following program demonstrates a natural inner join.

proc sql; title 'Natural Inner Join'; select * from table1 natural join table2;

Natural Inner Join x z b y --------------------------------------- 1 3 5 2 2 8 7 1 6 4 0 5

The following program demonstrates a natural left outer join.

proc sql; title 'Natural Left Outer Join'; select * from table1 natural left join table2;

Natural Left Outer Join x z b y --------------------------------------- 1 3 5 2 2 6 . 5 2 8 7 1 6 4 0 5

Do not use an ON clause with a natural join. An ON clause will cause a natural join to fail. When using a natural join, an ON clause is implied, matching all like columns.

Joining More Than Two Tables

Inner joins are usually performed on two or three tables, but they can be performed on up to 32 tables in PROC SQL. A join on three tables is described here to explain how and why the relationships work among the tables.

In a three-way join, the sql-expression consists of two conditions: one relates the first table to the second table and the other relates the second table to the third table. It is possible to break this example into stages, performing a two-way join into a temporary table and then joining that table with the third one for the same result. However, PROC SQL can do it all in one step as shown in the next example.

The example shows the joining of three tables: COMM, PRICE, and AMOUNT. To calculate the total revenue from exports for each country, you need to multiply the amount exported (AMOUNT table) by the price of each unit (PRICE table), and you must know the commodity that each country exports (COMM table).

COMM Table Continent Export Country ----------------------------- NA wheat Canada EUR corn France EUR rice Italy AFR oil Egypt

PRICE Table Export Price ------------------- rice 3.56 corn 3.45 oil 18 wheat 2.98

AMOUNT Table Country Quantity ------------------- Canada 16000 France 2400 Italy 500 Egypt 10000

proc sql; title 'Total Export Revenue'; select c.Country, p.Export, p.Price, a.Quantity,a.quantity*p.price as Total from comm c, price p, amount a where c.export=p.export and c.country=a.country;

Total Export Revenue Country Export Price Quantity Total -------------------------------------------------- Italy rice 3.56 500 1780 France corn 3.45 2400 8280 Egypt oil 18 10000 180000 Canada wheat 2.98 16000 47680

See Example 9 on page 1159 for another example.

Comparison of Joins and Subqueries

You can often use a subquery or a join to get the same result. However, it is often more efficient to use a join if the outer query and the subquery do not return duplicate rows. For example, the following queries produce the same result. The second query is more efficient:

proc sql; select IDNumber, Birth from proclib.payroll where IDNumber in (select idnum from proclib.staff where lname like 'B%'); proc sql; select p.IDNumber, p.Birth from proclib.payroll p, proclib.staff s where p.idnumber=s.idnum and s.lname like 'B%';

Note: PROCLIB.PAYROLL is shown in Example 2 on page 1141.

LIKE condition

Tests for a matching pattern.

sql-expression <NOT> LIKE sql-expression <ESCAPE character-expression >

Arguments

sql-expression

character-expression

Details

The LIKE condition selects rows by comparing character strings with a pattern-matching specification. It resolves to true and displays the matched string(s) if the left operand matches the pattern specified by the right operand.

The ESCAPE clause is used to search for literal instances of the percent (%) and underscore (_) characters, which are usually used for pattern matching.

Patterns for Searching

Patterns are composed of three classes of characters:

underscore (_)

percent sign (%)

any other character

These patterns can appear before, after, or on both sides of characters that you want to match. The LIKE condition is case-sensitive.

The following list uses these values: Smith , Smooth , Smothers , Smart , and Smuggle .

Searching for Literal % and _

Because the % and _ characters have special meaning in the context of the LIKE condition, you must use the ESCAPE clause to search for these character literals in the input character string.

These example use the values app , a_% , a__ , bbaa1 , and ba_1 .

Searching for Mixed-Case Strings

To search for mixed-case strings, use the UPCASE function to make all the names uppercase before entering the LIKE condition:

upcase(name) like 'SM%';

Note: When you are using the % character, be aware of the effect of trailing blanks. You may have to use the TRIM function to remove trailing blanks in order to match values.

LOWER function

Converts the case of a character string to lowercase.

See also: UPPER function on page 1128

LOWER (sql-expression)

Argument

sql-expression

Details

The LOWER function operates on character strings. LOWER changes the case of its argument to all lowercase.

Note: The LOWER function is provided for compatibility with the ANSI SQL standard. You can also use the SAS function LOWCASE.

query-expression

Retrieves data from tables.

See also: table-expression on page 1127, Query Expressions (Subqueries) on page 1116, and In-Line Views on page 1078

table-expression < set-operator table-expression> < set-operator table-expression>

Arguments

table-expression

set-operator

Query Expressions and Table Expressions

A query-expression is one or more table-expressions. Multiple table expressions are linked by set operators. The following figure illustrates the relationship between table-expressions and query-expressions.

Set Operators

PROC SQL provides these set operators:

OUTER UNION

UNION

EXCEPT

INTERSECT

A query-expression with set operators is evaluated as follows.

Set operators follow this order of precedence unless they are overridden by parentheses in the expression(s): INTERSECT is evaluated first. OUTER UNION, UNION, and EXCEPT have the same level of precedence.

PROC SQL performs set operations even if the tables or views that are referred to in the table-expressions do not have the same number of columns. The reason for this behavior is that the ANSI Standard for SQL requires that tables or views that are involved in a set operation have the same number of columns and that the columns have matching data types. If a set operation is performed on a table or view that has fewer columns than the one(s) with which it is being linked, then PROC SQL extends the table or view with fewer columns by creating columns with missing values of the appropriate data type. This temporary alteration enables the set operation to be performed correctly.

CORRESPONDING (CORR) Keyword

The CORRESPONDING keyword is used only when a set operator is specified. CORR causes PROC SQL to match the columns in table-expressions by name and not by ordinal position. Columns that do not match by name are excluded from the result table, except for the OUTER UNION operator. See OUTER UNION on page 1109.

For example, when performing a set operation on two table-expressions, PROC SQL matches the first specified column-name (listed in the SELECT clause) from one table-expression with the first specified column-name from the other. If CORR is omitted, then PROC SQL matches the columns by ordinal position.

ALL Keyword

The set operators automatically eliminate duplicate rows from their output tables. The optional ALL keyword preserves the duplicate rows, reduces the execution by one step, and thereby improves the query-expression s performance. You use it when you want to display all the rows resulting from the table-expressions, rather than just the unique rows. The ALL keyword is used only when a set operator is also specified.

OUTER UNION

Performing an OUTER UNION is very similar to performing the SAS DATA step with a SET statement. The OUTER UNION concatenates the intermediate results from the table-expressions. Thus, the result table for the query-expression contains all the rows produced by the first table-expression followed by all the rows produced by the second table-expression. Columns with the same name are in separate columns in the result table.

For example, the following query expression concatenates the ME1 and ME2 tables but does not overlay like-named columns. Output 45.1 shows the result.

ME1 IDnum Jobcode Salary Bonus --------------------------------------- 1400 ME1 29769 587 1403 ME1 28072 342 1120 ME1 28619 986 1120 ME1 28619 986

ME2 IDnum Jobcode Salary ----------------------------- 1653 ME2 35108 1782 ME2 35345 1244 ME2 36925

proc sql; title 'ME1 and ME2: OUTER UNION'; select * from me1 outer union select * from me2;

Output 45.1: OUTER UNION of ME1 and ME2 Tables

ME1 and ME2: OUTER UNION IDnum Jobcode Salary Bonus IDnum Jobcode Salary --------------------------------------------------------------------- 1400 ME1 29769 587 . 1403 ME1 28072 342 . 1120 ME1 28619 986 . 1120 ME1 28619 986 . . . 1653 ME2 35108 . . 1782 ME2 35345 . . 1244 ME2 36925

 

Concatenating tables with the OUTER UNION set operator is similar to performing a union join. See Union Joins on page 1102 for more information.

To overlay columns with the same name, use the CORRESPONDING keyword.

proc sql; title 'ME1 and ME2: OUTER UNION CORRESPONDING'; select * from me1 outer union corr select * from me2;

ME1 and ME2: OUTER UNION CORRESPONDING IDnum Jobcode Salary Bonus -------------------------------------- 1400 ME1 29769 587 1403 ME1 28072 342 1120 ME1 28619 986 1120 ME1 28619 986 1653 ME2 35108 . 1782 ME2 35345 . 1244 ME2 36925 .

In the resulting concatenated table, notice the following:

UNION

The UNION operator produces a table that contains all the unique rows that result from both table-expressions. That is, the output table contains rows produced by the first table-expression, the second table-expression, or both.

Columns are appended by position in the tables, regardless of the column names. However, the data type of the corresponding columns must match or the union will not occur. PROC SQL issues a warning message and stops executing.

The names of the columns in the output table are the names of the columns from the first table-expression unless a column (such as an expression) has no name in the first table-expression. In such a case, the name of that column in the output table is the name of the respective column in the second table-expression.

In the following example, PROC SQL combines the two tables:

proc sql; title 'ME1 and ME2: UNION'; select * from me1 union select * from me2;

ME1 and ME2: UNION IDnum Jobcode Salary Bonus -------------------------------------- 1120 ME1 28619 986 1244 ME2 36925 . 1400 ME1 29769 587 1403 ME1 28072 342 1653 ME2 35108 . 1782 ME2 35345 .

In the following example, ALL includes the duplicate row from ME1. In addition, ALL changes the sorting by specifying that PROC SQL make one pass only. Thus, the values from ME2 are simply appended to the values from ME1.

proc sql; title 'ME1 and ME2: UNION ALL'; select * from me1 union all select * from me2;

ME1 and ME2: UNION ALL IDnum Jobcode Salary Bonus -------------------------------------- 1400 ME1 29769 587 1403 ME1 28072 342 1120 ME1 28619 986 1120 ME1 28619 986 1653 ME2 35108 . 1782 ME2 35345 . 1244 ME2 36925 .

See Example 5 on page 1148 for another example.

EXCEPT

The EXCEPT operator produces (from the first table-expression) an output table that has unique rows that are not in the second table-expression. If the intermediate result from the first table-expression has at least one occurrence of a row that is not in the intermediate result of the second table-expression, then that row (from the first table-expression) is included in the result table.

In the following example, the IN_USA table contains flights to cities within and outside the USA. The OUT_USA table contains flights only to cities outside the USA. This example returns only the rows from IN_USA that are not also in OUT_USA:

proc sql; title 'Flights from IN_USA Only'; select * from in_usa except select * from out_usa;

IN_USA Flight Dest ------------------ 145 ORD 156 WAS 188 LAX 193 FRA 207 LON

OUT_USA Flight Dest ------------------ 193 FRA 207 LON 311 SJA

Flights from IN_USA Only Flight Dest ------------------ 145 ORD 156 WAS 188 LAX

INTERSECT

The INTERSECT operator produces an output table that has rows that are common to both tables. For example, using the IN_USA and OUT_USA tables shown above, the following example returns rows that are in both tables:

proc sql; title 'Flights from Both IN_USA and OUT_USA'; select * from in_usa intersect select * from out_usa;

Flights from Both IN_USA and OUT_USA Flight Dest ------------------ 193 FRA 207 LON

sql-expression

Produces a value from a sequence of operands and operators.

operand operator operand

Arguments

operand

operator

Note: SAS functions, including summary functions, can stand alone as SQL expressions. For example

select min(x) from table ; select scan(y,4) from table ;

SAS Functions

PROC SQL supports the same SAS functions as the DATA step, except for the functions LAG, DIF, and SOUND. For example, the SCAN function is used in the following query:

select style, scan(street,1) format=. from houses;

See SAS Language Reference: Dictionary for complete documentation on SAS functions. Summary functions are also SAS functions. See summary-function on page 1121 for more information.

USER Literal

USER can be specified in a view definition, for example, to create a view that restricts access to those in the user s department. Note that the USER literal value is stored in uppercase, so it is advisable to use the UPCASE function when comparing to this value:

create view myemp as select * from dept12.employees where upcase(manager)=user;

This view produces a different set of employee information for each manager who references it.

Operators and the Order of Evaluation

The order in which operations are evaluated is the same as in the DATA step with this one exception: NOT is grouped with the logical operators AND and OR in PROC SQL; in the DATA step, NOT is grouped with the unary plus and minus signs.

Unlike missing values in some versions of SQL, missing values in SAS always appear first in the collating sequence. Therefore, in Boolean and comparison operations, the following expressions resolve to true in a predicate:

3>null -3>null 0>null

You can use parentheses to group values or to nest mathematical expressions. Parentheses make expressions easier to read and can also be used to change the order of evaluation of the operators. Evaluating expressions with parentheses begins at the deepest level of parentheses and moves outward. For example, SAS evaluates A+B*C as A+(B*C), although you can add parentheses to make it evaluate as (A+B)*C for a different result.

Higher priority operations are performed first: that is, group 0 operators are evaluated before group 5 operators. The following table shows the operators and their order of evaluation, including their priority groups.

Table 45.1: Operators and Order of Evaluation

Group

Operator

Description

( )

forces the expression enclosed to be evaluated first

1

case-expression

selects result values that satisfy specified conditions

2

**

raises to a power

 

unary +, unary -

indicates a positive or negative number

3

*

multiplies

 

/

divides

4

+

adds

 

ˆ’

subtracts

5

concatenates

6

<NOT> BETWEEN condition

See BETWEEN condition on page 1085.

 

<NOT> CONTAINS condition

see CONTAINS condition on page 1093.

 

<NOT> EXISTS condition

See EXISTS condition on page 1093.

 

<NOT> IN condition

See IN condition on page 1094.

 

IS <NOT> condition

See IS condition on page 1095.

 

<NOT> LIKE condition

See LIKE condition on page 1105.

7

=, eq

equals

 

=, ^=, < >, ne

does not equal

 

>, gt

is greater than

 

<, lt

is less than

 

>=, ge

is greater than or equal to

 

<=, le

is less than or equal to

 

=*

sounds like (use with character operands only). See Example 11 on page 1164.

 

eqt

equal to truncated strings (use with character operands only). See Truncated String Comparison Operators on page 1116.

 

gtt

greater than truncated strings

 

ltt

less than truncated strings

 

get

greater than or equal to truncated strings

 

let

less than or equal to truncated strings

 

net

not equal to truncated strings

8

&, AND

indicates logical AND

9

, OR

indicates logical OR

10

, ^, NOT

indicates logical NOT

Symbols for operators might vary, depending on your operating environment. See SAS Language Reference: Dictionary for more information on operators and expressions.

Truncated String Comparison Operators

PROC SQL supports truncated string comparison operators (see Group 7 in Table 45.1 on page 1115). In a truncated string comparison, the comparison is performed after making the strings the same length by truncating the longer string to be the same length as the shorter string. For example, the expression 'TWOSTORY' eqt 'TWO' is true because the string 'TWOSTORY is reduced to TWO before the comparison is performed. Note that the truncation is performed internally; neither operand is permanently changed.

Note: Unlike the DATA step, PROC SQL does not support the colon operators (such as =:, >:, and <=:) for truncated string comparisons. Use the alphabetic operators (such as EQT, GTT, and LET).

Query Expressions (Subqueries)

A query-expression is called a subquery when it is used in a WHERE or HAVING clause. A subquery is a query-expression that is nested as part of another query-expression. A subquery selects one or more rows from a table based on values in another table.

Depending on the clause that contains it, a subquery can return a single value or multiple values. If more than one subquery is used in a query-expression, then the innermost query is evaluated first, then the next innermost query, and so on, moving outward.

PROC SQL allows a subquery (contained in parentheses) at any point in an expression where a simple column value or constant can be used. In this case, a subquery must return a single value , that is, one row with only one column.

The following is an example of a subquery that returns one value. This PROC SQL step subsets the PROCLIB.PAYROLL table based on information in the PROCLIB.STAFF table. (PROCLIB.PAYROLL is shown in Example 2 on page 1141, and PROCLIB.STAFF is shown in Example 4 on page 1145.) PROCLIB.PAYROLL contains employee identification numbers (IdNumber) and their salaries (Salary) but does not contain their names. If you want to return only the row from PROCLIB.PAYROLL for one employee, then you can use a subquery that queries the PROCLIB.STAFF table, which contains the employees identification numbers and their names (Lname and Fname).

options ls=64 nodate nonumber; proc sql; title 'Information for Earl Bowden'; select * from proclib.payroll where idnumber= (select idnum from proclib.staff where upcase(lname)='BOWDEN');

Information for Earl Bowden Id Number Gender Jobcode Salary Birth Hired ---------------------------------------------------- 1403 M ME1 28072 28JAN69 21DEC91

Subqueries can return multiple values . The following example uses the tables PROCLIB.DELAY and PROCLIB.MARCH. These tables contain information about the same flights and have the Flight column in common. The following subquery returns all the values for Flight in PROCLIB.DELAY for international flights. The values from the subquery complete the WHERE clause in the outer query. Thus, when the outer query is executed, only the international flights from PROCLIB. MARCH are in the output.

options ls=64 nodate nonumber; proc sql outobs=5; title 'International Flights from'; title2 'PROCLIB.MARCH'; select Flight, Date, Dest, Boarded from proclib.march where flight in (select flight from proclib.delay where destype='International');

International Flights from PROCLIB.MARCH Flight Date Dest Boarded ------------------------------- 219 01MAR94 LON 198 622 01MAR94 FRA 207 132 01MAR94 YYZ 115 271 01MAR94 PAR 138 219 02MAR94 LON 147

Sometimes it is helpful to compare a value with a set of values returned by a subquery. The keywords ANY or ALL can be specified before a subquery when the subquery is the right-hand operand of a comparison. If ALL is specified, then the comparison is true only if it is true for all values that are returned by the subquery. If a subquery returns no rows, then the result of an ALL comparison is true for each row of the outer query.

If ANY is specified, then the comparison is true if it is true for any one of the values that are returned by the subquery. If a subquery returns no rows, then the result of an ANY comparison is false for each row of the outer query.

The following example selects all those in PROCLIB.PAYROLL who earn more than the highest paid ME3 :

options ls=64 nodate nonumber ; proc sql; title ''Employees who Earn More than''; title2 ''All ME's''; select * from proclib.payroll where salary > all (select salary from proclib.payroll where jobcode='ME3');

Employees who Earn More than All ME's Id Number Gender Jobcode Salary Birth Hired ------------------------------------------------------- 1333 M PT2 88606 30MAR61 10FEB81 1739 M PT1 66517 25DEC64 27JAN91 1428 F PT1 68767 04APR60 16NOV91 1404 M PT2 91376 24FEB53 01JAN80 1935 F NA2 51081 28MAR54 16OCT81 1905 M PT1 65111 16APR72 29MAY92 1407 M PT1 68096 23MAR69 18MAR90 1410 M PT2 84685 03MAY67 07NOV86 1439 F PT1 70736 06MAR64 10SEP90 1545 M PT1 66130 12AUG59 29MAY90 1106 M PT2 89632 06NOV57 16AUG84 1442 F PT2 84536 05SEP66 12APR88 1417 M NA2 52270 27JUN64 07MAR89 1478 M PT2 84203 09AUG59 24OCT90 1556 M PT1 71349 22JUN64 11DEC91 1352 M NA2 53798 02DEC60 16OCT86 1890 M PT2 91908 20JUL51 25NOV79 1107 M PT2 89977 09JUN54 10FEB79 1830 F PT2 84471 27MAY57 29JAN83 1928 M PT2 89858 16SEP54 13JUL90 1076 M PT1 66558 14OCT55 03OCT91

Note: See the first item in Subqueries and Efficiency on page 1119 for a note about efficiency when using ALL.

In order to visually separate a subquery from the rest of the query, you can enclose the subquery in any number of pairs of parentheses.

Correlated Subqueries

In a correlated subquery, the WHERE expression in a subquery refers to values in a table in the outer query. The correlated subquery is evaluated for each row in the outer query. With correlated subqueries, PROC SQL executes the subquery and the outer query together.

The following example uses the PROCLIB.DELAY and PROCLIB.MARCH tables. A DATA step ( PROCLIB.DELAY on page 1404) creates PROCLIB.DELAY. PROCLIB.MARCH is shown in Example 13 on page 1168. PROCLIB.DELAY has the Flight, Date, Orig, and Dest columns in common with PROCLIB.MARCH:

proc sql outobs=5; title 'International Flights'; select * from proclib.march where 'International' in (select destype from proclib.delay where march.Flight=delay.Flight);

The subquery resolves by substituting every value for MARCH.Flight into the subquery s WHERE clause, one row at a time. For example, when MARCH.Flight= 219 , the subquery resolves as follows:

  1. PROC SQL retrieves all the rows from DELAY where Flight= 219 and passes their DESTYPE values to the WHERE clause.

  2. PROC SQL uses the DESTYPE values to complete the WHERE clause:

    where 'International' in ('International','International', ...)

  3. The WHERE clause checks to see if International is in the list. Because it is, all rows from MARCH that have a value of 219 for Flight become part of the output.

The following output contains the rows from MARCH for international flights only.

Output 45.2: International Flights for March

International Flights Flight Date Depart Orig Dest Miles Boarded Capacity -------------------------------------------------------------------- 219 01MAR94 9:31 LGA LON 3442 198 250 622 01MAR94 12:19 LGA FRA 3857 207 250 132 01MAR94 15:35 LGA YYZ 366 115 178 271 01MAR94 13:17 LGA PAR 3635 138 250 219 02MAR94 9:31 LGA LON 3442 147 250

 

Subqueries and Efficiency

SUBSTRING function

Returns a part of a character expression.

SUBSTRING (sql-expression FROM start <FOR length >)

Details

The SUBSTRING function operates on character strings. SUBSTRING returns a specified part of the input character string, beginning at the position that is specified by start . If length is omitted, then the SUBSTRING function returns all characters from start to the end of the input character string. The values of start and length must be numbers (not variables) and can be positive, negative, or zero.

If start is greater than the length of the input character string, then the SUBSTRING function returns a zero-length string.

If start is less than 1, then the SUBSTRING function begins extraction at the beginning of the input character string.

If length is specified, then the sum of start and length cannot be less than start or an error is returned. If the sum of start and length is greater than the length of the input character string, then the SUBSTRING function returns all characters from start to the end of the input character string. If the sum of start and length is less than 1, then the SUBSTRING function returns a zero-length string.

Note: The SUBSTRING function is provided for compatibility with the ANSI SQL standard. You can also use the SAS function SUBSTR.

summary-function

Performs statistical summary calculations.

Restriction: A summary function cannot appear in an ON clause or a WHERE clause.

See also: GROUP BY on page 1079, HAVING Clause on page 1080, SELECT Clause on page 1072, and table-expression on page 1127

Featured in: Example 8 on page 1157, Example 12 on page 1166, and Example 15 on page 1174

summary-function (<DISTINCT ALL> sql-expression)

Arguments

summary-function

DISTINCT

ALL

sql-expression

Summarizing Data

Summary functions produce a statistical summary of the entire table or view that is listed in the FROM clause or for each group that is specified in a GROUP BY clause. If GROUP BY is omitted, then all the rows in the table or view are considered to be a single group. These functions reduce all the values in each row or column in a table to one summarizing or aggregate value. For this reason, these functions are often called aggregate functions . For example, the sum (one value) of a column results from the addition of all the values in the column.

Counting Rows

The COUNT function counts rows. COUNT(*) returns the total number of rows in a group or in a table. If you use a column name as an argument to COUNT, then the result is the total number of rows in a group or in a table that have a nonmissing value for that column. If you want to count the unique values in a column, then specify COUNT(DISTINCT column ).

If the SELECT clause of a table-expression contains one or more summary functions and that table-expression resolves to no rows, then the summary function results are missing values. The following are exceptions that return zeros:

See Example 8 on page 1157 and Example 15 on page 1174 for examples.

Calculating Statistics Based on the Number of Arguments

The number of arguments that is specified in a summary function affects how the calculation is performed. If you specify a single argument, then the values in the column are calculated. If you specify multiple arguments, then the arguments or columns that are listed are calculated for each row. For example, consider calculations on the following table.

proc sql; title 'Summary Table'; select * from summary;

Summary Table X Y Z ---------------------------- 1 3 4 2 4 5 8 9 4 4 5 4

If you use one argument in the function, then the calculation is performed on that column only. If you use more than one argument, then the calculation is performed on each row of the specified columns. In the following PROC SQL step, the MIN and MAX functions return the minimum and maximum of the columns they are used with. The SUM function returns the sum of each row of the columns specified as arguments:

proc sql; select min(x) as Colmin_x, min(y) as Colmin_y, max(z) as Colmax_z, sum(x,y,z) as Rowsum from summary;

Summary Table Colmin_x Colmin_y Colmax_z Rowsum --------------------------------------- 1 3 5 8 1 3 5 11 1 3 5 21 1 3 5 13

Remerging Data

When you use a summary function in a SELECT clause or a HAVING clause, you might see the following message in the SAS log:

NOTE: The query requires remerging summary statistics back with the original data.

The process of remerging involves two passes through the data. On the first pass, PROC SQL

On the second pass, PROC SQL retrieves any additional columns and rows that it needs to show in the output.

The following examples use the PROCLIB.PAYROLL table (shown in Example 2 on page 1141) to show when remerging of data is and is not necessary.

The first query requires remerging. The first pass through the data groups the data by Jobcode and resolves the AVG function for each group. However, PROC SQL must make a second pass in order to retrieve the values of IdNumber and Salary.

proc sql outobs=10; title 'Salary Information'; title2 '(First 10 Rows Only)'; select IdNumber, Jobcode, Salary, avg(salary) as AvgSalary from proclib.payroll group by jobcode;

Salary Information (First 10 Rows Only) Id Number Jobcode Salary AvgSalary ------------------------------------- 1704 BCK 25465 25794.22 1677 BCK 26007 25794.22 1383 BCK 25823 25794.22 1845 BCK 25996 25794.22 1100 BCK 25004 25794.22 1663 BCK 26452 25794.22 1673 BCK 25477 25794.22 1389 BCK 25028 25794.22 1834 BCK 26896 25794.22 1132 FA1 22413 23039.36

You can change the previous query to return only the average salary for each jobcode. The following query does not require remerging because the first pass of the data does the summarizing and the grouping. A second pass is not necessary.

proc sql outobs=10; title 'Average Salary for Each Jobcode'; select Jobcode, avg(salary) as AvgSalary from proclib.payroll group by jobcode;

Average Salary for Each Jobcode Jobcode AvgSalary ------------------ BCK 25794.22 FA1 23039.36 FA2 27986.88 FA3 32933.86 ME1 28500.25 ME2 35576.86 ME3 42410.71 NA1 42032.2 NA2 52383 PT1 67908

When you use the HAVING clause, PROC SQL may have to remerge data to resolve the HAVING expression.

First, consider a query that uses HAVING but that does not require remerging. The query groups the data by values of Jobcode, and the result contains one row for each value of Jobcode and summary information for people in each Jobcode. On the first pass, the summary functions provide values for the Number , Average Age , and Average Salary columns. The first pass provides everything that PROC SQL needs to resolve the HAVING clause, so no remerging is necessary.

proc sql outobs=10; title 'Summary Information for Each Jobcode'; title2 '(First 10 Rows Only)'; select Jobcode, count(jobcode) as number label='Number', avg(int((today()-birth)/365.25)) as avgage format=2. label='Average Age', avg(salary) as avgsal format=dollar8. label='Average Salary' from proclib.payroll group by jobcode having avgage ge 30;

Summary Information for Each Jobcode (First 10 Rows Only) Average Average Jobcode Number Age Salary ------------------------------------ BCK 9 36 ,794 FA1 11 33 ,039 FA2 16 37 ,987 FA3 7 39 ,934 ME1 8 34 ,500 ME2 14 39 ,577 ME3 7 42 ,411 NA1 5 30 ,032 NA2 3 42 ,383 PT1 8 38 ,908

In the following query, PROC SQL remerges the data because the HAVING clause uses the SALARY column in the comparison and SALARY is not in the GROUP BY clause.

proc sql outobs=10; title 'Employees who Earn More than the'; title2 'Average for Their Jobcode'; title3 '(First 10 Rows Only)'; select Jobcode, Salary, avg(salary) as AvgSalary from proclib.payroll group by jobcode having salary > AvgSalary;

Employees who Earn More than the Average for Their Jobcode (First 10 Rows Only) Jobcode Salary AvgSalary ---------------------------- BCK 26007 25794.22 BCK 25823 25794.22 BCK 25996 25794.22 BCK 26452 25794.22 BCK 26896 25794.22 FA1 23177 23039.36 FA1 23738 23039.36 FA1 23979 23039.36 FA1 23916 23039.36 FA1 23644 23039.36

Keep in mind that PROC SQL remerges data when

table-expression

Defines part or all of a query-expression.

See also: query-expression on page 1107

SELECT < DISTINCT > object-item <, object-item >

See SELECT Statement on page 1071 for complete information on the SELECT statement.

Details

A table-expression is a SELECT statement. It is the fundamental building block of most SQL procedure statements. You can combine the results of multiple table-expressions with set operators, which creates a query-expression. Use one ORDER BY clause for an entire query-expression. Place a semicolon only at the end of the entire query-expression. A query-expression is often only one SELECT statement or table-expression.

UPPER function

Converts the case of a character string to uppercase.

See also: LOWER function on page 1107

UPPER (sql-expression)

Details

The UPPER function operates on character strings. UPPER converts the case of its argument to all uppercase.

[*] Currently, there is no way to designate a WEIGHT variable for a table in PROC SQL. Thus, each row (or observation) has a weight of 1.

Категории