No DBMS product actually uses a CROSS JOIN to construct the intermediate results table—the product table would be too large even when the
in the FROM clause contains only a few rows. For example, if you have two 1,000-row tables, the resulting Cartesian product would be a table with 1,000,000 rows! The important thing to understand is that although it never really actually exists, the product table is a conceptual model that correctly describes the behavior of the DBMS when it executes a multi-table query.
- If there is a WHERE clause, the DBMS will apply its search condition to each row in the composite (Cartesian product) table generated in Step 1. The DBMS keeps those rows for which the search condition tests TRUE and removes those for which the search condition tests NULL or FALSE. If the WHERE clause contains a subquery, the DBMS executes the subquery on each row that satisfies the selection criteria of the main query.
- If there is a GROUP BY clause, the DBMS breaks the rows in the results table (from Step 2) into groups where columns in the all have the same value. Next, the DBMS reduces each group to a single row, which is then added to a new results table that replaces the one at the beginning of this step.
|
Note |
All of the columns in the in the GROUP BY clause must appear in the in the SELECT clause.
|
|
Note |
The DBMS treats NULL values is if they were equal and puts all of them into their own group.
|
- If there is a HAVING clause, the DBMS applies it to each of the rows in the "grouped" table produced in Step 3. The DBMS keeps those rows for which the tests TRUE and removes those rows for which the tests NULL (unknown) or FALSE. If the HAVING clause has a subquery, the DBMS performs the subquery on each row of the "grouped" table that satisfies the .
|
Note |
You cannot have a HAVING clause in a SELECT statement without a GROUP BY clause because the HAVING clause filters the results of the GROUP BY clause. Moreover, any columns in the must be included in the of the GROUP BY clause.
|
- Apply the SELECT clause to the results table. If a column in the results table is not in the , the DBMS drops the column from the results table. If the SELECT clause includes the DISTINCT option, the DBMS will remove duplicate rows from the results table.
|
Note |
The in the SELECT clause can consist of constants (numeric or character string), calculations based on constants or table columns, columns, and functions.
|
- If there is an ORDER BY clause, sort the results table as specified in the .
- Display the results table to the screen for an interactive SQL SELECT statement, or use a cursor to pass the results table to the calling (or host) program for programmatic SQL.
Suppose, for example, that you have two tables created with
CREATE TABLE employees
(emp_id INTEGER PRIMARY KEY,
last_name VARCHAR(25),
trainer VARCHAR(25),
sales INTEGER)
CREATE TABLE sales
(cust_id INTEGER PRIMARY KEY,
sold_by INTEGER,
sales_amt MONEY)
that have the following data:
EMPLOYEES table SALES table
id last_name trainer sales cust_id sold_by sales_amt
-- --------- ------- ----- ------- ------- ---------
1 Hardy Bob 3 1 1 $6,000
2 Wallace Greg 3 2 1 $6,000
3 Green Bob 2 3 4 $8,000
4 Marsh Andy 2 4 2 $4,000
5 Brown Greg 0 5 2 $6,000
6 3 $7'000
7 4 $4,000
8 1 $6,000
9 2 $7,000
10 3 $9,000
To get a report on how well your trainers were doing in preparing your salesmen for the field, you could use a query similar to:
SELECT last_name trainer, COUNT(*) AS num_trainees,
SUM(sales_amt) AS gross_sales, AVG(sales_amt)
FROM employees, sales
WHERE sales > 0
AND emp_id = sold_by
GROUP BY last_name, trainer
HAVING AVG(sales_amt) > 6000
After performing Step 1 of SELECT statement execution for the example query, the DBMS will create the following CROSS JOIN working table based on the data in the two tables (EMPLOYEES and ORDERS) listed in the example SELECT statement's FROM clause.
CROSS JOIN Working Table
EMPLOYEES ORDERS
emp_id last_name trainer sales cust_id sold_by sales_amt
------ --------- ------- ----- ------- ------- ---------
1 Hardy Bob 3 1 1 $6,000
1 Hardy Bob 3 2 1 $6,000
1 Hardy Bob 3 3 4 $8,000
1 Hardy Bob 3 4 2 $4,000
1 Hardy Bob 3 5 2 $6,000
1 Hardy Bob 3 6 3 $7,000
1 Hardy Bob 3 7 4 $4,000
1 Hardy Bob 3 8 1 $6,000
1 Hardy Bob 3 9 2 $7,000
1 Hardy Bob 3 10 3 $9,000
2 Wallace Greg 3 1 1 $6,000
2 Wallace Greg 3 2 1 $6,000
2 Wallace Greg 3 3 4 $8,000
2 Wallace Greg 3 4 2 $4,000
2 Wallace Greg 3 5 2 $6,000
2 Wallace Greg 3 6 3 $7,000
2 Wallace Greg 3 7 4 $4,000
2 Wallace Greg 3 8 1 $6,000
2 Wallace Greg 3 9 2 $7,000
2 Wallace Greg 3 10 3 $9,000
3 Green Bob 2 1 1 $6,000
3 Green Bob 2 2 1 $6,000
3 Green Bob 2 3 4 $8,000
3 Green Bob 2 4 2 $4,000
3 Green Bob 2 5 2 $6,000
3 Green Bob 2 6 3 $7,000
3 Green Bob 2 7 4 $4,000
3 Green Bob 2 8 1 $6,000
3 Green Bob 2 9 2 $7,000
3 Green Bob 2 10 3 $9,000
4 Marsh Andy 2 1 1 $6,000
4 Marsh Andy 2 2 1 $6,000
4 Marsh Andy 2 3 4 $8,000
4 Marsh Andy 2 4 2 $4,000
4 Marsh Andy 2 5 2 $6,000
4 Marsh Andy 2 6 3 $7,000
4 Marsh Andy 2 7 4 $4,000
4 Marsh Andy 2 8 1 $6,000
4 Marsh Andy 2 9 2 $7,000
4 Marsh Andy 2 10 3 $9,000
5 Brown Greg 0 1 1 $6,000
5 Brown Greg 0 2 1 $6,000
5 Brown Greg 0 3 4 $8,000
5 Brown Greg 0 4 2 $4,000
5 Brown Greg 0 5 2 $6,000
5 Brown Greg 0 6 3 $7,000
5 Brown Greg 0 7 4 $4,000
5 Brown Greg 0 8 1 $6'000
5 Brown Greg 0 9 2 $7,000
5 Brown Greg 0 10 3 $9,000
In Step 2 of the procedure, the DBMS will apply the search criteria in the SELECT statement's WHERE clause, which includes two predicates in the current example. The first predicate, sales > 0, will remove rows with a zero in the SALES column from the working table; the second predicate, emp_id = sold_by, will remove rows where the value in the EMP_ID column is not equal to the value in the SOLD_BY column to produce the working table:
CROSS JOIN Working Table After WHERE Clause Filter
EMPLOYEES ORDERS
emp_id last_name trainer sales cust_id sold_by sales_amt
------ --------- ------- ----- ------- ------- ---------
1 Hardy Bob 3 1 1 $6,000
1 Hardy Bob 3 2 1 $6,000
1 Hardy Bob 3 8 1 $6,000
2 Wallace Greg 3 4 2 $4,000
2 Wallace Greg 3 5 2 $6,000
2 Wallace Greg 3 9 2 $7,000
3 Green Bob 2 6 3 $7,000
3 Green Bob 2 10 3 $9,000
4 Marsh Andy 2 3 4 $8,000
4 Marsh Andy 2 7 4 $4,000
|
Note |
If any of the predicates in the WHERE clause use only one of the tables listed in the FROM clause, the DBMS optimizer will normally use the predicate to remove rows from the table prior to performing the CROSS JOIN. In the current example, the optimizer would have removed employee 5 from consideration since the employee has no sales, thereby eliminating 10 rows from the cross-joined working table.
|
In Step 3, the DBMS will use the GROUP BY clause to group the working table by trainer and then compute values for the aggregate functions in the SELECT clause.
CROSS JOIN Working Table After WHERE Clause Filtering and Grouping
EMPLOYEES ORDERS
emp_id last_name trainer sales cust_id sold_by sales_amt
------ --------- ------- ----- ------- ------- ---------
1 Hardy Bob 3 1 1 $6,000
1 Hardy Bob 3 2 1 $6,000
1 Hardy Bob 3 8 1 $6,000
3 Green Bob 2 6 3 $7,000
3 Green Bob 2 10 3 $9,000
2 Wallace Greg 3 4 2 $4,000
2 Wallace Greg 3 5 2 $6,000
2 Wallace Greg 3 9 2 $7,000
4 Marsh Andy 2 3 4 $8,000
4 Marsh Andy 2 7 4 $4,000
Aggregate Functions
trainer num_trainees gross_sales AVG(sales_amt)
------- ------------ ----------- -------------
Bob 5 $34,000 $6,800.0000
Greg 3 $17,000 $5,666.6666
Andy 2 $12,000 $6,000.0000
Next, the DBMS will apply the search condition in the HAVING clause to eliminate working table rows with average sales of $6,000.00 or less.
Finally, the DBMS will apply the in the SELECT clause to and display the query results in Step 7 as: trainer num_trainees gross_sales AVG(sales_amt) ------- ------------ ----------- ------------- Bob 5 34000.00 6800.0000 (1 Row(s) affected) Of course, the actual DBMS product will not create and drop actual, physical tables on disk as it works through the query—that would be very expensive in terms of system resources and processing time. The working tables in this tip model the way the DBMS executes a SELECT statement. If you conceptualize the interim table data as you add WHERE, GROUP BY, and HAVING clauses to your SELECT statements, you will often avoid the frustration of having to say, "Well, it is syntactically correct and it looks right, but it just doesn't give me the results I know I should get!" 88 Using the SELECT Statement to Display Columns from Rows in One or More Tables The simplest SQL query is a SELECT statement that displays the columns in a single table. For example, if you have a table created by CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name VARCHAR(20), last_name VARCHAR(30), address VARCHAR(50), phone_number VARCHAR(20)) you can display its contents using the SELECT statement: SELECT * FROM customer The asterisk (*) in the query's SELECT clause tells the DBMS to display all of the columns in the table(s) list in the FROM clause. As you learned in Tip 87, "Understanding the Steps Involved in Processing an SQL SELECT Statement," the syntax of the SELECT statement is: SELECT [ALL | DISTINCT] FROM
[WHERE ] [GROUP BY [HAVING ]] [ORDER BY ]
Thus, in the current example, the DBMS treats the asterisk (*) in the SELECT clause as if it were the list of all of the columns in the FROM clause table. As such, the SELECT statement in the current example
SELECT * FROM customer
is equivalent to
SELECT customer_id, first_name, last_name, address,
phone_number
FROM customer
If you want to display some (and not all) of the columns in a table, include only the columns you want to display in in the query's SELECT clause. For example, if you want to display only the customer's ID, first name, and phone number, use the SELECT statement: SELECT customer_id, first_name, phone_number FROM customer The DBMS will go through the target table named in the FROM clause (CUSTOMER, in the current example) one row at a time. As it reads each row in the input table, the DBMS will take the columns listed in the and use them to create a single row in a results table. Thus, the query
SELECT customer_id, first_name, last_name FROM customers
will produce a results table with four rows in the form:
customer_id first_name last_name
----------- ---------- ---------
1 Wally Cleaver
2 Dolly Madison
3 Horace Greely
4 Ben Stepman
All of the column names in the must be defined in the table (or tables) listed in the SELECT statement's FROM clause. For example, if you have two tables, CUSTOMERS and ORDERS, created by CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, first_name VARCHAR(20) last_name VARCHAR(30)) CREATE TABLE ORDERS (order_number INTEGER PRIMARY KEY, order_date DATETIME, cust_id INTEGER, description VARCHAR(25), order_total MONEY) and you want to query the database for a list of the IDs and names of all of the customers, and the date, description, and total amount of all of their orders, you can use the SELECT statement: SELECT customers.cust_id, first_name, last_name, order_date, description, order_total FROM customers, orders WHERE customers.cust_id = orders.cust_id Where the column values are unique to each table listed in the FROM clause, the DBMS will know which column value to display from which table—that is, the columns FIRST_NAME and LAST_NAME appear only in the CUSTOMERS table, so the DBMS knows to get the values for the FIRST_NAME and LAST_NAME columns from the CUSTOMERS table when it builds a row in the results table. Similarly, the ORDER_DATE, DESCRIPTION, and ORDER_TOTAL columns appear only in the ORDERS table, so the DBMS knows to retrieve these values from the ORDERS table when it adds them to a row in the results table. When you want to display the value in columns that have the same name in more than one of the tables listed in the FROM clause, you must use the qualified column name (
.) to tell the DBMS which table's data to use. In the current example, CUST_ID appears in both the CUSTOMERS and ORDERS tables. As such, to display the value of the CUST_ID column from the CUSTOMERS table, you must type the name of the column as CUSTOMERS.CUST_ID in the query's SELECT clause.
Similarly, if the WHERE clause contains ambiguous column names, you need to use qualified column names in the WHERE clause as well, as is shown by the WHERE clause in the current example.
WHERE customers.cust_id = orders.cust_id
Using the SELECT Statement to Display Column and Computed Values
In addition to using a SELECT statement to display column values, you can use it to display calculated columns and literals. Suppose, for example, that you managed a restaurant and wanted to get a feel for how your patrons felt about the service based on their tips. Since most people tip the customary 15 percent if the service is "okay," you are most interested in tips above and below the customary amount.
Given a sales table created by
CREATE TABLE sales
(emp_id INTEGER,
meal_total MONEY,
tip_rec MONEY)
you can use the SELECT statement
SELECT emp_id, meal_total, '* 15% = ',
meal_total * .15 AS standard_tip, tip_rec,
tip_rec - (meal_total * .15) AS over_under
FROM sales
ORDER BY over_under
to produce a results table in the form:
emp_id meal_total standard_tip tip_rec over_under
------ ---------- ------- ------------ ------- ----------
1 75.3300 * 15% = 11.2995 10.5000 -.799500
2 13.5700 * 15% = 2.0355 1.2500 -.785500
5 89.2500 * 15% = 13.3875 13.5000 .112500
7 110.4800 * 15% = 16.5720 17.2500 .678000
1 125.4400 * 15% = 18.8160 19.7500 .934000
3 271.2200 * 15% = 40.6830 47.5000 6.817000
Data items shown in the EMP_ID, MEAL_TOTAL, and TIP_REC columns are examples of using the SELECT statement to display column values (which you learned about in Tip 88, "Using the SELECT Statement to Display Columns from Rows in One or More Tables"). The * 15% = column, meanwhile, shows how you can include a literal string in a SELECT statement to have it displayed as a column value in each row of the SELECT statement's results table.
|
Note |
Although literal strings can make each row in the results table read more like a sentence, the DBMS treats them as a new column with a constant value (* 15% =, in the current example). This distinction is important when you are passing results table data back to a host program by column position—literal strings in a SELECT statement add new columns that the host program "knows about" and handles appropriately.
|
Like the literal (constant) string column (* 15% =), the final two columns in the results table (STANDARD_TIP and OVER_UNDER) do not exist in the SALES table. The value of these columns in each results table row is the result of a computation on another column in the same row.
The example shows that you can both multiply a column value by a constant and use columns in place of all of the operands in mathematical expressions (such as subtraction and multiplication). In short, you can display the results of any mathematical operation or string manipulation function available in your DBMS implementation as a column value in the SELECT statement's results table.
|
Note |
When using column values in mathematical or string manipulation functions, you must pay careful attention to the column's data type. If you try to perform a mathematical operation (such as division) with a column of type CHAR or VARCHAR, the SELECT statement will fail to execute and the DBMS will display an error message.
|
If you need to mix data types, such as appending a numeric column onto a literal string, use one of the type conversion functions available to your DBMS implementation. MS-SQL server, for example, has the STR function for converting numbers to character strings, and the CONVERT function to convert character strings (with only numbers and a plus [+] or minus [-] sign) to numbers.
Using the SELECT Statement to Display All Column Values
As a convenience, SQL lets you abbreviate an "all columns" list to use in a SELECT statement with an asterisk (*). The SELECT statement
SELECT * FROM
tells the DBMS to list all of the columns and data values in the table named in
. As such, if you have the table created by
CREATE TABLE sales
(emp_id INTEGER,
meal_total MONEY,
tip_rec MONEY)
then the SELECT statement
SELECT * FROM sales
will yield results in the form:
emp_id meal_total tip_rec
------ ---------- ------
1 75.3300 10.5000
2 13.5700 1.2500
5 89.2500 13.5000
7 110.4800 17.2500
1 125.4400 19.7500
3 271.2200 47.5000
Although SQL-92 specifies that a SELECT statement can use either the all columns abbreviation (*) or a selection list, but not both, most commercial SQL products let you combine the two. MS-SQL Server, for example, treats the asterisk (*) as another element in the select list. As such, the SELECT statement
SELECT *, meal_total * .15 AS expected_tip FROM sales
WHERE NULL = NULL
yields the results:
emp_id meal_total tip_rec expected_tip
------ ---------- ------- ------------
(The NULL = NULL in the WHERE clause is a convenient way to get a column list without displaying data because NULL = NULL always evaluates to FALSE.)
While the all columns list selection is convenient for interactive session, you should avoid using it in programmatic SQL.
When an application sends a query to the DBMS, it expects the results to be returned in a fixed number of columns of specific data types. If the structure of the table is changed by rearranging its columns or by adding a column, the DBMS takes care of the database-related details, but it cannot update the application program. As a result, the program will fail to function because the output from the DBMS is no longer in the exact format in which the program expects to receive it.
Therefore, if you are using a program to submit queries and process results, list the column names instead of using the all columns list selector (*). By listing the columns, you can keep the number of columns and their arrangement in the query results table constant even after changes to the structure of the input table.
Using the SELECT Statement with a WHERE Clause to Select Rows Based on Column Values
Although the WHERE clause is optional, most SELECT statements have one. Using SELECT without a WHERE clause is useful for browsing the data in database tables-and little else. If you are gathering information you need for a decision, or if you are producing a report, you normally want to retrieve only some of the rows in one or more tables-that is where the WHERE clause comes in.
The WHERE clause lets you specify which of the rows in the input table(s) you want included in the SELECT statement's results table. For example, suppose you want to get a list of all salespeople who are below their quota of sales for the month. You could use the SELECT statement
SELECT emp_id, first_name, last_name,
quota - monthly_sales AS under_by
FROM employees
WHERE department = 'SALES' AND monthly_sales < quota
to produce a results table in the form:
emp_id first_name last_name under_by
------ ---------- --------- --------
1 Sally Fields 6
7 Wally Wells 9
9 Bret Maverick 12
A WHERE clause consists of the keyword WHERE, followed by the search condition that specifies the rows to be retrieved. In the current example, the WHERE clause specifies that the DBMS is to retrieve those rows in which the value in the DEPARTMENT column is SALES and the value of the MONTHLY_SALES column is less than the value in the QUOTA column.
When processing a SELECT statement with a WHERE clause, the DBMS works its way through the input table, applying the search condition to each row. It substitutes the column values from the table row for the column names in the WHERE clause. In the current example, the DBMS substitutes the value in the DEPARTMENT column for "department" in the WHERE clause, the value in the MONTHLY_SALES column for "monthly_sales," and the value in the table row's QUOTA column for "quota" in the WHERE clause.
After performing the substitutions, the DBMS evaluates the WHERE clause. Rows with column values for which the WHERE clause evaluates TRUE are included in the results table. Those rows whose column values cause the WHERE clause to evaluate FALSE or NULL are excluded from the results table.
As such, you can think of the WHERE clause as a filter. Rows that satisfy the search condition in the WHERE clause pass through the filter. Conversely, rows that do not satisfy the search clause get "stuck" in the filter and are excluded from the results table.
Using the SELECT Statement with a WHERE Clause to Select Rows Based on a Computed Value
In addition to selecting rows by comparing column values to a literal string or number (as you learned to do in Tip 91, "Using the SELECT Statement with a WHERE Clause to Select Rows Based on Column Values"), you can also use a WHERE clause to select rows by comparing column contents to computed values. As you've learned, the general format of a SELECT statement with a WHERE clause is:
SELECT FROM
WHERE
As such, to SELECT columns based on a computed value, simply use the computed value in the WHERE clause in search condition. For example, to list all salespeople whose sales are 20 percent or more below quota, you can use a SELECT statement similar to:
SELECT emp_id, first_name, last_name,
quota - monthly_sales AS under_by
FROM employees
WHERE department = 'SALES'
AND monthly_sales < (quota * .80)
When formulating search conditions with computed values, you are not limited to performing mathematical operations on the columns in a single table. In fact, you can base the selection on a computed value using columns from different tables. Suppose, for example, that you want a list of customers who have exceeded their credit limit, and you have INVOICES and CUSTOMERS tables created by:
CREATE customers
(customer_id INTEGER PRIMARY KEY,
first_name VARCHAR(25),
last_name VARCHAR(30),
credit_limit MONEY)
CREATE invoices
(invoice_number INTEGER PRIMARY KEY,
ordered_by INTEGER,
invoice_total MONEY,
total_paid MONEY)
You can use the SELECT statement
SELECT customer_id, first_name, last_name, credit_limit
FROM customers, invoices
WHERE ordered_by = customer_id
AND credit_limit < (SELECT SUM(invoice_total - total_paid)
FROM invoices
WHERE ordered_by = customer_id)
to select customers who are currently over their credit limit.
In short, you can use a computed value in a WHERE clause as you would any other constant (literal string or number) or column value. The only restriction on using column values is that column names used to generate a computed value in a WHERE clause must appear in one of the tables named in the SELECT statement's FROM clause.
Using the SELECT Statement with a Comparison Predicate to Select Rows with Specific Column or Computed Values
You already learned how to use two of the six comparison operators to select rows when you used the equal to (=) and less than (<) comparison tests to select rows in Tip 91, "Using the SELECT Statement with a WHERE Clause to Select Rows Based on Column Values" and Tip 92, "Using a SELECT Statement with a WHERE Clause to Select Rows Based on a Computed Value." Table 93.1 lists the six SQL comparison operators. (There are seven entries in the table because some DBMS implementations use alternate forms for the "Not equal to" operator.)
Table 93.1: SQL Comparison Operators
Symbol
|
Meaning
|
=
|
Equal to
|
<>
|
Not equal to
|
!=
|
Not equal to
|
>
|
Greater than
|
<
|
Less than
|
>=
|
Greater than or equal to
|
<=
|
Less than or equal to
|
|
Note |
Table 93.1 includes only symbols that serve as operators. However, if you consider only the actions of a comparison operator, you could include the keyword LIKE to the list. As you will learn in Tip 261, "Using LIKE and NOT LIKE to Compare Two Character Strings," the LIKE keyword acts as a comparison operator when comparing character strings.
|
The equal (=) comparison operator retrieves rows in which a column is equal to a specific value or another column. For example, if active employees have an A in the status column, you could use
SELECT * FROM employees WHERE status = 'A'
to display all columns for active employee rows in the EMPLOYEES table.
Use the not equal to (<>) comparison operator to retrieve all rows except those that contain a specific value. For example, to display student information for all students except seniors, you might use the select statement:
SELECT * FROM students WHERE class <> 'Senior'
You can use the greater than (>) comparison operator to display rows in which the value of a column is more than that specified to the right of the operator. For example, if you want to list citations in which the driver was traveling more than 50 percent over the speed limit, you could use a SELECT statement similar to:
SELECT *, (speed - posted_limit) AS mph_over_limit
FROM citation WHERE speed > 1.5 * posted_limit
To list those rows where a column value is less than a specific value, use the less than (<) comparison operator. For example, to list the department number and budget of those departments that have spent less than their allotted budgets for the current year, you could use the statement:
SELECT dept_no, budget, total_spent,
budget - total_spent AS remaining
FROM dept_financials WHERE total_spent < budget
The greater than or equal to (>=) operator will return those rows with a column value at least as great as that listed in the WHERE clause. For example, to give the basketball recruiter a list of all seniors that are at least 6 feet, 6 inches tall (78 inches), you could use the SELECT statement:
SELECT student_id, first_name, last_name
FROM students WHERE class = 'Senior' AND height >= 78
Use the less than or equal to (<=) operator to select rows where a column's value is at most equal to that given in the WHERE clause. For example, suppose you want to give out "good attendance" awards to students who have never been absent and who have at most two tardies. You could use the SELECT statement:
SELECT student_id, first_name, last_name
FROM students WHERE absent_count = 0 AND late_count <= 2
|
Note |
Although the examples in this tip used numeric values with the comparison operators >=, <=, >, and <, you can use them to compare literal strings as well. For example to display an alphabetical listing of all employees with last names greater than or equal to King, you could use the SELECT statement:
SELECT employee_id, last_name
FROM employees_WHERE_last_name >= 'King' ORDER BY last_name
|
When performing a comparison operation on a column of type CHAR (or VARCHAR), the DBMS converts each letter in the strings to its binary representation. The DBMS then compares the bit strings to see which is greater. The binary value for the K (in King), for example, would be greater than the binary value of a name starting with J (and any letter appearing in the alphabet prior to J). As such, only names beginning with K through Z would appear in the list of employees.
If the first two letters of each string are the same, the DBMS compares the binary value of the next two letters, and then the next two until there are either no more letters to compare (in which case the strings are equal) or until a letter in one string has a different binary value than the letter in the other string (in which case one string will be either greater than or less than the other). (The case where one string is longer than the other is covered by the binary value of a letter from one string [a nonzero value] being compared to a different letter in a second string [the blank character whose binary value is less than all letters and numbers].)
Using Boolean Operators OR, AND, and NOT in a WHERE Clause
Boolean operators let you select rows using multiple search conditions in a single WHERE clause. When you want to select rows that satisfy any one of several search conditions, use an OR operator to combine the clauses into a compound search condition. Conversely, when you want to select rows that satisfy all of several search conditions, use an AND to combine search conditions in the WHERE clause. Finally, use the NOT operator to introduce a search condition in the WHERE clause when you want the DBMS to return rows that do not satisfy its criteria.
The general form of a SELECT statement including Boolean operator is:
SELECT FROM
WHERE [NOT]
[ [NOT] < search condition>]...
[ [NOT]
]
Suppose, for example, that you are shopping for a car and would find any one of several models acceptable. You could use the SELECT statement
SELECT year, make, model, cost FROM auto_inventory
WHERE make = 'Jaguar' OR make = 'BMW' OR make = 'Corvette'
to return a list of Jaguars, BMWs, and Corvettes in the AUTO_INVENTORY table. The OR operators used to combine the search conditions in the WHERE clause tell the DBMS to display the columns in the selection list for each row where any one (or more) of the search conditions evaluates to TRUE.
Use the AND operator to combine search conditions in a WHERE clause, if you want the DBMS to display only rows in which all (vs. any, for the OR operator) of the search conditions are TRUE. As such, the SELECT statement
SELECT year, make, model, cost FROM auto_inventory
WHERE make = 'Corvette' AND year > 1990 AND color = 'Red'
tells the DBMS you are interested only in rows where the MAKE column has the value Corvette, the YEAR column has a value greater than 1990, and the COLOR column has the value Red. If any one (or more) of the search conditions evaluates to FALSE, the DBMS excludes the row from the results table. Said another way, the AND operator tells the DBMS to include a row in the results table only when all of the search conditions in the WHERE clause evaluate to TRUE.
The NOT operator lets you select rows based on values not found in their columns. Suppose, for example, that your son or daughter just turned 16 and will accept any car, as long as the cost to insure it is not more than $2,000 per year. To get a list of acceptable vehicles from the AUTO_INVENTORY table, you could use the select statement:
SELECT year, make, model, cost FROM auto_inventory
WHERE NOT cost_to_insure > 2000
Although the previous examples used a single type of Boolean operator in each WHERE clause, you can combine multiple, different Boolean operators in the same WHERE clause. For example, suppose that you will accept any Jaguar with a model year after 1998, or any Corvette, or any BMW, as long it is not blue. You could use the SELECT statement:
SELECT year, color, make, model, cost FROM auto_inventory
WHERE (make = 'Jaguar' AND year > 1998) OR
(make = 'Corvette') OR
(make = 'BMW' AND NOT color = 'blue')
|
Note |
You can normally replace the comparison operator in a search condition instead of using a negated search condition-a search condition introduced by the NOT operator. For example, if your search condition reads "not less than," you could simply use the greater than or equal to (>=) operator. Similarly, if the search condition reads "not greater than," use the less than or equal to (<=) operator instead. The main reason for using the NOT operator is if you feel that it is visually easier to understand than the equivalent comparison operator. Someone not used to mathematical symbols or working with sets may not be familiar with the not equal to (<> or !=) operator and would find a WHERE clause in the form
WHERE NOT =
easier to read than
WHERE !=
or
WHERE <>
|
Using the ORDER BY Clause to Specify the Order of Rows Returned by a SELECT Statement
When the SELECT statement builds the results table in answer to a query, it does not arrange the rows displayed in any particular order. Normally, the DBMS displays query result in the order in which the selected rows were inserted into the input table. For efficiency, some DBMS products will use one of the indexes when reading through the rows in the input table. As such, the results table's rows may be arranged as they appear in the index used to traverse the input table.
If you want to control the order in which rows appear in the results table, add the ORDER BY clause to the SELECT statement.
The general form of a SELECT statement including an ORDER BY clause is:
SELECT FROM
[WHERE ]
ORDER BY
where is defined as:
[ASC | DESC]
[,...
[ASC | DESC]}
You can use any of the columns from the as part of the . Suppose, for example, that you want to list student grades from lowest to highest. You could use a SELECT statement similar to SELECT student_id, last_name, first_name, grade_received FROM students ORDER BY grade_received that will yield a results table in the form: student_id last_name first_name grade_received ---------- --------- ---------- -------------- 1 Smith Sally 65 8 Wells Wally 70 9 Luema Albert 75 12 Luema Abner 75 90 Davis Scott 96 If you omit the keywords ASC (ascending order) and DESC (descending order) from the sort specification, the DBMS will sort the rows in the results table in ascending order. As such, in the current example, the DBMS sorted the rows in the results table in ascending order based on the values stored in the GRADE_RECEIVED column. To sort the results table by more than one column, simply include all of the columns you want to use for the sort in the . (Remember, the only restriction on the columns listed in the is that they must all appear as column names in the SELECT statement's .)
Therefore, to sort the list of grades in descending order by GRADE_RECEIVED, then in ascending order by name, and finally in ascending order by STUDENT_ID, you could use the SELECT statement
SELECT student_id, last_name, first_name, grade_received
FROM students
ORDER BY grade_received DESC, last_name ASC, first_name,
student_id
which will yield a results table in the form:
student_id last_name first_name grade_received
---------- --------- ---------- --------------
90 Davis Scott 96
12 Luema Abner 75
9 Luema Albert 75
8 Wells Wally 70
1 Smith Sally 65
The second and subsequent columns listed in the ORDER BY clause act as "tie-breakers." If (as is the case in the current example) two rows have the same value in the first column listed in the ORDER BY clause (a 75 for GRADE_RECEIVED, in the current example), the DBMS will decide which of the two rows to display first by comparing the values in the column listed second in the ORDERED BY clause. If the second column is also identical (as is the case in the current example-both LAST_NAME columns have the value Luema), the DBMS will compare the values in the third column listed in the ORDER BY clause, and so on.
Notice that you can mix ascending and descending sort orders within the same ORDERED BY clause. Obviously, a single column must be arranged either in ascending or descending order-and not both. However, as is the case in the current example, if you list multiple columns in the ORDER BY clause, each of them can be displayed in either ascending (ASC) or descending (DESC) order without regard to the order (ASC or DESC) of the other columns in the clause.
In addition to using column names in the ORDER BY clause, you can refer to the items by number. For example, to sort the results in the results table of the previous example by GRADE_RECEIVED, LAST_NAME, FIRST_NAME, and STUDENT_ID, you could have used the ORDER BY clause in the SELECT statement SELECT student_id, last_name, first_name, grade_received FROM students ORDER BY 4 DESC, 2, 3, 1 instead of: SELECT student_id, last_name, first_name, grade_received FROM students ORDER BY grade_received DESC, last_name ASC, first_name, student_id The number of the column in the ORDER BY clause is determined by its position in the , not its position in the input table. As such, in the current example, you would refer to the GRADE_RECEIVED, the fourth item in the , as column 4, whether GRADE_RECEIVED is defined as the first, tenth, or fiftieth column in the STUDENTS table. You would use the column number in place of the column name in the ORDER BY clause when you want to sort by a computed column that does not exist in the input table and does not have a column name. For example, to display the list of salespeople in the EMPLOYEES table in order by the number of sales by which they exceeded (or missed) quota, you could use a SELECT statement similar to: SELECT employee_id, first_name, last_name, quota, sales, sales - quota FROM employees ORDER BY 6 DESC, last_name, first_name, employee_id The DBMS would return a results table in the form: emp_id first_name last_name quota sales ------ ---------- --------- ----- ------ 1 Sally Fields 3 7 4 7 Wally Wells 8 9 1 9 Bret Maverick 7 5 -2
|
Note |
If you add an AS clause to the sales-quota item in the select item list, you can use the name you give to the computed column in the ORDERED BY clause. For example, if you execute the select statement (that titles the SALES-QUOTA computed value as over_under)
SELECT employee_id, first_name, last_name, quota,
sales, sales - quota AS over_under
FROM employees
ORDER BY over_under DESC, last_name, first_name,
employee_id
the DBMS will display the results table:
emp_id first_name last_name quota sales over_under
------ ---------- --------- ----- ----- ----------
1 Sally Fields 3 7 4
7 Wally Wells 8 9 1
9 Bret Maverick 7 5 -2
|
Notice that the results table is still in the same order as before, but the computed column at the end of each row, now has a label: over_under.
Using Compound Conditions (AND, OR, and NOT) in a WHERE Clause to Select Rows Based on Multiple Column Values (or Computed Values)
Boolean operators AND and OR let you combine multiple, individual search conditions in a WHERE clause to form a compound search condition. The NOT operator, meanwhile, lets you negate the result of evaluating a search condition to tell the DBMS to select rows in which the search condition is FALSE (not TRUE).
Use the OR operator to combine search conditions when more than one may be TRUE but only one must be TRUE in order to SELECT the row for inclusion in the results table. For example, to generate a table of golfers eligible for the U.S. Open golf tournament, you could use the SELECT statement
SELECT first_name, last_name FROM golfers
WHERE previous_us_open_winner = 'Y' OR
PGA_tournaments_won > 1 OR
qual_school_ranking <= 10
to allow a golfer to play if the golfer were a previous U.S. Open winner or if the golfer won one of the other PGA tournaments, or if the golfer finished in the top 10 of the qualifying school.
When evaluating a compound search condition, the DBMS evaluates each individual search condition and then performs Boolean math to determine whether the overall WHERE clause evaluates to TRUE or FALSE. Thus, in the current example, the DBMS will evaluate each of the three search conditions (PREVIOUS_US_OPEN_WINNER='Y', PGA_TOURNAMENTS_WON > 1, and QUAL_SCHOOL_RANKING <= 10) and then use the OR Truth Table shown in Table 96.1 to determine whether the SELECT statement's WHERE clause is TRUE or FALSE.
Table 96.1: OR Truth Table
OR
|
TRUE
|
FALSE
|
NULL
|
TRUE
|
TRUE
|
TRUE
|
TRUE
|
FALSE
|
TRUE
|
FALSE
|
NULL
|
NULL
|
TRUE
|
NULL
|
NULL
|
(If the WHERE clause evaluates to TRUE, the DBMS includes the values from the current row in the results table; if the WHERE clause evaluates to FALSE or NULL, the DBMS excludes the values.) The AND operator lets you form a compound search condition by requiring that all of the individual search conditions listed in the WHERE clause be TRUE in order for the overall WHERE clause to evaluate TRUE. For example, if you require that salespeople who have sold more than $75,000 in goods and services have a cancellation rate below 10 percent and make more than 30 sales in order to be eligible for a bonus, you can use the SELECT statement SELECT employee_id, first_name, last_name, '$200' AS bonus FROM employees WHERE department = 'Sales' AND gross_sales > 75000 AND (cancellations / sales) < .1 AND sales > 30 to generate the bonus eligibility table. As was the case with the OR operator, the DBMS will evaluate the individual search conditions (there are four in the current example) and then use the AND Truth Table shown in Table 96.2 to determine whether the WHERE clause is TRUE or FALSE.
Table 96.2: AND Truth Table
AND
|
TRUE
|
FALSE
|
NULL
|
TRUE
|
TRUE
|
FALSE
|
NULL
|
FALSE
|
FALSE
|
FALSE
|
FALSE
|
NULL
|
NULL
|
FALSE
|
NULL
|
(If the WHERE clause evaluates to TRUE, the DBMS includes the values from the current row in the results table; if the WHERE clause evaluates FALSE or NULL, the DBMS excludes the values.) Finally, you can use the NOT operator to negate the Boolean (TRUE or FALSE) value of a search condition. For example, if you wanted to get a list of salespeople who either did not have gross sales of more than $75,000 or did not have more than 30 sales, you could use the SELECT statement: SELECT employee_id, first_name, last_name, gross_sales, sales FROM employees WHERE department = 'Sales' AND ((NOT gross_sales > 75000) OR (NOT sales > 30)) When determining the negated value of a search condition, the DBMS uses the logic given in the NOT Truth Table shown in Table 96.3.
Table 96.3: NOT Truth Table
NOT
|
TRUE
|
FALSE
|
NULL
|
|
FALSE
|
TRUE
|
NULL
|
Understanding NULL Value Conditions When Selecting Rows Using Comparison Predicates
When writing a search condition, be sure to take into account the handling of NULL values. Since the results table will include only those rows for which a search condition is TRUE, some table rows will remain "hidden" even if you display the rows that satisfy the search criteria and then display the rows that satisfy the negated search criteria.
Suppose, for example, that you have an employees table with the following values:
first_name last_name sales quota
---------- --------- ----- -----
Sally Fields 8 5
Wally Wells 4 10
Sue Smith 10 NULL
Kelly Sutherland 7 7
The SELECT statement
SELECT first_name FROM employees WHERE sales < quota
would produce a results table with one name: Wally.
Conversely, the SELECT statement
SELECT first_name FROM employees WHERE sales >= quota
would produce a results table with two names: Sally and Kelly.
By adding the rows in the two results tables (employees with sales below quota and employees with sales equal to or above quota), you might come to the mistaken conclusion that there are three employees in the EMPLOYEES table, when there are, in fact, four.
The reason that one employee remains "hidden" from view is that no matter which of the comparison operators (=, >, <, <>, >=, <=) you use, if one of the columns being compared is NULL, the value of the entire expression will be NULL. As such, the row will be excluded from the results table because the results table includes rows for which the search condition evaluates TRUE-FALSE and NULL (unknown) valuations are excluded. Thus, in the current example, employee Sue will never show up in the results table generated from a search condition using a comparison operator with the quota column (which is NULL for Sue) as an operand.
Even the SELECT statement
SELECT first_name FROM employees WHERE quota = NULL
will produce a results table with zero rows! (If any operand in an expression using a comparison operator is NULL, the DBMS evaluates the overall expression as NULL and excludes the row in question from the results table.)
|
Note |
To get a list of rows in which the value of a search condition is NULL, use the NULL value test IS NULL. As such, to get a list of employees with a NULL value in the QUOTA column, you could use the SELECT statement:
SELECT first_name FROM employees WHERE quota IS NULL
|
Using Row Value Expressions to Select Rows in a Table Based on Multiple Column Values
A row value constructor, as the name implies, is a list of literal values and expressions that, taken together, give the values for the columns in a single row in a table. You learned about row value constructors when you used them in INSERT statements to add rows to tables in Tip 67, "Using the INSERT Statement to Add Rows to a Table."
Similar to the row value constructor used in an INSERT statement, the row value constructor in a SELECT statement specifies the values in a row's columns. However, unlike the INSERT statement's row value constructor, the row value constructor in a SELECT statement is not used to add a row to a table. Rather, the row value constructor in a SELECT statement's WHERE clause is used to specify the value a row's columns must have in order for the row to be included in the SELECT statement's results table.
For example, to display active employees in the sales department, you could use a SELECT statement similar to:
SELECT employee_id, first_name, last_name, gross_sales
FROM employees
WHERE (status, department) = ('Active', 'Sales')
To process a select statement with a row value constructor, the DBMS goes through the input table one row at a time, substituting the value of each column named in the row value constructor for its column name on the left side of the equals (=) sign. The DBMS constructs a row from the substituted column values on the left side of the equals (=) sign and compares it to a row constructed from the literal values (or expressions) given in the value list on the right side of the row value constructor's equals (=) sign.
The comparison is accomplished by comparing each pair of columns in the two rows-that is, the first value in the list on the left side of the equals (=) sign is "paired" with and compared to the first value in the list on the right side of the equals (=) sign. The second value on the left side is compared to the second value on the right side, and so on.
A row value comparison for equality (such as that shown in the current example) is TRUE only if each of the "paired" columns has the same value.
|
Note |
Not all DBMS products support row value expressions in a SELECT statement's WHERE clause. If your DBMS does not, you can rewrite the row value expression using a WHERE clause with a compound search condition. For example, the SELECT statement in the current example can be rewritten as:
SELECT employee_id, first_name, last_name, gross_sales
FROM employees
WHERE status = 'Active' AND department = 'Sales'
|
As you learned in this tip, each column value in the row value constructor in the value list on the left side of the equals sign is paired with and compared to each literal value or expression in the value list on the right side of the equals sign. As such, you can reconstruct the row value equality as a set of "paired" column-to-value equality search conditions combined with the AND operator, since all (and not some) of the search conditions must evaluate TRUE in order for the WHERE clause to evaluate TRUE.
Understanding Subqueries
A subquery is a SELECT statement within a SELECT statement and is often referred to as either an inner query or an inner SELECT statement. Conversely, the SELECT statement that contains the subquery is called the outer query, outer SELECT, or main SELECT statement.
Subqueries are most often used in a SELECT statement's WHERE clause to generate a single-column virtual table that the main SELECT statement will use in determining which of the input table's rows to include in the results table. Suppose, for example, that you want a list of all products in the PRODUCTS table where quantity on unshipped orders in the ORDERS table is greater than the amount currently in stock.
You can use a SELECT statement in the form
SELECT product_id, description, qty_in_stock FROM products
WHERE qty_in_stock < "total to ship"
to generate the list of back-ordered items once you know the value of the total to ship.
A query in the form
SELECT SUM(quantity) FROM orders
WHERE orders.item_number = "item to total" AND
date_shipped IS NULL)
will return the quantity due to be shipped for each item whose item number you substitute for the item to total (given that unshipped orders have a NULL value in the DATE_SHIPPED column).
Adding the second SELECT statement as a subquery to the first, you get:
SELECT product_id, description, qty_in_stock FROM products
WHERE qty_in_stock < (SELECT SUM(quantity)
FROM orders
WHERE orders.item_number =
products.product_id AND
date_shipped IS NULL)
When executing the new SELECT statement with a subquery, the DBMS goes through each row in the PRODUCTS table and executes the subquery to determine the quantity of the product due to be shipped. Each subquery execution produces a virtual table containing a single value that the DBMS compares to the quantity of QTY_IN_STOCK. Whenever the quantity in stock (QTY_IN_STOCK) is less than the quantity due to be shipped (as returned by the subquery), the DBMS will include the values from the PRODUCT_ID, DESCRIPTION, and QTY_IN_STOCK columns of the current row of the PRODUCTS table in the results table.
The syntax of a subquery is:
(SELECT [ALL I DISTINCT]
FROM
[WHERE ]
[GROUP BY
[HAVING ]])
Thus, other than being enclosed in parenthesis ( ()), a subquery looks (and functions) exactly like any other SELECT statement. There are, however, a few rules a SELECT statement used as a subquery must follow:
-
- If the subquery is to provide a value to a comparison operator (=, <>, >, >=, <, <=) in a WHERE clause, the subquery most return a single value-that is, the subquery's results table must consist of a single column with at most one row. (A subquery with no rows evaluates to 0, vs. an error condition.)
-
- If the subquery is introduced by the keywords IN or NOT IN, the subquery must return a results table of one column; however, the results table may have more than one row.
-
- The subquery cannot have an ORDER BY clause. (Since the subquery results table is not displayed to the user and is used as the input table for its outer SELECT statement instead, ordering the rows in the subquery's results table is of no practical value.)
-
- Column names appearing in a subquery must be defined either in the tables listed in the subquery's FROM clause or in the tables listed in the outer SELECT statement's FROM.
-
- In most DBMS implementations, a subquery can consist of only a single SELECT statement and cannot consist of the UNION of several SELECT statements.
-
In this tip, you learned how to use a subquery to generate a (virtual) table of values that the DBMS used as the search criteria in the main SELECT statement's WHERE clause. Tip 336, "Understanding the Role of Subqueries in a HAVING Clause," will show you how to use a subquery to generate the filter values in a SELECT statement's HAVING clause.
Using Row Value Subqueries to Select Rows in a Table Based on Multiple Column Values
Row value subqueries are inner SELECT statements whose results tables (like the row value expressions you learned about in Tip 98, "Using Row Value Expressions to Select Rows in a Table Based on Multiple Column Values") consist of more than one column. In Tip 99, "Understanding Subqueries," you learned that one of the restrictions on a subquery used to provide a value to a comparison operator is that it must return a single column. However, this is true only because the majority of DBMS products do not support SQL-92 row value expressions.
One of the row value operations included in the SQL-92 standard is a row value comparison. If your DBMS supports row value comparisons, a subquery can return multiple columns of data that the DBMS will compare to the values in a row value constructor.
Suppose, for example, that you want a list of customer IDs from people who purchased the highest-priced car in your automobile inventory. You first need to submit a query such as
SELECT manufacturer, make, model FROM auto_inventory
WHERE sticker_price = (SELECT MAX (sticker_price)
FROM auto_inventory)
to determine the manufacturer, make, and model of the highest-priced car in the inventory. Then have the DBMS compare the MANUFACTURER, MAKE, and MODEL of each car purchased to the MANUFACTURER, MAKE, and MODEL of the highest-priced car, include the example SELECT statement as the subquery in the SELECT statement:
SELECT customer_id, order_date, price_paid
FROM auto_purchases
WHERE (manufacturer, make, model) =
(SELECT manufacturer, make, model FROM auto_inventory
WHERE sticker_price = (SELECT MAX (sticker_price)
FROM auto_inventory))
In the current example, the right side of the equals (=) sign in the WHERE clause contains a subquery that, when executed, will return three columns: the MANUFACTURER, MAKE, and MODEL of the highest-priced car in the automobile inventory. By comparing the subquery's results to the values in the row value constructor on the left side of the equals (=) sign, the DBMS can decide whether or not to include the CUSTOMER_ID, ORDER_DATE, and PRICE_DATE values from the AUTO_PURCHASES (input) table in the outer query's results table.
If the column values returned by the subquery on the right side of the equals (=) sign are equal to the column values in the row value constructer on the left side of the equals (=) sign, the DBMS will include the CUSTOMER_ID, ORDER_DATE, and PRICE_PAID from the current row of the AUTO_PURCHASES table in the SELECT statement's results table.
If your DBMS does not support row value constructors as operands in a comparison, you can break the multiple-column comparison down into single-column search conditions joined by AND operators. In the current example, you can rewrite the SELECT statement with a row value subquery as the SELECT statement:
SELECT customer_id, order_date, price_paid
FROM auto_purchases
WHERE manufacturer = (SELECT DISTINCT manufacturer
FROM auto_inventory
WHERE sticker_price =
(SELECT MAX (sticker_price)
FROM auto_inventory))
AND make = (SELECT DISTINCT make FROM auto_inventory
WHERE sticker_price =
(SELECT MAX (sticker_price)
FROM auto_inventory))
AND model = (SELECT DISTINCT model FROM auto_inventory
WHERE sticker_price =
(SELECT MAX (sticker_price)
FROM auto_inventory))
The query in this example uses three scalar subqueries to produce the same results table as that produced by the previous SELECT statement, which used a single row value constructor to make the same query.
|
Note |
A "scalar" subquery is one that returns a single value (that is, the subquery's results table consists of a single column and, at most, a single row). If the scalar query's results table has no rows, then its value is 0. The current example assumes that the maximum sticker price of each car is unique by manufacturer, make, and model. If more than one model, for example, has a sticker price equal to the maximum sticker price, the SELECT DISTINCT MODEL subquery will no longer return a scalar value, and the SELECT statement will fail to execute.
|
Comparing the length and structure of the last two SELECT statements in the current example, you can see that the SELECT statement with the row value subquery is more compact and easier to understand because it more closely follows the English-language description of the query.
Understanding Expressions
The SQL-92 standard defines four arithmetic operators you can use in expressions: addition (+), subtraction (-), multiplication (*), and division (/). Additionally, SQL-92 lets you use parenthesis (()) to combine operators and operands into complex expressions.
The order or precedence for mathematical operations is familiar: multiplication and division in the order in which they are found, and then addition and subtraction in the order in which they occur. Thus, to evaluate the expression: A + B / C - D * E, the DBMS would:
- Divide B by C
- Multiply D by E
- Add the result of Step 1 to A
- Subtract the result of 2 to the result of Step 3
Reading the equation in the example from left to right, you might think the DBMS would add A to B first and then divide the sum by C. However, the mathematical order of precedence dictates that the division and multiplication operations be performed first, followed by the addition and subtraction operations.
If you want to change the order of the operations or make an expression easier to read, use parenthesis to group its operations. For example, if you rewrite the expression A + B / C - D * E as A + (B / C) - (D * E), it is easier to see which operands and results will be affected by which mathematical operations. Moreover, if you want to change the order of operations so they occur from left to right (as you would read them), rewrite the expression as ((A + B) / C - D) * E.
In SQL, expressions serve three main functions:
- To calculate values to be returned in a SELECT statement's results table. For example, to list employees and the revenues they generate per hour they work and per dollar they are paid, you could use the SELECT statement:
SELECT employee_id, first_name, last_name, gross_sales,
gross_sales/ hours AS revenues_per_hour,
gross_sales / amount_paid AS
revenues_per_dollar_paid
FROM employees
- To calculate values used in search criteria. For example, to display those employees whose sales are more than 50 percent below quota, you could use the SELECT statement:
SELECT employee_id, first_name, last_name, quota, sales
FROM employees WHERE sales < .50 * quota
- To calculate values used to update data values in a table. For example, to increase the sales quota for all senior reps by 25 percent, you could use the UPDATE statement:
UPDATE employees SET quota = quota * .25
WHERE department = 'Marketing' and status = 'Senior'
|
Note |
Although SQL-92 specifies only four mathematical operators, your DBMS will most likely include trigonometric functions (SIN, COS, TAN, and so on), and other operators such as exponentiation, rounding, square roots, and so on. Check your system documentation for the full list of mathematical operations your DBMS supports.
|
Understanding Numeric Value Expressions
Numeric value expressions are expressions that apply one of the four SQL standard arithmetic operators (+, _, /, *) or additional (implementation-specific) operators and functions to produce a numeric results. To be a numeric expression, the operands and result must be numeric.
Individual operands in a numeric expression may be of different numeric data types. As such, you can mix numeric literals (integers and real numbers) and columns of any of the numeric data types (INT, SMALLINT, NUMERIC, DOUBLE PRECISION, REAL, and so on) in a single expression. The SQL-92 standard does not specify the data type that results from the various combinations of operand data types. As such, you will need to check your system manual to determine whether you get a real number or an integer when multiplying a real number by an integer, for example.
Examples of numeric expressions include:
-37.89
378.95 + 458.3 - 37.9
(59.6 * 10) / (58.3 / 4)
quota * .10
gross_sales - costs
Understanding String Value Expressions
A string value expression is one in which all of the operands are either literal character strings or SQL objects such as columns of data type CHARACTER, CAST expressions, CASE expressions, set functions, or scalar subqueries that return a result of type CHARACTER. SQL-92 defines only a single operator for use in a string value expression-concatenation, which lets you combine two character string operands into a single character string result. The concatenation operator attaches the characters in a second character string to the end of the characters in the first string.
|
Note |
Your DBMS product may include additional data types that are equivalent to data type CHARACTER and can therefore be used as operands in string value expressions. MS-SQL Server, for example, lets you abbreviate the data type CHARACTER as CHAR and includes a variable-length CHARACTER string type named VARCHAR. Check your system documentation for a list of the string (CHARACTER) data types available to your DBMS.
|
Thus, SQL-92 defines a string value expression as one or more operands of a character string data type concatenated together to produce a result that is also a character string. The simplest string value expression is a single string. More complex string value expressions involve concatenating operands of data type character using the concatenation operand-either a pair of vertical lines ( II ) or the plus (+) sign in some DBMS implementations.
Examples of string value expressions include:
'City'
'Peanut butter ' || 'and Jelly'
'Time' + 'shares'
first_name || ' ' || last_name
|
Note |
Although SQL-92 specifies concatenation as the only operation available for string value expressions, your DBMS product will most likely have additional functions and operators you can use for string manipulation. As an example, Table 103.1 lists some of the string functions provide by MS-SQL Server.
Table 103.1: Example MS-SQL Server String Manipulation Functions
String Function
|
Description
|
LOWER
|
Converts uppercase letters in a string to lower case
|
UPPER
|
Converts lowercase letters in a string to upper case
|
LTRIM
|
Removes leading spaces from a string
|
RTRIM
|
Removes trailing spaces from a string
|
CHARINDEX
|
Returns the starting position of the first character of one string in another
|
STR
|
Converts numeric data to a character string
|
STUFF
|
Inserts one character string into another
|
SUBSTRING
|
Returns a part of a character string
|
|
Check your system documentation for the string manipulation functions available to your DBMS.
Understanding Datetime Value Expressions
Use datetime value expressions to work with operands of type DATE, TIME, TIMESTAMP, or INTERVAL to produce a result that is also of one of the datetime data types (DATE, TIME, or TIMESTAMP).
Before working with datetime data objects, you will need to determine the proper data type names used by your DBMS implementation. MS-SQL Server, for example, uses TIMESTAMP to data type a column you want the DBMS to update with the current system date and time each time a row is inserted into the table. Conversely, DATETIME is MS-SQL Server's solitary datetime data type, and consists of the date and time elements defined for the SQL-92 standard TIMESTAMP data type. Thus, while some DBMS implementations let you define datetime objects as data type DATE, TIME, or TIMESTAMP, MS-SQL Server reserves the TIMESTAMP keyword for "other" purposes (not to define objects you can use in datetime value expressions) and has only the single DATETIME data type for use in defining all columns that will hold either dates or times, or both dates and times.
In addition to unique datetime types, each vendor's DBMS product has unique functions you can use in datetime value expressions. The only rules that all must (and do) follow is that the functions in a datetime expression must accept operands that are either an interval (integer value) or one of the datetime data types, and evaluating the datetime value expression must yield a result of type datetime.
MS-SQL Server datetime expressions include:
'06/09/2000'
'06/09/2000 14:52:52'
start_date + 7
DATENAME(weekday, start_date)
DATEDIFF(stop_date, start_date)
GETDATE()
DATEPART(day, (GETDATE())
The bottom line is that you must check your system documentation to see which datetime data types and functions are available for datetime value expressions in your DBMS.
Understanding SQL Predicates
The ANSI/ISO standard refers to search conditions as predicates. As such, you already know what predicates are and how to use them since you've seen predicates in tip examples with SQL statements that have a WHERE clause.
For example, the SELECT statement
SELECT student_id, first_name, last_name FROM students
WHERE gpa = 4.0
uses the comparison predicate gpa = 40 to tell the DBMS to "filter out" or SELECT those rows in the STUDENTS table that have a value of 4.0 in the GPA column.
Predicates are the portion of the WHERE clause (or search condition) that describes the data values of interest to the DBMS. Because SQL is a data sublanguage designed to let you describe the data you want but not how to retrieve and store it, you will find SQL predicates (which describe data to the DBMS) in almost all SELECT, UPDATE, and DELETE statements.
There are seven basic SQL predicate classes:
- Comparison predicates (=, <>, >, >=, <, <=), used to compare the value of two expression or table columns.
- Range test predicate (BETWEEN), used to check if the value of an expression or table column falls between two values, the upper and lower bounds of the range. The range includes the end points.
- Set membership predicate (IN), tests to see if the value of an expression or table column is found within a list (or set) of values.
- Pattern-matching predicate (LIKE), used to check whether the value of a column of type CHARACTER has a character string that matches a specified pattern of characters.
- NULL value predicate (IS NULL), used to check whether a column contains a NULL or unknown value. (Some DBMS products use the keyword ISNULL in place of IS NULL.)
- Quantifier predicates (ALL, SOME, ANY), which let you use a comparison predicate to compare the value of a column or expression to the values found in a set or list of values.
- Existence predicate (EXISTS), used in conjunction with a subquery to determine if a table contains any rows that meet the subquery's search condition(s).
When executing an SQL statement with a predicate, the DBMS performs actions on those table rows where the predicate evaluates TRUE, and takes no action on rows for which the predicate evaluates to FALSE or unknown.
Understanding SQL Logical Connectives AND, OR, and NOT
In Tip 105, "Understanding SQL Predicates," you learned that an SQL predicate lets you specify the conditions under which the DBMS is to perform an action on a specific row or set of rows in a table. For example, the predicate in the DELETE statement
DELETE customers WHERE customer_since_date < '01/01/1995'
will remove the table rows for all customers added prior to 1 January 1995. If your intent in executing the DELETE statement in the example is to delete old, inactive customers, you certainly don't want to remove customers that currently making purchases-especially those that have been doing business with you for more than five years. As such, you need to limit the action of the DELETE statement by adding an additional condition to the predicate: Delete only inactive customers.
Unfortunately, a single, simple predicate will not let you specify multiple search conditions-that is where the logical connectives (AND, OR, NOT) come in. By using one or more logical connectives, you can combine two or more simple predicates into a compound predicate. A compound predicate can have as many search conditions as you need to identify the table rows on which you want the DBMS to act.
In the current example, you can add the logical connective AND to have the DBMS eliminate only those customers added to the database prior to 1995 who have not placed an order since 1997, as follows:
DELETE customers
WHERE customer_since_date < '01/01/1995' AND
last_order_date < '12/31/1997'
The AND logical connective lets you specify multiple search conditions that must ALL evaluate TRUE in order for the DBMS to take action on the row being tested. In the current example, there are two search conditions, both of which must be true in order for the DBMS to DELETE the row being tested from the CUSTOMERS table.
If you have multiple search conditions and you want the DBMS to act on a row where any one or more of the search conditions evaluates to TRUE, use the logical connective OR to combine them. For example, if you want to remove all customers added to the system prior to 1995 and any who last ordered something prior to 1997 (regardless of when they were added to the database), you could use the DELETE statement:
DELETE customers
WHERE customer_since_date < '01/01/1995' OR
last_order_date < '01/01/1997'
Use the NOT logical connective to negate the result of evaluating a search condition. For example, if you want to remove all customers added after 1/1/1995, you could use the DELETE statement:
DELETE customers
WHERE NOT customer_since_date < '01/01/1995'
Although the examples used thus far in this tip use a single logical connective, you can use multiple AND, OR, and NOT connectives in the same compound predicate. For example, the DELETE statement
DELETE customers
WHERE customer_since_date < '01/01/1995' AND
last_order_date < '12/31/1997' OR
total_paid = 0.00 AND
NOT total_orders > 100000.00
will remove those rows where the customer was added prior to 1995, has not placed an order since 1997, or has never paid for an order-regardless of when the customer was added to the database as long as the total value of orders placed by the customer is less than $100,000.00.
|
Note |
The DELETE statement in the example will remove any customers added prior to 1995 who have not placed an order since 1997-even if the customers placed more than $100,000.00 in orders. The final AND in the example adds only the TOTAL_ORDERS > 100000.00 test to the TOTAL_PAID = 0.00 search condition.
|
Logical connectives, like mathematical operators, have a specific order of precedence. The DBMS applies the NOT first, next it applies ANDs, finally, it uses the ORs. Use parenthesis ( () ) if you want to change the order of evaluation.
For example, if you want to keep all customers that have placed more than $100,000.00 in orders-regardless of the total amount they actually paid, when they were added to the system, or whether they placed an order since 1997- rewrite the example DELETE statement as:
DELETE customers
WHERE (customer_since_date < '01/01/1995' AND
last_order_date < '12/31/1997' OR
total_paid = 0.00) AND
NOT total_orders > 100000.00
The parenthesis ( () ) tell the DBMS to evaluate the search conditions and apply the conditions inside the (innermost) parenthesis ( () ) first and work outward.
The reason why you would execute a DELETE statement based on the search conditions and logical connectives used in the example is not important. The important thing to know is that logical connectives let you combine multiple, simple (single search condition) predicates into one compound (multiple) search condition predicate that the DBMS will use to decide whether or not to take the action specified by the keyword at the start of the SQL statement.
Understanding Set (or Column) Functions
In addition to the data you can extract from the values in the individual rows of a table, you sometimes need information based on all of the values in a column taken as a set. SQL-92 has six set or column functions that let you do just that. (Set [or column] functions are sometimes called aggregate functions, since they "aggregate" or accumulate values stored in a column across multiple rows into a single value.)
The SQL Set (aggregate) functions are:
- COUNT(*), which returns the number of rows in a table that satisfy the search condition(s) in the WHERE clause
- COUNT(), which returns the number of (non-NULL) values in a column
- MAX, which returns the maximum value in a table column
- MIN, which returns the minimum value in a table column
- SUM, which returns the sum of the values in a column
- AVG, which returns the average of the values in a column
For example, to count the number of employees in an EMPLOYEE table, you could use the SELECT statement:
SELECT COUNT(*) FROM employees
Or, to display the count of employees who live in Nevada (state abbreviation NV), you could use the SELECT statement:
SELECT COUNT(*) FROM employees WHERE state = 'NV'
When you need the count of table rows that have non-NULL value in a particular column, use the SELECT () aggregate function in place of the SELECT(*) function. For example, the SELECT statement
SELECT COUNT(quota) FROM employees WHERE state = 'NV'
will display the number of employees from Nevada who have a non-NULL value in the QUOTA column. The difference between
SELECT COUNT(*) FROM employees WHERE state = 'NV'
and
SELECT COUNT(quota) FROM employees WHERE state = 'NV'
is that the second statement does not count any rows that have a NULL value in the field. As such, when you need a count of the rows in a table that meet a search criteria, use SELECT COUNT(*). When you want the count of only rows that meet a search criteria and that have a non-NULL value in a particular column, use a SELECT COUNT () statement.
To determine the minimum or maximum value in a column, use the MIN or MAX function. For example, to display the employee ID and name of the oldest employees, you could use the SELECT statement:
SELECT employee_id, first_name, last_name, age
FROM employees
WHERE age = (SELECT MAX(age) FROM employees)
Conversely, if you want a list of the youngest employees, use the MIN function in a SELECT statement, similar to:
SELECT employee_id, first_name, last_name, age
FROM employees
WHERE age = (SELECT MIN(age) FROM employees)
When you need to add up the values in a column, use the SUM function. For example, to display the total sales made in May 2000, you could use the SELECT statement:
SELECT SUM (order_total) FROM sales
WHERE date_sold >= '05/01/2000' AND
date_sold <= '05/31/2000'
In order to SUM the values in a column, the column must, of course, be defined as one of the numeric data types. Moreover, the result from adding up all of the values in a column must fall within the range of the data type. As such, if you attempt to determine the SUM of a column of type SMALLINT, the resulting total cannot be any larger than the upper limit of the SMALLINT data type.
The AVG function returns the average of the values found in the rows of a numeric data type column. For example, to display the average order cost in addition to the total sold for May 2000, add the AVG function to the previous SELECT statement:
SELECT SUM (order_total), AVG (order_total) FROM sales
WHERE date_sold >= '05/01/2000' AND
date_sold <= '05/31/2000'
|
Note |
NULLs have no determinable value-by definition, the value of a NULL column is "unknown." As such, any row with a NULL value is ignored by both the SUM and AVG functions and has no effect on the sum or average computed for a column.
|
Understanding the CASE Expression
As you learned in Tip 17, "Understanding the Difference Between SQL and a Programming Language," SQL is a data sublanguage, not a full fledged programming language. Because SQL was designed to let you specify the data you want and not how to get at the data, the original SQL specification did not include block (BEGIN, END) statements, conditional (IF) statements, branch (GOTO) statements, or loop (DO, WHILE, FOR) statements. To reduce its reliance on external source programs to manipulate intermediate query results, many DBMS vendors have added block and statement flow control add-ons to SQL. (MS-SQL Server extensions are called Transact-SQL, and Oracle extensions are found in SQL*PLUS and PL/SQL.) One common "programming language" structure, the CASE expression, has even found its way into the SQL-92 specification.
CASE expressions are similar in function to the IF-THEN-ELSE statements found in almost all programming languages. The CASE expression reduces SQL's reliance on external programs for processing results table data by giving the data sublanguage limited decision-making capability. As such, you no longer have to retrieve data and run a separate, external program to modify the output data (results table) if the desired, modified results can be determined from current row or aggregate data values and as long as those changes are triggered by the value of a column in the current table row.
For example, if you have an employees table created with
CREATE TABLE employees
(id CHAR(3)
name VARCHAR(35),
address VARCHAR(45),
phone_number CHAR (11),
department SMALLINT,
commission MONEY,
bonus_level VARCHAR(35),
total_sales MONEY,
hourly_rate MONEY,
sales_calls SMALLINT,
sales_count SMALLINT)
you could use the SELECT statement
SELECT id, name, department FROM employees
ORDER BY department
to display the table contents:
id name department
-- ----------------- ----------
3 William Silverman 1
4 Walt welinski 1
1 Carry Grant 2
2 Michael Lancer 2
5 Sally Fields 3
6 Walt Frazier 3
7 Melissa Gomez 4
If you wanted to convert department numbers to character (string) descriptions, you would have had to use an external program to convert the data prior to SQL-92. The SQL-92 CASE expression lets you modify the output on the fly based on the value of a column in a table row.
An SQL-92 CASE expression takes on one of two forms, depending on whether the CASE expression is simple or searched. A simple CASE expression is based on a straight equality between the value that follows the keyword CASE and the value that follows each keyword THEN in the CASE expression. Meanwhile, the searched CASE expression has a comparison operator in an expression right after the keyword CASE.
The syntax of a simple CASE expression based on the (which follows the keyword CASE) being equal to the that follows each WHEN keyword is:
CASE
WHEN THEN | NULL
[WHEN THEN | NULL]...
[WHEN THEN | NULL]
[ELSE | NULL]
END
Thus, to convert department numbers to literal strings in the results table of the current example, you could use the SELECT statement
SELECT id, name, CASE department
WHEN 1 THEN 'Marketing'
WHEN 2 THEN 'Customer Service'
WHEN 3 THEN 'Collections'
WHEN 4 THEN 'Customer Relations'
END AS dept_name
FROM employees
ORDER BY dept_name
to produce the results table:
id name dept_name
-- ----------------- ------------------
5 Sally Fields Collections
6 Walt Frazier Collections
7 Melissa Gomez Customer Relations
1 Carry Grant Customer Service
2 Michael Lancer Customer Service
3 William Silverman Marketing
4 Walt Welinski Marketing
The syntax of a searched case expression is
CASE WHEN THEN | NULL
[WHEN THEN | NULL]...
[WHEN THEN
| NULL]
[ELSE | NULL]
END
and you can rewrite the simple CASE expression in the current example as a searched CASE expression, as follows:
SELECT id, name,
CASE WHEN department = 1 THEN 'Marketing'
WHEN department = 2 THEN 'Customer Service'
WHEN department = 3 THEN 'Collections,
WHEN department = 4 THEN 'Customer Relations'
END AS dept_name
FROM employees
ORDER BY dept_name
Although you can use a searched CASE expression to check for a column's contents being equal to a specific value, the real power of the searched CASE expression is that it lets you use subqueries and comparison operators in search conditions that test for something other than equality. Suppose, for example, that you assign your salespeople to bonus pools with increasing commission scales based on their total sales. You can use the SELECT statement with the searched CASE expression
SELECT id, name, total sales,
CASE WHEN total_sales < 10000) THEN 'Rookie'
WHEN (total_sales >= 10000) AND
(total_saIes < 100000) THEN 'Associate'
WHEN (total_sales >= 100000) AND
(total_sales < 1000000) THEN 'Manager'
WHEN (total_sales >= 1000000) THEN
'Vice President'
END
FROM employees
ORDER BY name, id
to display the ID, name, and bonus pool name based on the bonus range into which the salesperson's TOTAL_SALES falls.
Using the CASE Expression to Update Column Values
An SQL CASE expression differs from the CASE statements you find in most programming languages because it can be used only as part of an SQL statement and is not a statement in its own right. Said another way, SQL CASE expressions can appear in SQL statements almost anywhere a value is legal. However, SQL CASE expressions, unlike programming language CASE statements, cannot stand alone.
Tip 108 "Understanding the CASE Expression," showed you how to use the CASE expression in a SELECT statement to change the values in the results table based on the value of a column in one of the input tables. As you know, the "results table" is not a physical table on disk. Rather, it is a conceptual table that correctly models the way in which a SELECT statement returns the results of a query and how the DBMS can use those query results as an "input" or "target" table for another SQL statement.
In addition to altering the virtual results table, a CASE expression can modify the data stored in a physical table if the CASE expression appears in an UPDATE (vs. a SELECT) statement. For example, if you want to update the BONUS_LEVEL column in the employees record (vs. the values displayed in the virtual results table, as shown in Tip 108), you could use an UPDATE statement with a CASE expression similar to:
UPDATE employees
SET bonus_level =
CASE WHEN total_sales < 10000 THEN 'Rookie'
WHEN (total_sales >= 10000) AND
(total_sales < 100000) THEN 'Associate'
WHEN (total_sales >= 100000) AND
(total_sales < 1000000) THEN 'Manager'
WHEN (total_sales >= 1000000) THEN 'Vice President'
END
When processing a CASE expression, the DBMS takes the column values in a row that satisfies the SQL statement's search condition and uses them to determine if the first condition in the CASE expression is TRUE. If so, the CASE expression receives the value in the first THEN part. If not TRUE, the DBMS evaluates the second WHEN (search) condition in the CASE expression. If TRUE, the CASE expression receives the value in the second THEN part. If not TRUE, the DBMS tests the third search condition, and so on.
If none of the CASE expression's search conditions evaluates to TRUE, the CASE expression receives the value given in the CASE expression's optional ELSE clause. A CASE expression that has no ELSE clause returns a NULL if none of its SEARCH conditions evaluate to TRUE.
When used in an UPDATE statement, the value returned by a CASE expression (and subsequently used to update a table column) must be of the same data type as the column to be updated. As such, if you have a literal (character) string in the CASE expression's THEN clause, the DBMS will return an error if you are attempting to update a numeric type data column.
In addition to using literal strings, you an use column names and mathematical expressions to compute the value returned by a CASE expression. For example, once the BONUS_LEVEL column in the EMPLOYEES table has a correct value, you can use the UPDATE statement
UPDATE employees
SET commission =
CASE bonus_level
WHEN 'Rookie' THEN total_sales * .01
WHEN 'Associate' THEN total_sales * .05
WHEN 'Manager' THEN total_sales * .15
WHEN 'Vice President' THEN total_sales * .25
END
to set the value of the COMMISSION column based on the commission level associated with the employee's BONUS_LEVEL and TOTAL_SALES.
Using the CASE Expression to Avoid Error Conditions
In addition to modifying results table values and updating data values in physical tables, you can use a CASE expression to avoid computations that are mathematically invalid or that would violate data type range constraints.
For example, if you want to display the percentage of times that a salesperson's calls result in a sale, you might use the SELECT statement:
SELECT id, name, (sales_calls / sales_count) * 100.00
AS closing_percentage
FROM employees
WHERE department = 1
-
However, the DBMS will raise an error (and halt statement execution) if the value of SALES_COUNT is NULL or 0 (since dividing by zero is an illegal mathematical operation).
You can avoid the mathematical exception raised by dividing by zero or a NULL value by adding a CASE expression to the SELECT statement. For example, the SELECT statement
SELECT id, name, CASE WHEN sales_count > 0 THEN
sales_calls / sales_count * 100.00
ELSE 0
END AS closing_percentage
FROM employees
WHERE department = 1
will prevent the system from trying to divide SALES_COUNT by a zero or a NULL value. When SALES_COUNT is greater than 0, the DBMS will perform the division of the SALES_COUNT into the SALES_CALLS. Conversely, when the SALES_COUNT is 0 or unknown, the DBMS will skip the division and return a value of 0.
Sometimes mathematically legal computations can result in results that violate range constraints. Suppose, for example, that you deduct the cost of health insurance from an employee's GROSS_PAY using the UPDATE statement:
UPDATE payroll_records
SET net_pay = gross_pay - health_ins_deduction
You would violate a user-defined data range constraint if the amount of the HEALTH_INS_DEDUCTION is greater than the GROSS_PAY. After all, the NET_PAY amount on a check cannot be less than zero!
To avoid trying to write a check for a negative amount, you could add a CASE expression such as
UPDATE payroll_records
SET net_pay = CASE WHEN gross_pay >= health_ins_deduction
Then gross_pay - health_ins_deduction
ELSE gross_pay
END
to the UPDATE statement. In the current example, the DBMS would "take" the HEALTH_INS_DEDUCTION out of the GROSS_PAY only if the employee earned at least the amount of money due to be deducted.
Understanding the NULLIF Expression
The NULLIF function is the inverse of the ISNULL function you learned about in Tip 31, "Understanding the MS-SQL Server ISNULL() Function." While the ISNULL function is used to replace a NULL with a non-NULL value, the NULLIF is used to replace a non-NULL value with a NULL.
Suppose, for example, that someone set the value of the SALES_QUOTA column in the EMPLOYEES table to -1 for new employees who have not yet been assigned a quota. The NULLIF expression in the SELECT statement
SELECT employee_id, first_name, last_name,
NULLIF (SALES_QUOTA, -1) as Quota
FROM employees
lets you more accurately display the sales quotas in a virtual results table by replacing any values of -1 in the SALES_QUOTA column of the results table with a NULL
The syntax for the NULLIF expression is:
NULLIF (, )
When evaluating a NULLIF expression, the DBMS will return a value of NULL when and have the same value. If and have different values, the DBMS will return the value of as the value of the expression. Thus, in the current example, the NULLIF expression will evaluate to NULL whenever the value in the SALES_QUOTA column is equal to -1. Otherwise, the NULLIF expression will return the value in the SALES_QUOTA column of the current row.
Either one (or both) of the expressions in the NULLIF can be literals (numeric, datetime, or character constants); numeric, string, datetime or CASE expressions; or column names. However, using a literal for both and is of little practical value. For example, the SELECT statement
SELECT
NULLIF ('match', 'match'), NULLIF ('no match', 'match')
which generates the results table
----- -------
NULL, no match
tells you only that the NULLIF expression for two identical literals evaluates to NULL, while the NULLIF expression for two different literals returns the value of the first literal.
Besides altering the contents of a SELECT statement's virtual results table, you can use a NULLIF expression in an UPDATE statement to change the values in an actual (physical) table. For example, suppose that you close one of your sales offices, office 6. You can use the UPDATE statement
UPDATE employees SET office = NULLIF (office, 6)
to put a NULL in the OFFICE column of all employees previously assigned to office 6.
Using the COALESCE Expression to Replace NULL Values
A COALESCE expression gives you a simple way to replace NULL (or missing) data with non-NULL values Although you can use a CASE expression to do the same job, the syntax of the COALESCE expression
COALESCE (,
[,...])
is compact and perhaps easier to read (and understand) than the equivalent CASE expression.
The DBMS sets the value of the COALESCE expression to the first non-NULL value in its expression list. As shown by the expression's syntax diagram, the COALESCE expression, unlike the ISNULL expression (which is also used to replace a chosen expression with NULL values) can more than two expressions in its expression list.
To compute the value of a COALESCE expression, the DBMS starts by evaluating the first expression in the list. If the first expression evaluates to a non-NULL result, the DBMS returns its value as the value of the COALESCE expression. If the first expression evaluates to NULL, the DBMS evaluates the second expression in the list. If the second expression evaluates to a non-NULL result, the DBMS returns the value of the second expression as the COALESCE expression's value. If the second expression also evaluates to NULL, the DBMS goes on to evaluate the third expression, and so on.
In the end, the DBMS returns the first non-NULL value in the list of expressions, reading from left to right, as the value of the COALESCE expression. If all of the expressions in the expression list evaluate to NULL, the COALESCE expression returns a NULL value.
As an example of how the DBMS computes the value of COALESCE expression, assume that you want to list the appointment quotas for your sales representatives. Moreover, assume that you want to use the minimum appointment quota in the employee table as the quota for any rep that has not yet been assigned an appointment quota. To produce the list, you could execute the SELECT statement:
SELECT employee_id, first_name, last_name,
COALESCE (appt_quota,
(SELECT MlN(appt_quota) FROM employees), 0) AS quota
FROM employees
WHERE department = 'Marketing'
After evaluating the COALESCE expression, the DBMS displays the value in the APPT_QUOTA column if it is not NULL. If the APPT_QUOTA is NULL, the DBMS evaluates the MIN aggregate function on the APPT_QUOTA column and displays the minimum appointment quota as the employee's quota-as long as at least one marketing representative has a quota. If all of the quotas are NULL, the MIN aggregate function will return a NULL and the DBMS will display the final value in the expression list, 0, as the employee's appointment quota.
Using the COUNT(*) Aggregate Function to Count the Number of Rows in a Table
COUNT(*) is an aggregate function that returns the number of rows that satisfy the search criteria in the SELECT statement's WHERE clause. As such, the SELECT statement
SELECT COUNT(*) FROM employees
will display the numbers of rows in the EMPLOYEES table. Similarly, the UPDATE statement
UPDATE managers SET employees_managed =
(SELECT COUNT(*) FROM employees
WHERE manager = managers.employee_id)
will set the EMPLOYEES_MANAGED column in the MANAGERS table to the number of rows in the employees table where the value of the MANAGER column (in the EMPLOYEES table) is equal to the value of the EMPLOYEE_ID column in the MANAGERS table. Thus, the UPDATE statement in the example tells the DBMS to set the EMPLOYEES_MANAGED column in each row of the MANAGERS table to the count of the employees that the manager manages.
|
Note |
If the COUNT(*) function's SELECT statement has no WHERE clause, then all of the rows in the table satisfy the "search condition." As such, the SELECT statement in the first example will return the count of all rows in the table because every row in the table satisfies the SELECT statement's omitted search condition.
|
If the DBMS stores the number of table rows in its system tables, COUNT(*) will return the row count of even large tables very quickly because the DBMS can retrieve the row count directly from the system table (and doesn't have to read through and count the rows in the physical table). On those systems that do not maintain row counts in the system tables, you may be able to count table rows more quickly by using the COUNT() function with an indexed NOT NULL constrained column as a parameter.
When counting the rows in a table by using the COUNT() function instead of the COUNT(*) function, bear in mind that the value returned by the COUNT() function is equivalent only to the result of executing the COUNT(*) function if the column you pass to COUNT() has no NULL values in any of its rows.
For example, the COUNT() function in the SELECT statement
SELECT COUNT (employee_id) FROM employees
will return only the same value as the COUNT(*) function in the SELECT statement
SELECT COUNT(*) FROM employees
only if none of the rows in the EMPLOYEES table has a NULL value in the EMPLOYEE_ID column.
The value returned by the COUNT() function is best described as the count of non-NULL values in the column passed to the function. COUNT(*), on the other hand, is most accurately defined as an aggregate function that returns the count rows in a table. Therefore, since COUNT() returns the correct count of rows in a table only if none of the values in the column passed as a parameter is NULL, use COUNT() in place of COUNT(*) only if the column parameter is restricted by a NOT NULL constraint.
Using the COUNT(*) Aggregate Function to Count the Number of Data Values in a Column
While the purpose for using the COUNT(*) function is to count the number of rows in a table, the COUNT() function is used to count the number data values in a column. Unlike the COUNT(*) function, which ignores all column values, the COUNT() function checks the value of one (or more) columns and counts only those rows in which the value is not NULL.
For example, given the following data values in an EMPLOYEES table
employee_id first_name last_name quota manager
----------- --------- --------- ----- -------
1 Lancer Michael 5 NULL
2 Michael Lancer 5 1
3 William Silverman NULL 2
4 Walt Wellinski 8 1
5 William Silverman 8 2
6 NULL Gomez 10 2
7 Walt Frazier 10 NULL
the SELECT statement
SELECT COUNT(*) AS Row_Count,
COUNT(last_name) AS Last_Name_Count,
COUNT(manager) AS Manager_Count
FROM employees
will return the results table:
Row_Count Last_Name_Count Manager_Count
--------- --------------- -------------
7 7 5
Thus, in the current example, the COUNT(*) function returns the number of rows in the EMPLOYEES table-without regard for the values in any of the table's columns. Conversely, the two COUNT() functions return the number of rows with non-NULL values in columns passed (LAST_NAME and MANAGER) as the parameter to each function.
Since the LAST_NAME column has no NULL values, the COUNT(last_name) function returns the same value as the COUNT(*) function. Conversely, two of the table rows have a NULL in the MANAGER column, which causes the COUNT(manager) function to return the value 5, which is 2 less than the total number or rows in the EMPLOYEES table as reported by the COUNT(*) function.
The syntax of the COUNT() function is:
COUNT([ALL | DISTINCT] )
Because the DBMS defaults to ALL when neither ALL nor DISTINCT is specified, the ALL qualifier is normally omitted when using the COUNT() function. Tip 115, "Using the COUNT(*) Aggregate Function to Count the Number of Unique and Duplicate Values in a Column," will show you how to use the DISTINCT qualifier in the COUNT() function to count the number of unique data values in a column.
The important thing to understand is that the COUNT() function will accept a column name, a literal string, a numeric value, or an expression that combines column contents into a single value. As such, in the current example, you can use the SELECT statement
SELECT COUNT(first_name + last_name) FROM employees
to count the number of rows in which both the FIRST_NAME and LAST_NAME columns are not NULL.
|
Note |
If your DBMS implementation uses the double vertical lines (||) operator instead of the plus (+) operator for string concatenation, you would write the query as:
SELECT COUNT(first_name || last_name) FROM employees
|
If you wanted to count only the number of non-NULL data values in a column based on rows that satisfy a search condition, add a WHERE clause to the SELECT statement. For example, if you want to know only the number of non-NULL values in the FIRST_NAME column for table rows with a 2 in the MANAGER column, you could use the SELECT statement:
SELECT COUNT(first_name) FROM employees WHERE manager = 2
|
Note |
The syntax of the COUNT() function lets you pass a character string or numeric literal (string or numeric constant) as the . Using a literal value (vs. a column name or expression involving a column) will always cause the COUNT() function to return the number of rows in the table. After all, a non-NULL constant value will never be NULL as the DBMS evaluates the function for each row of the table. As such, the DBMS will count each row as having a non-NULL value for the "column" (the constant value) it is to test. Thus, executing the SELECT statement
SELECT COUNT('constant value') FROM employees
for example, will always return the number of rows in EMPLOYEES, since the literal string "constant value" will be non-NULL for each row in the table.
|
Using the COUNT(*) Aggregate Function to Count the Number of Unique and Duplicate Values in a Column
If you insert the DISTINCT constraint ahead of the expression passed as a parameter to the COUNT() function, the function will count the number of unique, non-NULL data values in a column. For example, given the following data values in an EMPLOYEES table
employee_ID first_name last_name quota manager
----------- ---------- --------- ----- -------
1 Lancer Michael 5 NULL
2 Michael Lancer 5 1
3 William Silverman NULL 2
4 Walt Wellinski 8 1
5 William Silverman 8 2
6 NULL Gomez 10 2
7 Walt Frazier 10 NULL
the SELECT statement
SELECT COUNT(first_name) AS Total_First_Names,
COUNT(DISTINCT first_name) AS Unique_First_Names
FROM employees
will return the results table:
Total_First_Names Unique_First_Names
----------------- ------------------
6 4
|
Note |
A simple way to display the number of unique rows in a table is to pass the table's PRIMARY KEY to the COUNT() function because the table's PRIMARY KEY column contains a unique, non-NULL value in every row of the table.
|
You can also count the number of unique values across multiple columns by using the string concatenation function to combine column values into a single, composite value (which is then passed as the parameter to the COUNT() function).
Suppose, for example, that you want to count the number of unique employee names in the EMPLOYEES table. Executing the previous SELECT statement would yield inaccurate results since the COUNT() function in the statement checks for only unique values in the FIRST_NAME column. As such, the DBMS treats the second WALT as a duplicate name and does not count it, even though the full name Walt Wellinski is not the same as Walt Frazier.
To accurately count the number of unique employee names, then, you need to combine the contents of the FIRST_NAME and LAST_NAME columns into a single string and then pass that string as the parameter to the COUNT() function. In so doing, the SELECT statement
SELECT COUNT(first_name) AS Total_First_Names,
COUNT(DISTINCT first_name) AS Unique_First_Names
COUNT(DISTINCT first_name + last_name) AS
Unique_Full_Names
FROM employees
will return the results table:
Total_First_Names Unique_First_Names Unique_FullNames
----------------- ------------------ ----------------
6 4 5
|
Note |
If any one of the concatenated columns has a NULL value, the result of the concatenation is NULL and the current row will not be counted by the COUNT() function. In the current example, the row in which the FIRST_NAME column is NULL and the LAST_NAME column is Gomez is not counted by the COUNT() function since the result of the expression (the concatenation of first and last names) passed as the parameter to the COUNT() function is NULL. Remember, the COUNT() function counts only those rows in which the parameter passed to the function is not NULL.
|
SQL does not have a special function to count the number of nonunique (or duplicate) values in a column. You can, however compute the count of duplicate column values by subtracting the count of unique column values from the total count of column values, as shown in the SELECT statement:
SELECT COUNT(first_name + last_name) -
COUNT(DISTINCT first_name + last_name)
AS Dup_Name_Count
FROM employees
If you want to include the count of NULL values in the duplicate column data count, use the COUNT(*) function in place of the first COUNT() function in the SELECT statement. For example, the SELECT statement
SELECT COUNT(*) - COUNT(DISTINCT first_name + last_name)
AS Dup_Name_And_NULL_Count
FROM employees
will display the count of duplicate and NULL data values in the composite FULL_NAME (FIRST_NAME + LAST_NAME) column.
Using MS SQL Server CUBE and ROLLUP Operators to Summarize Table Data
The SQL GROUP BY clause lets you summarize table data based on one or more columns. MS-SQL Server provides two operators, CUBE and ROLLUP, which enhance the GROUP BY clause's totaling capabilities. By adding a WITH ROLLUP clause (or a WITH CUB clause) to a SELECT statement, you can tell MS-SQL Server to generate additional subtotals and grand totals for columns listed in the statement's GROUP BY clause.
Suppose, for example, that you have the following data in an INVOICES table:
inv_date inv_no cust_id product_code qty
---------- ------ ------- ------------ ---
2000-01-01 1 1 1 1
2900-01-01 1 1 6 1
2000-01-01 1 1 3 1
2000-01-01 1 1 5 6
2000-03-01 2 9 1 5
2000-03-01 2 9 2 4
2000-02-01 3 7 2 4
2000-05-01 4 7 5 1
2000-05-01 4 7 4 3
2000-05-01 4 7 2 8
2000-01-01 5 4 5 3
2000-01-01 5 4 6 3
2000-06-01 6 1 5 4
2000-06-01 7 5 5 4
To generate a summary report of products purchased by customers 4 and 5, use a SELECT statement with a GROUP BY clause similar to
SELECT cust_id, product_code, SUM(qty) AS quantity
FROM invoices WHERE cust_id IN (4, 5)
GROUP BY cust_id, product_code
ORDER BY cust_id
which, will produce the results table
cust_id product_code quantity
------- ------------ --------
4 5 3
4 6 3
5 5 4
for the current example's data. Thus, the GROUP BY clause tells the DBMS to "group" (or summarize) the nongrouped items in the SELECT list into the "categories" (or columns) listed in the GROUP BY clause.
In the current example, the sum of the QTY column is the only nongrouped item. As such, the GROUP BY clause tells the DBMS to compute the sum of the QTY column's values for each unique combination of CUST_ID and PRODUCT_CODE in the INVOICES table. The WHERE clause tells the DBMS to display only the data for customers 4 and 5.
If you are using MS-SQL Server, you can tell the DBMS to display the total products each customer purchased (without regard to product code) and the total number of all products purchased by all customers by adding a WITH ROLLUP clause. For the current example data, the SELECT statement (WITH ROLLUP)
SELECT cust_id, product_code, sum(qty) AS quantity
FROM invoices WHERE cust_id IN (4, 5)
GROUP BY cust_id, product_code
WITH ROLLUP
ORDER BY cust_id
will generate the results table:
cust_id product_code quantity
------- ------------ --------
NULL NULL 10
4 5 3
4 6 3
4 NULL 6
5 5 4
5 NULL 4
Each additional results table row with a NULL value is a subtotal generated by the ROLLUP operator. The NULL denotes the column being subtotaled-or, said another way, a NULL value in a column means "for all values" of that column. For example, the first row, which has a NULL in both the CUST_ID and PRODUCT_CODE columns, tells you that the QUANTITY column contains the total purchases made by all customers for all product codes. Similarly, the fourth row in the results table has a NULL only in the PRODUCT_CODE column. As such, the QUANTITY column in the fourth row shows the total of all PRODUCT_CODEs purchased by customer 4.
Notice that the results table for a SELECT statement WITH ROLLUP has only one row with a NULL value in the CUST_ID column. The ROLLUP operator "rolls up" the subtotals for the first column in the GROUP BY clause based on the remaining columns listed in the clause into a single grand total, which represents the quantity in the nongrouped column for "all" values of all columns in the GROUP BY list.
If you want to display subtotals for the first column in the GROUP BY clause for each unique combination of column values listed in the remainder of the clause, use the WITH CUBE clause in place of the WITH ROLLUP clause. For the current example, the SELECT statement
SELECT cust_id, product_code, sum(qty) AS quantity
FROM invoices WHERE cust_id IN (4, 5)
GROUP BY cust_id, product_code
WITH CUBE
ORDER BY cust_id
will generate the results table
cust_id product_code quantity
------- ------------ --------
NULL NULL 10
NULL 5 7
NULL 6 3
4 5 3
4 6 3
4 NULL 6
5 5 4
5 NULL 4
which includes two additional rows (rows 2 and 3) that show the total of all product 5 and product 6 purchases made by all customers.
|
Note |
The examples in the current tip use SELECT statements with only two columns listed in the GROUP BY clause. This was done to reduce the size of the results table. The MS-SQL Server CUBE and ROLLUP operators will accept GROUP BY clause with up to 10 columns listed.
|
Using the MAX() Aggregate Function to Find the Maximum Value in a Column
When you need to know the maximum or largest value in a column, use the MAX() aggregate (or set) function. The column passed to the MAX() function can be a numeric, a character string, or a datetime data type. As such, if you want to display the datetime of the most recent invoice for customer 1, you could use the SELECT statement
SELECT MAX(inv_date) AS 'Date Last Inv for Cust 1'
FROM invoices WHERE cust_id = 1
which will produce a results table similar to:
Date Last Inv for Cust 1
------------------------
2000-06-01 00:00:00.000
Similarly, if you want to display the highest ITEM_COST in the PRODUCTS table, you would use the SELECT statement
SELECT MAX(item_cost) AS 'Max Item Cost' FROM products
to produce a RESULTS table similar to:
MAX Item Cost
-------------
1844.5100
The MAX() function will return a single value of the same data type as the column it is passed. Thus, in the first example, the MAX() function scanned the values in the INV_DATE column (of type datetime) and returned its largest datetime value found in the column. Similarly, the MAX() function in the second example scanned a numeric column and returned the highest MONEY data type value in the ITEM_COST column.
|
Note |
When determining the maximum value in a column, the MAX() function ignores NULL values. However, if all rows in the column have a NULL value, the MAX() function will return a NULL value for that column.
|
Using the MIN() Aggregate Function to Find the Minimum Value in a Column
When you need to know the minimum or smallest value in a column, use the MIN() aggregate (or set) function. The column passed to the MIN() function can be a numeric, a character string, or a datetime data type. As such, if you want to display the datetime of the oldest invoice for customer 1, you could use the SELECT statement
SELECT MIN(inv_date) AS 'Date First Inv for Cust 1'
FROM invoices WHERE cust_id = 1
which will produce a results table similar to:
Date First Inv for Cust 1
-------------------------
2000-01-01 00:00:00.000
Similarly, if you want to display the least expensive ITEM_COST in the PRODUCTS table, you would use the SELECT statement
SELECT MIN(item_cost) AS 'Min Item Cost' FROM products
to produce a RESULTS table similar to:
Min Item Cost
-------------
258.2300
|
Note |
If you are working with a numeric column that contains negative values, the value returned by the MIN() function may not be the number closest to 0. For example, if a numeric column has the values 15, 5, 100, 0, -456, -10, and 200, the MIN() function will return the value -456. Remember, the larger the negative number, the smaller its value.
|
The MIN() function will return a single value of the same data type as the column it is passed. Thus, in the first example, the MIN() function scanned the values in the INV_DATE column (of type datetime) and returned its smallest datetime value. Similarly, the MIN() function in the second example scanned a numeric column and returned the lowest MONEY data type value in the ITEM_COST column.
|
Note |
When determining the minimum value in a column, the MIN() function ignores NULL values. However, if all rows in the column have a NULL value, the MIN() function will return a NULL value for that column.
|
Using the SUM() Aggregate Function to Find the Sum of the Values in a Column
The SUM() aggregate (or set) function returns the sum of the data values in a column. Because SUM() adds the values in the rows of a column together, the column passed to the SUM() function must be one of the numeric data types you learned about in Tip 21, "Understanding SQL Numeric Integer Data Types," and Tip 22, "Understanding SQL Numeric Floating-Point Data Types."
For example, to display the total sales commissions for all employees in the EMPLOYEES table, you could use a SELECT statement similar to
SELECT SUM(sales_commission) AS 'Total Commissions'
FROM employees
to produce a results table similar to:
Total Commissions
-----------------
1072105.7900
While the MIN() and MAX() functions each return a value that has exactly the same data type as the column passed to the function, the SUM() function returns a numeric data type that is either the same or a higher precision than the column passed. For example, to prevent an "overflow" error, the SUM() function may return a value of data type INTEGER when called upon to add up the values in a SMALLINT column-if the resulting sum is greater than 32,767.
|
Note |
The SUM() function ignores NULL values when adding up the numbers in a column. However, if all of the values in a column are NULL, the SUM() function will return NULL as its result for that column.
|
The syntax of the SUM() function is:
SUM([DISTINCT] )
As such, you can have the SUM() function give you the sum of all unique (distinct) values in a column. For example, the SUM() function in the SELECT statement
SELECT SUM(DISTINCT quantity_on_hand) FROM products
will return 27 if the values in the QUANTITY_ON_HAND column are 1, 5, 9, 12, 9, and 5. Conversely, the SUM function in
SELECT SUM(quantity_on_hand) FROM products
will produce a sum of 41 for the same data values in rows of the column.
Using the AVG() Aggregate Function to Find the Average of the Values in a Column
The AVG() aggregate (or set) function returns the average of the data values in a column. Because the AVG() function adds up the data values in a column and divides the sum by the number of non-NULL values, the column being averaged must be one of the numeric data types you learned about in Tip 21, "Understanding SQL Numeric Integer Data Types," and Tip 22, "Understanding SQL Numeric Floating-Point Data Types."
To display the average cost of the items (ITEM_COST) in a PRODUCTS table, you would a SELECT statement similar to:
SELECT AVG(item_cost) FROM products
Like the SUM() function, the AVG() function may not return the exact data type of the column that it is passed. For example, if you use the AVG() function to determine the average age of the employees in an EMPLOYEE table, such as
SELECT AVG(age) FROM employees
the AVG() function may return a floating-point numeric value, even though the AGE column is of one of the integer data types (either INTEGER or SMALLINT).
|
Note |
Check the system documentation to determine the type of conversions that may occur on your specific DBMS. MS-SQL Server, for example, will round the result of an AVG() function executed on an INTEGER column so that it returns an INTEGER result, while other DBMS implementations will return the computed average as an unrounded floating-point number.
|
When computing averages, the AVG() function ignores NULL values. As such,
SELECT SUM(sales_price) / COUNT(*) FROM products
is not equivalent to
SELECT AVG(sales_price) FROM products
if the SALES_PRICE for any product in the PRODUCT table has not yet been set and is set to NULL as a result. Although the SUM() function ignores NULL values, the COUNT(*) function does not.
Like the SUM() function the syntax of the AVG() function
AVG([DISTINCT] )
gives you the ability to average only unique (or distinct) values in a column. As such, if the AGE column in the EMPLOYEES table contains the values 26, 55, 34, 37, 34, and 55, the SELECT statement (for the average of unique values)
SELECT AVG(DISTINCT age) FROM employees
will display 38 as the average age, while the SELECT statement (for the average of all values)
SELECT AVG(age) FROM employees
would report an average age of 40.
|
Note |
As mentioned previously, the AVG() function ignores NULL values when computing a column's average value. However, if all rows in the column have a NULL value, the AVG() function will return a value of NULL for the column.
|
Using the WHERE Clause with the AVG() Function to Determine an Average Value for Select Rows in a Table
As you learned in Tip 120, "Using the AVG() Aggregate Function to Find the Average of the Values in a Column," the AVG() aggregate (or column) function returns the average of the data values in a table column. If you do not want to average all of the values in a column, you can use the search condition in a WHERE clause to limit the rows (and, therefore, the data values) the DBMS will include in the computation of the aggregate function.
For example, to display the average gross sales for all salespeople in the company, you would use a SELECT statement similar to:
SELECT AVG(gross_sales) AS 'Avg Gross Sales' FROM salesreps
To compute the average gross sales for only those assigned to Nevada (vs. all salespeople companywide), you would limit the GROSS_SALES data values passed to the AVG() function to only Nevada salespeople by adding a WHERE clause such as the one in the SELECT statement:
SELECT AVG(gross_sales) AS 'Avg NV Gross Sales'
FROM salesreps
WHERE sales_territory = 'NV'
When executing a SELECT statement, the DBMS evaluates the search conditions in the WHERE clause for each row in the table. Only data values from those rows for which the search condition evaluates to TRUE are passed to the aggregate function. (By default, every row in the table satisfies the "search condition" if the SELECT statement has no WHERE clause.)
In the current example, only the GROSS_SALES from the rows in which the SALES_TERRITORY is equal to NV (the abbreviation for Nevada) are included in the average calculated by the AVG() function and displayed in the results table under the Avg NV Gross Sales heading.
In addition to displaying the average value in a column for some or all of the rows in a table, you can use the AVG() function as part of a search condition in the SELECT statement's WHERE clause. Suppose, for example, that you want to display the average QUOTA and GROSS_SALES for salespeople whose GROSS_SALES are above the average for all employees overall. The SELECT statement
SELECT AVG(quota) AS 'Avg Quota for Above Avg Sales',
AVG(gross_sales) AS 'Avg Above Avg Sales'
FROM salesreps
WHERE gross_sales > (SELECT AVG(gross_sales)
FROM salesreps)
will determine the average of the GROSS_SALES for all salespeople and then compare each salesperson's GROSS_SALES against that average. Only QUOTA and GROSS_SALES values in those rows in which a salesperson's GROSS_SALES amount is above the overall average GROSS_SALES amount will be passed to the AVG(quota) and AVG(gross_sales) functions.
Understanding How Aggregate Functions in a SELECT Statement Produce a Single Row of Results
In Tips 113–120, you learned about the SQL aggregate functions COUNT(), MAX(), MIN(), SUM(), and AVG(). Although each performs a different function, all have the characteristic of aggregating (summarizing) a column's value from multiple table rows into a single value. The value returned by an aggregate function can then be used in a search condition or expression, or displayed as a column in a SELECT statement.
When executing a SELECT statement to display the values computed by one or more aggregate functions, the DBMS performs the following steps:
- Generates a virtual interim table that represents the product (or CROSS JOIN) of the tables in the SELECT statement's FROM clause.
- If there is a WHERE clause, evaluates its search condition(s) for each row in the interim table. Eliminates those rows for which the WHERE clause evaluates as FALSE or NULL (unknown)—that is, keeps only those rows for which the search condition evaluates to TRUE.
- Uses the values in the updated interim table to calculate the value of the aggregate function(s) in the SELECT statement's select clause.
- Displays the value computed by each aggregate function as a column value in a single-row results table.
Perhaps the best way to conceptualize the way in which the DBMS produces the results table for a summary query is think of the query's execution as having two distinct phases. In the first phase, the DBMS performs the steps in the detail query processing (which you learned about in Tip 87, "Understanding the Steps Involved in Processing an SQL SELECT Statement"). The resulting interim multi-row, multi-column input table has all of the columns from all of the tables in the SELECT statement's FROM clause, and all of the rows that satisfy the search criteria in the WHERE clause. In the second phase, the DBMS uses the aggregate functions in the query's select clause to summarize the multi-row interim input table into single values that it can display as columns in a single-row results table.
If you think of an aggregate function in a select clause as directing the DBMS to summarize data to produce a single line of results, you can understand why the SELECT statement
SELECT dept, COUNT(*) FROM employees
is illegal. After all, the column reference DEPT tells the DBMS to provide a multi-row listing—one virtual table row showing the department number for every employee in the EMPLOYEES table. Unfortunately, this directly contradicts the direction of the second item in the select list. The aggregate function, COUNT(*) tells the DBMS to provide a single-row results table (with a column that displays the number of rows in the input table).
Thus, a SELECT statement with column and aggregate functions listed in the select clause is illegal when the columns listed tell the DBMS to perform a detailed query at the same time that the aggregate functions tell the DBMS to perform a summary query.
|
Note |
Mixing a column list and aggregate functions in a select clause is legal if all columns listed in the select clause also appear in the SELECT statement's GROUP BY clause. Thus, the SELECT statement
SELECT dept, COUNT(*) FROM employees GROUP BY dept
is legal. (You will learn how single- and multiple-column GROUP BY clauses convert detailed queries into a summary queries in Tip 270, "Using a GROUP BY Clause to Group Rows Based on a Single-Column Value," and Tip 271, "Using a GROUP BY Clause to Group Rows Based on Multiple Columns.")
|
Understanding the Impact of Columns with NULL Values on Set Functions
The SQL-92 standard specifies that aggregate functions (COUNT(), MAX(), MIN(), SUM(), and AVG()) ignore NULL values. Thus, given an EMPLOYEES table with the values
emp_id appt_quota appt_count sales_count sales_amount
------ ---------- ---------- ----------- ------------
1 2 4 2 10725.0000
2 5 6 5 25625.0000
3 NULL 2 0 0.0000
4 NULL 5 0 0.0000
5 7 9 7 35259.0000
6 4 4 4 20748.0000
7 9 10 9 45589.0000
the SELECT statement
SELECT COUNT(*) AS 'Rows', COUNT(emp_id) AS 'Emps',
COUNT(appt_quota) AS 'Quotas',
SUM(appt_quota) AS 'Sum Quotas',
SUM(sales_count) AS 'Sum Sales',
AVG(appt_quota) AS 'Avg Quota',
AVG(sales_count) AS 'Avg Sales'
FROM employees
produces the results table:
Rows Emps Quotas Sum Quotas Sum Sales Avg Quota Avg Sales
---- ---- ------ ---------- --- ----- --- ----- ---------
7 7 5 27 27 5 3
In Tip 113, "Using the COUNT(*) Aggregate Function to Count the Number of Rows in a Table," you learned that the row count function COUNT(*) is not affected by NULL values in any of the table's columns, while the COUNT() function counts only non-NULL values in a column. As such, the results table shows that the COUNT(emp_id) function returns the same value (7) as the COUNT(*) function since every row in the EMP_ID column has a non-NULL value. Conversely, the COUNT(appt_quota) function returns a value of 5 because two of the rows in the APPT_QUOTA column have NULL values. (Or, stated another way, only five of the rows in the APPT_QUOTA column have non-NULL values, and the COUNT() function includes only rows with non-NULL values in its count of values in a column).
The SUM() function ignores NULL values, too, hence the non-NULL result of computing the sum of column APPT_QUOTA, which has NULL values in two of its rows.
|
Note |
Be careful when using column values. With the exception of the SUM() function (which ignores NULL values), if a column used as an operand in an expression has a NULL value, the DBMS will evaluate the entire expression as NULL.
|
While reviewing the value of the last two columns of the results table, you may have noticed what appears to be an error. Although the sum of the values in the APPT_QUOTA and SALES_COUNT columns comes to 27, the AVG() function returns a different average (5 vs. 3) for the two columns, even though the total is the same and the number of employees is the same for both columns.
The reason for the disparity is that the AVG() function, like the other aggregate functions, ignores NULL values. Thus, while the sum of the values in both columns is the same, the AVG(appt_quota) function has two fewer rows (5) to divide into the sum of the APPT_QUOTAs than the AVG(sales_count) function has to divide into the sum of the SALES_COUNTs. Thus, the AVG(appt_quota) is computed as 27 / 5, while the AVG(sales_count) is computed as 27 / 7 because rows with NULL values in the APPT_QUOTA column are omitted from the computation of the AVG(appt_quota).
Using the example data one last time, you would expect that the expressions
SUM (appt_count) - SUM(appt_quota)
and
SUM (appt_count - appt_quota)
to produce the same result—the number of appointments either above or below quota set by the employees.
Yet, the SELECT statement
SELECT
SUM(appt_count) AS 'Sum Appts',
SUM(appt_quota) AS 'Sum Quotas',
SUM(appt_count - appt_quota) AS 'Sum of Difference'
SUM(appt_count) - SUM(appt_quota) AS 'Difference of Sums'
FROM employees
produces the results table:
SUM Appts Sum Quotas Sum of Difference Difference of Sums
--------- ---------- ----------------- ------------------
40 27 6 13
Comparing the sum of the appointments (40) to the sum of the appointment quotas (27), it would appear that the expression
SUM (appt_count - appt_quota)
yields the correct results, while
SUM (appt_count) - SUM(appt_quota)
does not. However, both results are correct—the difference is a matter of semantics.
The "sum of differences" takes the difference between the two columns in each row of the table. Unfortunately, two of the rows have a NULL value for one of the operands in the subtraction. As a result, two of the subtractions evaluate to NULL and are ignored by the SUM() function. Thus, the number in the Sum of the Differences column is best defined as "the number or appointments by which employees with quotas exceeded the quota for appointments."
Conversely, the "difference of sums" calculated the sum of each column first, thus increasing the total appointment count by 7 due to the inclusion of the APPT_COUNT values for employees 3 and 4. As a result, the number under the Difference of Sums heading can best be described as "the number of appointments by which all employees exceeded the quota for appointments."
(The APPT_COUNT for employees 3 and 4 was eliminated from the SUM(appt_count -appt_quota) expression because APPT_COUNT-APPT_QUOTA evaluates to NULL for both since the APPT_QUOTA operand in the subtraction is NULL in the two rows.)
Using the AND Logical Connective to Do Multiple Condition Selects of Rows in a Table
A WHERE clause in a SELECT statement lets you specify the search condition the DBMS is to use in selecting the data to display in its results table. While the WHERE clause is optional, almost every SELECT statement has one. After all, without a WHERE clause, the results table will include data from every row in the input table. Since database tables often grow to several million rows, displaying an entire table's contents is both impractical and of little use—especially since the answer to most nonacademic queries lies in the summary of a small portion of a table's data.
Suppose, for example, that you want a list of all employees currently working in the marketing department. You might use a query similar to:
SELECT emp_id, first_name, last_name FROM employees
WHERE dept = 'Marketing'
Without the WHERE clause, the SELECT statement will present a results table listing all employees in the company. If you are trying only to assign marketing personnel to floor supervisors, the "extra" nonmarketing employee names would just be in the way.
Even with its WHERE clause, the example SELECT statement does not narrow down the employee list enough. After all, you want to manage a list of only active marketing reps, while the employee table includes all marketing reps—including those no longer employed by your company. (For tax reporting purposes, most employee tables will include information for terminated employees, at least until the end of the calendar year.)
Since a single search condition in a WHERE clause can test only one data value, testing for multiple values (such as DEPT and EMPLOYMENT_STATUS) will require a multiple search condition (or compound) predicate. That's where the AND logical connective comes in.
The AND logical connective lets you specify an additional search condition that must also be true in order for the SELECT statement to display data values from the row the DBMS IS testing. For the current example, you would use the AND logical connective to add a second search condition to the WHERE clause, as follows:
SELECT emp_id, first_name, last_name FROM employees
WHERE dept = 'Marketing'
AND employment_status = 'Active'
Now the DBMS will return the EMP_ID, FIRST_NAME, and LAST_NAME for only those rows in which the value in the DEPT column is Marketing and the value in the EMPLOYMENT_STATUS column is Active. As such, you will get a list of all marketing personnel currently working for your company.
You can use as many AND connectives in a WHERE clause as you need to specify all of the search conditions the input table's row must satisfy in order to have its select list values included in the results table. Each additional AND connective lets you add one more search condition. Thus, for the current example, if you wanted a list of only marketing personnel who were currently employed and not yet assigned to a supervisor, you would add a second AND connective to include a third search condition:
SELECT emp_id, first_name, last_name FROM employees
WHERE dept = 'Marketing'
AND employment_status = 'Active'
AND supervisor IS NULL
The important thing to know is that no matter how many ANDs you use to join search conditions in a WHERE clause, all of the search conditions connected by an AND must evaluate to TRUE in order for values from the current row in the input table to be included in the results table.
Using the NOT Logical Connective to Negate the Search Condition in a WHERE Clause
As you learned in Tip 124, "Using the AND Logical Connective to Do Multiple Condition Selects of Rows in a Table," the WHERE clause must evaluate to TRUE in order for the DBMS to perform the action prescribed by the SQL statement. A SELECT statement, for example, will display select list values for only those rows that satisfy the search criteria in the WHERE clause. Similarly, the UPDATE statement will change column values only in rows for which the search criteria evaluate to TRUE; and the DELETE statement will remove only rows whose data values satisfy the conditions set forth in the WHERE clause.
The NOT logical connective gives you a way to tell the DBMS to take action on those rows in which the search condition in the WHERE clause evaluates to FALSE instead of those in which it evaluates to TRUE.
Suppose, for example, that you want a list of all cars in the AUTO_INVENTORY table that were not red, green, or blue. If you do not know all of the possible colors for cars in your inventory, you cannot construct a WHERE clause for the select statement using:
SELECT vin_number, make, model, year, price, color
FROM AUTO_INVENTORY
WHERE color IN ('Orange', 'Brown', 'Chartreuse')
After all, if you fail to list the color of one of your (nonred, nongreen, or nonblue) cars in the set of all possible colors, the DBMS will omit the cars of that color from your inventory list. Moreover, since we already made the assumption that you do not know all of the possible automobile colors, your inventory list will most likely be incomplete and, therefore, inaccurate.
The NOT logical operator lets you avoid having to list all of the possible search conditions that could be TRUE by letting you tell the DBMS what must NOT be TRUE in order for the DBMS to include a row's data in the results table. For the current example, the SELECT statement
SELECT vin_number, make, model, year, price, color
FROM AUTO_INVENTORY
WHERE color NOT IN ('Red', 'Green', 'Blue')
will return a complete and accurate list of the cars in your AUTO_INVENTORY table that are not red, green, or blue in color.
When evaluating a search condition proceeded by the NOT logical connective, the DBMS changes the Boolean value of the search condition as follows:
NOT TRUE FALSE NULL
--------------------
FALSE TRUE NULL
Thus, the NOT connective will change TRUE to FALSE, change FALSE to TRUE, and leave a NULL result unchanged.
Therefore, the SELECT statement
SELECT first_name, last_name FROM employees
WHERE NOT age > 50
will display the names of employees who are 50 years old or younger because the SELECT statement will include input from only table rows for which the WHERE clause evaluates TRUE. As such, when the inner search condition in the WHERE clause evaluates to TRUE for employees older than 50, the NOT logical connective changes the evaluation to FALSE. Thus, the overall WHERE clause evaluates to FALSE whenever AGE > 50 evaluates to TRUE—which means any row in which the AGE column is greater than 50 will be omitted from the results table.
Admittedly, the search condition in the previous example (which produced a list of employees under the age of 50) could be written more clearly as:
SELECT first_name, last_name FROM employees
WHERE age <= 50
However, the expression WHERE NOT age > 50 does illustrate how the NOT logical connective negates the Boolean value returned by the comparison operator and causes the DBMS to include the rows normally excluded while excluding those normally included.
The important thing to know now is that you can use the NOT logical connective in those instances in which telling the DBMS what rows not to include in the results table is more convenient or easier to understand than trying to test for all of the conditions under which a row is to be included.
Using the OR Logical Connective to Do Multiple Condition Selects of Rows in a Table
The WHERE clause in a SELECT statement tells the DBMS which rows from the input table to include in the query's results table. While executing a SELECT statement, the DBMS scans the rows in the input table, substituting column values from the current row into corresponding column name references in the search condition. If the search condition evaluates to TRUE, the DBMS adds values from the current row to the results table.
Each search condition in a WHERE clause can perform a conditional test on a single column. Thus, to query the STUDENTS table for a list of all chemistry majors, you might use a SELECT statement similar to:
SELECT student_id, first_name, last_name FROM students
WHERE major = 'Chemistry'
If the decision of whether or not to include a row's data in the results table involves checking more than one column or checking the same column multiple ways, you will need to add an additional search condition to the WHERE clause for each test the DBMS is to perform. In Tip 124, "Using the AND Logical Connective to Do Multiple Condition Selects of Rows in a Table," you learned how to use the logical connective AND to join multiple search conditions in a WHERE clause.
As you now know, the AND connective tells the DBMS that all search conditions it joins must be TRUE in order for the WHERE clause to evaluate TRUE. If you want the DBMS to include data from the current row if one or more of the search conditions in a WHERE clause is TRUE, use an OR (instead of an AND) to combine them.
Suppose, for example, that you want a list of students majoring in chemistry, biology, or mathematics. You could produce such a list by submitting a SELECT statement similar to:
SELECT student_id, first_name, last_name, major
FROM students
WHERE major = 'Chemistry' OR major = 'Biology'
OR major = 'Mathematics'
After the DBMS executes the SELECT statement, the results table will include student ID, name, and major information from rows in which the value in the MAJOR column is chemistry, biology, or mathematics—that is, if any one of the search conditions joined by an OR is TRUE, the WHERE clause evaluates to TRUE, and the DBMS will include data from row being checked in the results table.
Understanding SQL Transactions and Transaction Logs
|