Retrieving Data from the Sample Database
At this point, you should have a sample database (movies) that contains three tables (tapes, customers, and rentals) and a few rows in each table. You know how to get data into a table; now let's see how to view that data.
The SELECT statement is used to retrieve data from a database. SELECT is the most complex statement in the SQL language, and the most powerful. Using SELECT, you can retrieve entire tables, single rows, a group of rows that meet a set of constraints, combinations of multiple tables, expressions, and more. To help you understand the basics of the SELECT statement, I'll try to break it down into each of its forms and move from the simple to the more complex.
SELECT Expression
In its simplest form, you can use the SELECT statement to retrieve one or more values from a set of predefined functions. You've already seen how to retrieve your PostgreSQL user id:
movies=# select user; current_user --------------- korry (1 row) movies=# q
Other values that you might want to see are
select 5; -- returns the number 5 (whoopee) select sqrt(2.0); -- returns the square root of 2 select timeofday();-- returns current date/time select now(); -- returns time of start of transaction select version(); -- returns the version of PostgreSQL you are using select now(), timeofday();
Commenting
The -- characters introduce a commentany text that follows is ignored. |
The previous example shows how to SELECT more than one piece of informationjust list all the values that you want, separated by commas.
The PostgreSQL User's Guide contains a list of all the functions that are distributed with PostgreSQL. In Chapter 2, I'll show you how to combine columns, functions, operators, and literal values into more complex expressions.
SELECT * FROM Table
You probably won't use the first form of the SELECT statement very oftenit just isn't very exciting. Moving to the next level of complexity, let's see how to retrieve data from one of the tables that you created earlier:
movies=# SELECT * FROM customers; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00(4 rows)
When you write a SELECT statement, you have to tell PostgreSQL what information you are trying to retrieve. Let's take a closer look at the components of this SELECT statement.
Following the SELECT keyword, you specify a list of the columns that you want to retrieve. I used an asterisk (*) here to tell PostgreSQL that we want to see all the columns in the customers table.
Next, you have to tell PostgreSQL which table you want to view; in this case, you want to see the customers table.
Now let's look at the results of this query. A SELECT statement returns a result set. A result set is a table composed of all the rows and columns (or fields) that you request. A result set may be empty.
You asked PostgreSQL to return all the columns in the customers tablenotice that the columns are displayed (from left to right) in the order that you specified when you created the table. You may have noticed that the rows are returned in an (apparently) arbitrary order. That's an important thing to keep in mind: Unless you specifically request that PostgreSQL return rows in a particular order, you won't be able to predict which rows will come first[5]. This is a performance feature; if you don't care about row ordering, let PostgreSQL return the rows in the fastest possible way.
[5] Okay, some people probably could predict the order in which the rows will appear. Those people have way too much free time and consider a propeller to be fashionable headwear. They are also very good at inducing sleep.
SELECT Single-Column FROM Table
If you don't want to view all of the columns from a table, you can replace the * (following the SELECT keyword) with the name of a column:
movies=# SELECT title FROM tapes; title --------------- The Godfather The Godfather Casablanca Citizen Kane Rear Window (5 rows)
Again, the rows are presented in an arbitrary order. But this time you see only a single column. You may have noticed that "The Godfather" appears twice in this list. That happens because our imaginary video store owns two copies of that movie. I'll show you how to get rid of duplicates in a moment.
SELECT Column-List FROM Table
So far, you have seen how to select all the columns in a table and how to select a single column. Of course, there is a middle groundyou can select a list of columns:
movies=# SELECT customer_name, birth_date FROM customers; customer_name | birth_date ----------------------+------------ Jones, Henry | 1970-10-10 Rubin, William | 1972-07-10 Panky, Henry | 1968-01-21 Wonderland, Alice N. | 1969-03-05 (4 rows)
Instead of naming a single column after the SELECT keyword, you can provide a column-separated list of column names. Column names can appear in any order, and the results will appear in the order you specify.
SELECT Expression-List FROM Table
In addition to selecting columns, you can also select expressions. Remember, an expression is a combination of columns, functions, operators, literal values, and other expressions that will evaluate to a single value. Here is an example:
movies=# SELECT movies-# customer_name, movies-# birth_date, movies-# age( birth_date ) movies-# FROM customers; customer_name | birth_date | age ----------------------+------------+------------------------------ Jones, Henry | 1970-10-10 | 31 years 4 mons 3 days 01:00 Rubin, William | 1972-07-10 | 29 years 7 mons 3 days 01:00 Panky, Henry | 1968-01-21 | 34 years 23 days Wonderland, Alice N. | 1969-03-05 | 32 years 11 mons 8 days (4 rows)
In this example, I've selected two columns and an expression. The expression age( birth_date ) is evaluated for each row in the table. The age() function subtracts the given date from the current date[6].
[6] Technically, the age() function subtracts the given timestamp (date+time) from the current date and time.
Selecting Specific Rows
The preceding few sections have shown you how to specify which columns you want to see in a result set. Now let's see how to choose only the rows that you want.
First, I'll show you to how to eliminate duplicate rows; then I'll introduce the WHERE clause.
SELECT [ALL | DISTINCT | DISTINCT ON]
In an earlier example, you selected the titles of all the videotapes owned by your video store:
movies=# SELECT title from tapes; title --------------- The Godfather The Godfather Casablanca Citizen Kane Rear Window (5 rows)
Notice that "The Godfather" is listed twice (you own two copies of that video). You can use the DISTINCT clause to filter out duplicate rows:
movies=# SELECT DISTINCT title FROM tapes; title --------------- Casablanca Citizen Kane Rear Window The Godfather (4 rows)
You now have a single row with the value "The Godfather." Let's see what happens when you add the tape_id back into the previous query:
movies=# SELECT DISTINCT title, tape_id FROM tapes; title | tape_id ---------------+---------- Casablanca | MC-68873 Citizen Kane | OW-41221 Rear Window | AH-54706 The Godfather | AB-12345 The Godfather | AB-67472 (5 rows)
We're back to seeing "The Godfather" twice. What happened? The DISTINCT clause removes duplicate rows, not duplicate column values; and when the tape IDs are added to the result, the rows containing "The Godfather" are no longer identical.
If you want to filter rows that have duplicate values in one (or more) columns, use the DISTINCT ON() form:
movies=# SELECT DISTINCT ON (title) title, tape_id FROM tapes; title | tape_id ---------------+---------- Casablanca | MC-68873 Citizen Kane | OW-41221 Rear Window | AH-54706 The Godfather | AB-12345 (4 rows)
Notice that one of the "The Godfather" rows has been omitted from the result set. If you don't include an ORDER BY clause (I'll cover that in a moment), you can't predict which row in a set of duplicates will be included in the result set.
You can list multiple columns (or expressions) in the DISTINCT ON() clause.
The WHERE Clause
The next form of the SELECT statement includes the WHERE clause. Here is the syntax diagram for this form:
SELECT expression-list FROM table WHERE conditions
Using the WHERE clause, you can filter out rows that you don't want included in the result set. Let's see a simple example. First, here is the complete customers table:
movies=# SELECT * FROM customers; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 (4 rows)
Now pick out only those customers who owe you some money:
movies=# SELECT * FROM customers WHERE balance > 0; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 (2 rows)
In this example, I've used a single condition to restrict the rows included in the result set: balance > 0.
When PostgreSQL executes a SELECT statement, it evaluates the WHERE clause as it processes each row. If all the conditions specified by the WHERE clause are met, the row will be included in the result set (if a row meets all the conditions in the WHERE clause, the row satisfies the WHERE clause).
Here is an example that is slightly more complex:
movies=# SELECT customer_name, phone FROM customers movies-# WHERE movies-# ( balance = 0 ) movies-# AND movies-# ( AGE( birth_date ) < '35 years' ) movies-# ; customer_name | phone ---------------+---------- Jones, Henry | 555-1212 (1 row)
In this query, I've specified two conditions, separated by an AND operator. The conditions are: balance = 0 and AGE( birth_date ) < '34 years'[7]. As before, PostgreSQL reads each row in the customers table and evaluates the WHERE clause. If a given row is to be included in the result set, it must satisfy two constraintsbalance must be equal to zero and the customer must be younger than 35 years of age. If either of these conditions is false for a given row, that row will not be included in the result set.
[7] I'll show you how to format various date/time related values in Chapter 2.
AND is one of the logical operators supported by PostgreSQL. A logical operator is used to combine logical expressions. A logical expression is an expression that evaluates to trUE, FALSE, or unknown (NULL). The other two logical operators are OR and NOT.
Let's see how the OR operator works:
movies=# SELECT customer_id, customer_name, balance, AGE(birth_date) movies-# FROM customers movies-# WHERE movies-# ( balance = 0 ) movies-# OR movies-# ( AGE( birth_date ) < '35 years' ) movies-# ; customer_id | customer_name | balance | age -------------+----------------+---------+---------------------------------- 3 | Panky, Henry | 0.00 | 36 years 8 mons 29 days 23:00:00 1 | Jones, Henry | 0.00 | 34 years 10 days 2 | Rubin, William | 15.00 | 32 years 3 mons 10 days (3 rows)
The OR operator evaluates to trUE if either (or both) of the conditions is TRUE. The first row (id = 1) is included in the result set because it satisfies the first condition (balance = 0). It is included even if it does not satisfy the second condition. The second row (id = 2) is included in the result set because it satisfies the second condition, but not the first. You can see the difference between AND and OR. A row satisfies the AND operator if both conditions are TRUE. A row satisfies the OR operator if either condition is trUE (or if both are TRUE).
The NOT operator is simple:
movies=# SELECT * FROM customers movies-# WHERE movies-# NOT ( balance = 0 ) movies-# ; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 (2 rows)
NOT evaluates to TRUE if its operand is FALSE and evaluates to FALSE if its operand is TRUE. The NOT operator inverts (or reverses) a test. Without the NOT operator, the previous example would have returned all customers where the balance column was equal to zero. With the NOT operator, you get the other rows instead.
One other point that I should mention about the WHERE clause. Just because you mention a column in the WHERE clause does not mean that you have to include the column in the result set. For example:
movies=# SELECT customer_id, customer_name FROM customers movies-# WHERE movies-# balance != 0 movies-# ; customer_id | customer_name -------------+---------------------- 4 | Wonderland, Alice N. 2 | Rubin, William (2 rows)
This example also shows a more common alternative to the NOT operator. The != operator means "is not equal to." The != operator is not an exact replacement for NOTit can only be used to check for inequality, whereas NOT is used to reverse the sense of any logical expression.
NULL Values
Sometimes when you add data to a table, you find that you don't know what value you should include for a column. For example, you may encounter a customer who does not want to provide you with his or her birthday. What value should be recorded in the birth_date column for that customer? You don't really want to make up an answeryou want a date value that means "unknown." This is what the NULL value is for. NULL usually means that you don't know what value should be entered into a column, but it can also mean that a column does not apply. A NULL value in the birth_date column certainly means that we don't know a customer's birth_date, not that birth_date does not apply[8]. On the other hand, you might want to include a rating column in the tapes table. A NULL value in the rating column might imply that the movie was produced before ratings were introduced and therefore the rating column does not apply.
[8] I am making the assumption that the customers for your video store have actually been born. For some of you, that may not be a valid assumption.
Some columns should not allow NULL values. In most cases, it would not make sense to add a customer to your customers table unless you know the customer's name. Therefore, the customer_name column should be mandatory (in other words, customer_name should not allow NULL values).
Let's drop and re-create the customers table so that you can tell PostgreSQL which columns should allow NULL values:
movies=# DROP TABLE customers; DROP movies=# CREATE TABLE customers ( movies-# customer_id INTEGER UNIQUE NOT NULL, movies-# customer_name VARCHAR(50) NOT NULL, movies-# phone CHAR(8), movies-# birth_date DATE, movies-# balance DECIMAL(7,2) movies-#); CREATE
The NOT NULL modifier tells PostgreSQL that the customer_id and customer_name columns are mandatory. If you don't specify NOT NULL, PostgreSQL assumes that a column is optional. You can include the keyword NULL to make your choices more obvious:
movies=# DROP TABLE customers; DROP movies=# CREATE TABLE customers ( movies-# customer_id INTEGER UNIQUE NOT NULL, movies-# customer_name VARCHAR(50) NOT NULL, movies-# phone CHAR(8) NULL, movies-# birth_date DATE NULL, movies-# balance DECIMAL(7,2) NULL movies-#); CREATE
Notice that a column of any data type can support NULL values.
The NULL value has a unique property that is often the source of much confusion. NULL is not equal to any value, not even itself. NULL is not less than any value, and NULL is not greater than any value. Let's add a customer with a NULL balance:
movies=# INSERT INTO customers movies-# VALUES movies-# ( movies(# 5, 'Funkmaster, Freddy', '555-FUNK', NULL, NULL movies(# ) movies-# ;
Now we have five customers:
movies=# SELECT * FROM customers; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 5 | Funkmaster, Freddy | 555-FUNK | | (5 rows)
One of these customers has a NULL balance. Let's try a few queries:
movies=# SELECT * FROM customers WHERE balance > NULL; customer_id | customer_name | phone | birth_date | balance -------------+---------------+-------+------------+--------- (0 rows)
This query did not return any rows. You might think that it should have customer number 2 (Rubin, William); after all, 15.00 is surely greater than 0. But remember, NULL is not equal to, greater than, or less than any other value. NULL is not the same as zero. Rather than using relational operators ( '=', '!=', '<', or '>'), you should use either the IS or IS NOT operator.
movies=# SELECT * FROM customers WHERE balance IS NULL; customer_id | customer_name | phone | birth_date | balance -------------+--------------------+----------+------------+--------- 5 | Funkmaster, Freddy | 555-FUNK | | (1 row) movies=# SELECT * FROM customers WHERE balance IS NOT NULL; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 (4 rows)
The NULL value introduces another complication. If NULL is not greater than, equal to, or less than any other value, what would 'NULL + 4' mean? Is NULL + 4 greater than NULL? It can't be because that would imply that NULL is less than NULL + 4 and, by definition, NULL can't be less than another value. What does all this mean? It means that you can't do math with a NULL value.
movies=# SELECT customer_id, customer_name, balance, balance+4 FROM customers; customer_id | customer_name | balance | ?column? -------------+----------------------+---------+---------- 3 | Panky, Henry | 0.00 | 4.00 1 | Jones, Henry | 0.00 | 4.00 4 | Wonderland, Alice N. | 3.00 | 7.00 2 | Rubin, William | 15.00 | 19.00 5 | Funkmaster, Freddy | | (5 rows)
This query shows what happens when you try to perform a mathematical operation using NULL. When you try to add '4' to NULL, you end up with NULL.
The NULL value complicates logic operators as well. Most programmers are familiar with two-valued logic operators (that is, logic operators that are defined for the values trUE and FALSE). When you add in NULL values, the logic operators become a bit more complex. Tables 1.4, 1.5, and 1.6 show the truth tables for each logical operator.
a |
b |
a AND b |
---|---|---|
trUE |
trUE |
trUE |
trUE |
FALSE |
FALSE |
trUE |
NULL |
NULL |
FALSE |
FALSE |
FALSE |
FALSE |
NULL |
FALSE |
NULL |
NULL |
NULL |
Source: PostgreSQL User's Guide |
a |
b |
a OR b |
---|---|---|
trUE |
trUE |
trUE |
trUE |
FALSE |
TRUE |
TRUE |
NULL |
trUE |
FALSE |
FALSE |
FALSE |
FALSE |
NULL |
NULL |
NULL |
NULL |
NULL |
Source: PostgreSQL User's Guide |
a |
NOT a |
---|---|
trUE |
FALSE |
FALSE |
trUE |
NULL |
NULL |
Source: PostgreSQL User's Guide |
I don't mean to scare you away from the NULL valueit's very useful and often necessarybut you do have to understand the complications that it introduces.
NULLIF() and COALESCE()
PostgreSQL offers two operators that can convert a NULL value to some other value or convert a specific value into NULL.
The COALESCE() operator will substitute a default value whenever it encounters a NULL. For example, pretend that you've added two more columns, male_lead and female_lead to the tapes table so that it looks like this:
movies=# SELECT * from tapes; tape_id | title | male_lead | female_lead | duration ----------+---------------+-----------------+----------------+---------- AB-12345 | The Godfather | Marlon Brando | | 02:55:00 AB-67472 | The Godfather | Marlon Brando | | 02:55:00 MC-68873 | Casablanca | Humphrey Bogart | Ingrid Bergman | 01:42:00 OW-41221 | Citizen Kane | | | 01:59:00 AH-54706 | Rear Window | James Stewart | Grace Kelly | AH-44289 | The Birds | | Tippi Hedren | 01:59:00 (6 rows)
You can use the COALESCE() operator to transform a NULL male_lead into the word 'Unknown':
movies=# SELECT title, COALESCE( male_lead, 'Unknown' ) FROM tapes; title | coalesce ---------------+----------------- The Godfather | Marlon Brando The Godfather | Marlon Brando Casablanca | Humphrey Bogart Citizen Kane | Unknown Rear Window | James Stewart The Birds | Unknown (6 rows)
The COALESCE() operator is more talented than we've shown hereit can search through a list of values, returning the first non-NULL value it finds. For example, the following query prints the male_lead, or, if male_lead is NULL, the female_lead, or if both are NULL, 'Unknown':
movies=# SELECT title, COALESCE( male_lead, female_lead, 'Unknown' ) movies-# AS "Starring" movies-# FROM TAPES; title | Starring ---------------+----------------- The Godfather | Marlon Brando The Godfather | Marlon Brando Casablanca | Humphrey Bogart Citizen Kane | Unknown Rear Window | James Stewart The Birds | Tippi Hedren (6 rows)
You can string together any number of expressions inside of the COALESCE() operator (as long as all expressions evaluate to the same type) and COALESCE() will evaluate to the leftmost non-NULL value in the list. If all of the expressions inside COALESCE() are NULL, the entire expression evaluates to NULL.
The NULLIF() operator translates a non-NULL value into NULL. NULLIF() is often used to do the opposite of COALESCE(). COALESCE() TRansforms NULL into a default valueNULLIF() translates a default value into NULL. In many circumstances, you want to treat a numeric value and a NULL value as being the same thing. For example, the balance column (in the customers table) is NULL until a customer actually rents a tape: A NULL balance implies that you haven't actually done any business with the customer yet. But a NULL balance also implies that the customer owes you no money. To convert a NULL balance to 0, use COALESCE( balance, 0 ). To convert a zero balance to NULL, use NULLIF( balance, 0 ). When PostgreSQL evaluates an NULLIF( arg1, arg2 ) expression, it compares the two arguments; if they are equal, the expression evaluates to NULL; if they are not equal, the expression evaluates to the value of arg1.