The CASE Expression
The CASE expression is a more generic form of NULLIF() and COALESCE(). A CASE expression lets you map any given value into some other value. You can write a CASE expression in two different forms. The first form (called the simple form) looks like this:
CASE expression1 WHEN value1 THEN result1 WHEN value2 THEN result2 ... [ ELSE resultn ] END
When PostgreSQL evaluates a simple CASE expression, it computes the value of expression1 then compares the result to value1. If expression1 equals value1, the CASE expression evaluates to result1. If not, PostgreSQL compares expression1 to value1; if they match, the CASE expression evaluates to result2. PostgreSQL continues searching through the WHEN clauses until it finds a match. If none of the values match expression1, the expression evaluates to the value specified in the ELSE clause. If PostgreSQL gets all the way to the end of the list and you haven't specified an ELSE clause, the CASE expression evaluates to NULL. Note that result1, result2, … resultn must all have the same data type.
You can see that NULLIF( balance, 0 ) is equivalent to
CASE balance WHEN 0 THEN NULL ELSE balance END
The second, more flexible form of the CASE expression is called the searched form:
CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 ... [ ELSE resultn ] END
When PostgreSQL evaluates a searched CASE expression, it first evaluates condition1. If condition1 evaluates to TRue, the value of the CASE expression is result1. If condition1 evaluates to false, PostgreSQL evaluates condition2. If that condition evaluates to true, the value of the CASE expression is result2. Otherwise, PostgreSQL moves on to the next condition. PostgreSQL continues to evaluate each condition until it finds one that evaluates to true. If none of the conditions is true, the CASE expression evaluates to resultn. If PostgreSQL gets all the way to the end of the list and you haven't specified an ELSE clause, the CASE expression evaluates to NULL.
Like the simple form, result1, result2, … resultn must all have the same data type. However, in the searched form, the conditions don't have to be similar to each other. For example, if you want to classify the titles in your tapes collection (and you're a big Jimmy Stewart fan), you might use a CASE expression like this:
movies=# SELECT movies-# title, male_lead, duration, movies-# CASE movies-# WHEN duration < '1 hour 45 min' THEN 'short movie' movies-# WHEN male_lead = 'James Stewart' THEN 'great movie' movies-# WHEN duration > '2 hours' THEN 'long movie' movies-# END movies-# FROM movies-# tapes; title | male_lead | duration | case ---------------+-----------------+----------+------------- The Godfather | Marlon Brando | 02:55:00 | long movie The Godfather | Marlon Brando | 02:55:00 | long movie Casablanca | Humphrey Bogart | 01:42:00 | short movie Citizen Kane | | 01:59:00 | Rear Window | James Stewart | | great movie The Birds | | 01:59:00 | (6 rows)
The ORDER BY Clause
So far, all the queries that you have seen return rows in an arbitrary order. You can add an ORDER BY clause to a SELECT command if you need to impose a predictable ordering. The general form of the ORDER BY clause is[9]
[9] PostgreSQL supports another form for the ORDER BY clause: ORDER BY expression [ USING operator ] [, ...]. This might seem a little confusing at first. When you specify ASC, PostgreSQL uses the < operator to determine row ordering. When you specify DESC, PostgreSQL uses the > operator. The second form of the ORDER BY clause allows you to specify an alternative operator.
ORDER BY expression [ ASC | DESC ] [, ...]
The ASC and DESC terms mean ascending and descending, respectively. If you don't specify ASC or DESC, PostgreSQL assumes that you want to see results in ascending order. The expression following ORDER BY is called a sort key.
Let's look at a simple example:
movies=# SELECT * FROM customers ORDER BY balance; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 5 | Funkmaster, Freddy | 555-FUNK | | (5 rows)
You can see that this SELECT command returns the result set in ascending order of the balance column. Here is the same query, but in descending order:
movies=# SELECT * FROM customers ORDER BY balance DESC; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 5 | Funkmaster, Freddy | 555-FUNK | | 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 (5 rows)
This time, the largest balance is first, followed by successively smaller values.
You may have noticed something odd about how the ORDER BY clause handles the customer named Freddy Funkmaster. Recall from the previous section that NULL cannot be compared to other values. By its very nature, the ORDER BY clause must compare values. PostgreSQL resolves this issue with a simple rule: NULL is always considered larger than all other values when evaluating an ORDER BY clause.
You can include multiple sort keys in the ORDER BY clause. The following query sorts customers in ascending balance order, and then in descending birth_date order:
movies=# SELECT * FROM customers ORDER BY balance, birth_date DESC; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 5 | Funkmaster, Freddy | 555-FUNK | | (5 rows)
When an ORDER BY clause contains multiple sort keys, you are telling PostgreSQL how to break ties. You can see that customers 1 and 3 have the same value (0.00) in the balance columnyou have asked PostgreSQL to order rows using the balance column. What happens when PostgreSQL finds two rows with the same balance? When two sort key values are equal, PostgreSQL moves to the next sort key to break the tie. If two sort key values are not equal, sort keys with a lower precedence are ignored. So, when PostgreSQL finds that customers 1 and 3 have the same balance, it moves to the birth_date column to break the tie.
If you don't have a sort key with a lower precedence, you won't be able to predict the ordering of rows with duplicate sort key values.
You can include as many sort keys as you like.
LIMIT and OFFSET
Occasionally, you will find that you want to answer a question such as "Who are my top 10 salespeople?" In most relational databases, this is a difficult question to ask. PostgreSQL offers two extensions that make it easy to answer "Top n" or "Bottom n"-type questions. The first extension is the LIMIT clause. The following query shows the two customers who owe you the most money:
movies=# SELECT * FROM customers ORDER BY balance DESC LIMIT 2; customer_id | customer_name | phone | birth_date | balance -------------+--------------------+----------+------------+--------- 5 | Funkmaster, Freddy | 555-FUNK | | 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 (2 rows)
You can see here that I used an ORDER BY clause so that the rows are sorted such that the highest balances appear firstin most cases, you won't use a LIMIT clause without also using an ORDER BY clause. Let's change this query a littlethis time we want the top five customers who have a balance over $10:
movies=# SELECT * FROM customers movies-# WHERE movies-# balance >= 10 movies-# ORDER BY balance DESC movies-# LIMIT 5; customer_id | customer_name | phone | birth_date | balance -------------+----------------+----------+------------+--------- 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 (1 row)
This example shows that the LIMIT clause won't always return the number of rows that were specified. Instead, LIMIT returns no more than the number of rows that you request. In this sample database, you have only one customer who owes you more than $10.
The second extension is the OFFSET n clause. The OFFSET n clause tells PostgreSQL to skip the first n rows of the result set. For example:
movies=# SELECT * FROM customers ORDER BY balance DESC OFFSET 1; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 (4 rows)
In this case, we are viewing all the customers except the customer with the greatest balance. It's common to use LIMIT and OFFSET together:
movies=# SELECT * FROM customers movies-# ORDER BY balance DESC LIMIT 2 OFFSET 1; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 (2 rows)
Formatting Column Results
So far, you have seen how to tell PostgreSQL which rows you want to view, which columns you want to view, and the order in which the rows should be returned. Let's take a short side-trip here and learn how to change the appearance of the values that you select.
Take a look at the following query:
movies=# SELECT customer_id, customer_name, balance, balance+4 FROM customers; customer_id | customer_name | balance | ?column? -------------+----------------------+---------+---------- 3 | Panky, Henry | 0.00 | 4.00 1 | Jones, Henry | 0.00 | 4.00 4 | Wonderland, Alice N. | 3.00 | 7.00 2 | Rubin, William | 15.00 | 19.00 5 | Funkmaster, Freddy | | (5 rows)
PostgreSQL inserts two lines of text between your query and the result set. These two lines are (obviously) column headings. You can see that the header for each of the first three columns contains the name of the column. What about the last column? When you SELECT an expression, PostgreSQL uses "?column?" for the field header[10].
[10] Actually, if you SELECT a function (such as AGE() or SQRT()), PostgreSQL will use the name of the function for the field header.
You can change field headers using the AS clause:
movies=# SELECT customer_id, customer_name, movies-# balance AS "Old balance", movies-# balance + 4 AS "New balance" movies-# FROM customers; customer_id | customer_name | Old Balance | New balance -------------+----------------------+-------------+------------- 3 | Panky, Henry | 0.00 | 4.00 1 | Jones, Henry | 0.00 | 4.00 4 | Wonderland, Alice N. | 3.00 | 7.00 2 | Rubin, William | 15.00 | 19.00 5 | Funkmaster, Freddy | | (5 rows)
Notice that you can provide a field header for table columns as well as for expressions. If you rename a field and the query includes an ORDER BY clause that refers to the field, the ORDER BY should use the new name, not the original one:
movies=# SELECT id, customer_name, movies-# balance AS "Old balance", movies-# balance + 4 AS "New balance" movies-# FROM customers movies-# ORDER BY "Old balance"; customer_id | customer_name | Old Balance | New balance -------------+----------------------+-------------+------------- 3 | Panky, Henry | 0.00 | 4.00 1 | Jones, Henry | 0.00 | 4.00 4 | Wonderland, Alice N. | 3.00 | 7.00 2 | Rubin, William | 15.00 | 19.00 5 | Funkmaster, Freddy | | (5 rows)
This section explained how to change the column headers for a SELECT command. You can also change the appearance of the data values. In the next section, I'll show you a few examples using date values for illustration.
Working with Date Values
PostgreSQL supports six basic date, time, and date/time data types, as shown in Table 1.7. I'll use the term temporal to cover date, time, and date/time data types.
Data Type Name |
Type of Data Stored |
Earliest Date/Time |
Latest Date/Time |
---|---|---|---|
TIMESTAMP |
Date/Time |
4713 BC |
1465001 AD |
TIMESTAMP WITH TIME ZONE |
Date/Time |
1903 AD |
2037 AD |
INTERVAL |
Interval |
-178000000 years |
178000000 years |
DATE |
Date |
4713 BC |
32767 AD |
TIME |
Time |
00:00:00.00 |
23:59:59.99 |
TIME WITH TIME ZONE |
Time |
00:00:00.00+12 |
23:59:59.99-12 |
I'll cover the details of the date/time data types in Chapter 2. You have already seen two of these temporal data types. The customers table contains a DATE column (birth_date):
movies=# d customers Table "customers" Column | Type | Modifiers ---------------+-----------------------+----------- customer_id | integer | not null customer_name | character varying(50) | not null phone | character(8) | birth_date | date | balance | numeric(7,2) | Indexes: "customers_customer_id_key" UNIQUE, btree (customer_id) movies=# SELECT customer_name, birth_date FROM customers; customer_name | birth_date ----------------------+------------ Jones, Henry | 1970-10-10 Rubin, William | 1972-07-10 Panky, Henry | 1968-01-21 Wonderland, Alice N. | 1969-03-05 Funkmaster, Freddy | (5 rows)
You've also seen the INTERVAL data typethe AGE() function returns an INTERVAL:
movies=# SELECT customer_name, AGE( birth_date ) FROM customers; customer_name | age ----------------------+------------------------------ Jones, Henry | 31 years 4 mons 8 days 01:00 Rubin, William | 29 years 7 mons 8 days 01:00 Panky, Henry | 34 years 28 days Wonderland, Alice N. | 32 years 11 mons 13 days Funkmaster, Freddy | (5 rows)
Date/time values are usually pretty easy to work with, but there is a complication that you need to be aware of. Let's say that I need to add a new customer:
movies=# INSERT INTO customers movies-# VALUES movies-# ( movies-# 7, 'Gull, Jonathon LC', '555-1111', '02/05/1984', NULL movies-# );
This customer has a birth_date of '02/05/1984'does that mean "February 5th 1984", or "May 2nd 1984"? How does PostgreSQL know which date I meant? The problem is that a date such as '02/05/1984' is ambiguousyou can't know which date this string represents without knowing something about the context in which it was entered. '02/05/1984' is ambiguous. 'May 02 1984' is unambiguous.
PostgreSQL enables you to enter and display dates in a number of formatssome date formats are ambiguous and some are unambiguous. The DATESTYLE runtime variable tells PostgreSQL how to format dates when displaying data and how to interpret ambiguous dates that you enter.
The DATESTYLE variable can be a little confusing. DATESTYLE is composed of two parts. The first part, called the convention, tells PostgreSQL how to interpret ambiguous dates. The second part, called the display format, determines how PostgreSQL displays date values. The convention controls date input and the display format controls date output. Table 1.8 shows the DATESTYLE display formats.
Display Format |
US Convention |
European Convention |
---|---|---|
ISO |
1984-05-02 |
1984-05-02 |
GERMAN |
02.05.1984 |
02.05.1984 |
POSTGRES |
Wed May 02 1984 |
Wed 02 May 1984 |
SQL |
05/02/1984 |
02/05/1984 |
Let's talk about the display format first. PostgreSQL supports four different display formats. Three of the display formats are unambiguous and one is ambiguous.
The default display format is named ISO. In ISO format, dates always appear in the form 'YYYY-MM-DD'. The next display format is GERMAN. In GERMAN format, dates always appear in the form 'DD.MM.YYYY'. The ISO and GERMAN formats are unambiguous because the format never changes. The POSTGRES format is also unambiguous, but the display format can vary. PostgreSQL needs a second piece of information (the convention) to decide whether the month should appear before the day (US convention) or the day should appear before the month (European convention). In POSTGRES format, date values display the day-of-the-week and month name in abbreviated text form; for example 'Wed May 02 1984' (US) or 'Wed 02 May 1984' (European).
The final display format is SQL. SQL format is ambiguous. In SQL format, the date 'May 02 1984' is displayed as '05/02/1984' (US), or as '02/05/1984'(European).
As I mentioned earlier, the ISO and GERMAN display formats are unambiguous. In ISO format, the month always precedes the day. In GERMAN format, the day always precedes the month. If you choose POSTGRES or SQL format, you must also specify the order in which you want the month and day components to appear. You can specify the desired display format and month/day ordering (that is, the convention) in the DATESTYLE runtime variable:
movies=# SET DATESTYLE TO 'US,ISO'; -- 1984-05-02 movies=# SET DATESTYLE TO 'US,GERMAN'; -- 02.05.1984 movies=# SET DATESTYLE TO 'US,POSTGRES'; -- Wed May 02 1984 movies=# SET DATESTYLE TO 'US,SQL'; -- 05/02/1984 movies=# SET DATESTYLE TO 'EUROPEAN,ISO'; -- 1984-05-02 movies=# SET DATESTYLE TO 'EUROPEAN,GERMAN'; -- 02.05.1984 movies=# SET DATESTYLE TO 'EUROPEAN,POSTGRES'; -- Wed 02 May 1984 movies=# SET DATESTYLE TO 'EUROPEAN,SQL'; -- 02/05/1984
The convention part of the DATESTYLE variable determines how PostgreSQL will make sense of the date values that you enter. The convention also affects the ordering of the month and day components when displaying a POSTGRES or SQL date. Note that you are not restricted to entering date values in the format specified by DATESTYLE. For example, if you have chosen to display dates in 'US,SQL' format, you can still enter date values in any of the other formats.
Recall that the ISO and GERMAN date formats are unambiguousthe ordering of the month and day components is predefined. A date entered in POSTGRES format is unambiguous as wellyou enter the name of the month so it cannot be confused with the day. If you choose to enter a date in SQL format, PostgreSQL will look to the first component of DATESTYLE (that is, the convention) to determine whether you want the value interpreted as a US or a European date. Let's look at a few examples.
movies=# SET DATESTYLE TO 'US,ISO'; movies=# SELECT CAST( '02/05/1984' AS DATE ); 1984-02-05 movies=# SET DATESTYLE TO 'EUROPEAN,ISO'; movies=# SELECT CAST( '02/05/1984' AS DATE ); 1984-05-02
In this example, I've asked PostgreSQL to display dates in ISO format, but I've entered a date in an ambiguous format. In the first case, you can see that PostgreSQL interpreted the ambiguous date using US conventions (the month precedes the day). In the second case, PostgreSQL uses European conventions to interpret the date.
Now let's see what happens when I enter an unambiguous date:
movies=# SET DATESTYLE TO 'US,ISO'; SET VARIABLE movies=# SELECT CAST( '1984-05-02' AS DATE ); 1984-05-02 movies=# SET DATESTYLE TO 'EUROPEAN,ISO'; SET VARIABLE movies=# SELECT CAST( '1984-05-02' AS DATE ); 1984-05-02
This time, there can be no confusionan ISO-formatted date is always entered in 'YYYY-MM-DD' format. PostgreSQL ignores the convention.
So, you can see that I can enter date values in many formats. If I choose to enter a date in an ambiguous format, PostgreSQL uses the convention part of the current DATESTYLE to interpret the date. I can also use DATESTYLE to control the display format.
Matching Patterns
In the previous two sections, you took a short detour to learn a little about how to format results. Now let's get back to the task of producing the desired results.
The WHERE clause is used to restrict the number of rows returned by a SELECT command[11]. Sometimes, you don't know the exact value that you are searching for. For example, you may have a customer ask you for a film, but he doesn't remember the exact name, although he knows that the film has the word "Citizen" in the title. PostgreSQL provides two features that make it possible to search for partial alphanumeric values.
[11] Technically, the WHERE clause constrains the set of rows affected by a SELECT, UPDATE, or DELETE command. I'll show you the UPDATE and DELETE commands a little later.
LIKE and NOT LIKE
The LIKE operator provides simple pattern-matching capabilities. LIKE uses two special characters that indicate the unknown part of a pattern. The underscore (_) character matches any single character. The percent sign (%) matches any sequence of zero or more characters. Table 1.9 shows a few examples.
String |
Pattern |
Result |
---|---|---|
The Godfather |
%Godfather% |
Matches |
The Godfather |
%Godfather |
Matches |
The Godfather |
%Godfathe_ |
Matches |
The Godfather |
___ Godfather |
Matches |
The Godfather |
Godfather% |
Does not match |
The Godfather |
_Godfather |
Does not match |
The Godfather: Part II |
%Godfather |
Does not match |
Now let's see how to use the LIKE operator in a SELECT command:
movies=# SELECT * FROM tapes WHERE title LIKE '%Citizen%'; tape_id | title | duration ----------+----------------------+---------- OW-41221 | Citizen Kane | KJ-03335 | American Citizen, An | (2 rows)
The LIKE operator is case-sensitive:
movies=# SELECT * FROM tapes WHERE title LIKE '%citizen%'; tape_id | title | duration ---------+-------+---------- (0 rows)
If you want to perform case-insensitive pattern matching, use the ILIKE operator:
movies=# SELECT * FROM tapes WHERE title ILIKE '%citizen%'; tape_id | title | duration ----------+----------------------+---------- OW-41221 | Citizen Kane | KJ-03335 | American Citizen, An | (2 rows)
You can, of course, combine LIKE and ILIKE with the NOT operator to return rows that do not match a pattern:
movies=# SELECT * FROM tapes WHERE title NOT ILIKE '%citizen%'; tape_id | title | duration ----------+---------------+-------------- AB-12345 | The Godfather | AB-67472 | The Godfather | MC-68873 | Casablanca | AH-54706 | Rear Window | OW-42200 | Sly | 01:36 OW-42201 | Stone | 4 days 01:36 (6 rows)
Pattern Matching with Regular Expressions
The LIKE and ILIKE operators are easy to use, but they aren't very powerful. Fortunately, PostgreSQL lets you search for data using regular expressions. A regular expression is a string that specifies a pattern. The language that you use to create regular expressions is far more powerful than the LIKE and ILIKE operators. You have probably used regular expressions before; programs such as grep, awk, and the Unix (and DOS) shells use regular expressions.
The LIKE and ILIKE operators define two pattern-matching characters; the regular expression operator defines far more. First, the character "." within a regular expression operates in the same way as the "_" character in a LIKE pattern: it matches any single character. The characters ".*" in a regular expression operate in the same way as the "%" character in a LIKE pattern: they match zero or more occurrences of any single character.
Notice that in a regular expression, you use two characters to match a sequence of characters, whereas you use a single character in a LIKE pattern. The regular expression ".*" is actually two regular expressions combined into one complex expression. As I mentioned earlier, the "." character matches any single character. The "*" character matches zero or more occurrences of the pattern that precedes it. So, ".*" means to match any single character, zero or more times. There are three other repetition operators: The "+" character matches one or more occurrences of the preceding pattern, and the "?" character matches zero or one occurrence of the preceding pattern. If you need to get really fancy (I never have), you can use the form "{x[,y]}" to match at least x and no more than y occurrences of the preceding pattern.
You can also search for things other than ".". For example, the character "^" matches the beginning of a string and "$" matches the end. The regular expression syntax even includes support for character classes. The pattern "[:upper:]*[:digit:]" will match any string that includes zero or more uppercase characters followed by a single digit.
The "|" character gives you a way to search for a string that matches either of two patterns. For example, the regular expression "(^God)|.*Donuts.*" would match a string that either starts with the string "God" or includes the word "Donuts".
Regular expressions are extremely powerful, but they can get awfully complex. If you need more information, Chapter 4 of the PostgreSQL User's Manual provides an exhaustive reference to the complete regular expression syntax.
Table 1.10 shows how to construct regular expressions that match the same strings matched by the LIKE patterns in shown in Table 1.9.
String |
Pattern |
Result |
---|---|---|
The Godfather |
.*Godfather |
Matches |
The Godfather |
.*Godfather.* |
Matches |
The Godfather |
.*Godfathe. |
Matches |
The Godfather |
... Godfather |
Matches |
The Godfather |
Godfather.* |
Does not match |
The Godfather |
.Godfather |
Does not match |
The Godfather: Part II |
.*Godfather |
Does not match |