Multi-Table Joins
So far, all the queries that you've seen involve a single table. Most databases contain multiple tables and there are relationships between these tables. This sample database has an example:
movies=# d rentals Table "rentals" Attribute | Type | Modifier -------------+--------------+---------- tape_id | character(8) | not null rental_date | date | not null customer_id | integer | not null
Here's a description of the rentals table from earlier in this chapter:
"When a customer comes in to rent a tape, we will add a row to the rentals table to record the transaction. There are three pieces of information that we need to record for each rental: the tape_id, the customer_id, and the date that the rental occurred. Notice that each row in the rentals table refers to a customer (customer_id) and a tape (tape_id)."
You can see that each row in the rentals table refers to a tape (tape_id) and to a customer (customer_id). If you SELECT from the rentals table, you can see the tape ID and customer ID, but you can't see the movie title or customer name. What you need here is a join. When you need to retrieve data from multiple tables, you join those tables.
PostgreSQL (and all relational databases) supports a number of join types. The most basic join type is a cross-join (or Cartesian product). In a cross join, PostgreSQL joins each row in the first table to each row in the second table to produce a result table. If you are joining against a third table, PostgreSQL joins each row in the intermediate result with each row in the third table.
Let's look at an example. We'll cross-join the rentals and customers tables. First, I'll show you each table:
movies=# SELECT * FROM rentals; tape_id | rental_date | customer_id ----------+-------------+------------- AB-12345 | 2001-11-25 | 1 AB-67472 | 2001-11-25 | 3 OW-41221 | 2001-11-25 | 1 MC-68873 | 2001-11-20 | 3 (4 rows) movies=# SELECT * FROM customers; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00 5 | Funkmaster, Freddy | 555-FUNK | | 7 | Gull, Jonathon LC | 555-1111 | 1984-02-05 | 8 | Grumby, Jonas | 555-2222 | 1984-02-21 | (7 rows)
Now I'll join these tables. To perform a cross-join, we simply list each table in the FROM clause:
movies=# SELECT rentals.*, customers.customer_id, customers.customer_name movies-# FROM rentals, customers; tape_id | rental_date | customer_id | customer_id | customer_name ----------+-------------+-------------+-------------+---------------------- AB-12345 | 2001-11-25 | 1 | 3 | Panky, Henry AB-12345 | 2001-11-25 | 1 | 1 | Jones, Henry AB-12345 | 2001-11-25 | 1 | 4 | Wonderland, Alice N. AB-12345 | 2001-11-25 | 1 | 2 | Rubin, William AB-12345 | 2001-11-25 | 1 | 5 | Funkmaster, Freddy AB-12345 | 2001-11-25 | 1 | 7 | Gull, Jonathon LC AB-12345 | 2001-11-25 | 1 | 8 | Grumby, Jonas AB-67472 | 2001-11-25 | 3 | 3 | Panky, Henry AB-67472 | 2001-11-25 | 3 | 1 | Jones, Henry AB-67472 | 2001-11-25 | 3 | 4 | Wonderland, Alice N. AB-67472 | 2001-11-25 | 3 | 2 | Rubin, William AB-67472 | 2001-11-25 | 3 | 5 | Funkmaster, Freddy AB-67472 | 2001-11-25 | 3 | 7 | Gull, Jonathon LC AB-67472 | 2001-11-25 | 3 | 8 | Grumby, Jonas OW-41221 | 2001-11-25 | 1 | 3 | Panky, Henry OW-41221 | 2001-11-25 | 1 | 1 | Jones, Henry OW-41221 | 2001-11-25 | 1 | 4 | Wonderland, Alice N. OW-41221 | 2001-11-25 | 1 | 2 | Rubin, William OW-41221 | 2001-11-25 | 1 | 5 | Funkmaster, Freddy OW-41221 | 2001-11-25 | 1 | 7 | Gull, Jonathon LC OW-41221 | 2001-11-25 | 1 | 8 | Grumby, Jonas MC-68873 | 2001-11-20 | 3 | 3 | Panky, Henry MC-68873 | 2001-11-20 | 3 | 1 | Jones, Henry MC-68873 | 2001-11-20 | 3 | 4 | Wonderland, Alice N. MC-68873 | 2001-11-20 | 3 | 2 | Rubin, William MC-68873 | 2001-11-20 | 3 | 5 | Funkmaster, Freddy MC-68873 | 2001-11-20 | 3 | 7 | Gull, Jonathon LC MC-68873 | 2001-11-20 | 3 | 8 | Grumby, Jonas (28 rows)
You can see that PostgreSQL has joined each row in the rentals table to each row in the customers table. The rentals table contains four rows; the customers table contains seven rows. The result set contains 4 ¥ 7 or 28 rows.
Cross-joins are rarely usefulthey usually don't represent real-world relationships.
The second type of join, the inner-join, is very useful. An inner-join starts with a cross-join, and then throws out the rows that you don't want. Take a close look at the results of the previous query. Here are the first seven rows again:
tape_id | rental_date | customer_id | customer_id | customer_name ----------+-------------+-------------+-------------+---------------------- AB-12345 | 2001-11-25 | 1 | 3 | Panky, Henry AB-12345 | 2001-11-25 | 1 | 1 | Jones, Henry AB-12345 | 2001-11-25 | 1 | 4 | Wonderland, Alice N. AB-12345 | 2001-11-25 | 1 | 2 | Rubin, William AB-12345 | 2001-11-25 | 1 | 5 | Funkmaster, Freddy AB-12345 | 2001-11-25 | 1 | 7 | Gull, Jonathon LC AB-12345 | 2001-11-25 | 1 | 8 | Grumby, Jonas . . . . . . . . . . . . . . .
These seven rows were produced by joining the first row in the rentals table:
tape_id | rental_date | customer_id ----------+-------------+------------- AB-12345 | 2001-11-25 | 1
with each row in the customers table. What is the real-world relationship between a rentals row and a customers row? Each row in the rentals table contains a customer ID. Each row in the customers table is uniquely identified by a customer ID. So, given a rentals row, we can find the corresponding customers row by searching for a customer where the customer ID is equal to rentals.customer_id. Looking back at the previous query, you can see that the meaningful rows are those WHERE customers.customer_id = rentals.customer_id.
Now you can construct a query that will show us all of the rentals and the names of the corresponding customers:
movies=# SELECT rentals.*, customers.customer_id, customers.customer_name movies-# FROM rentals, customers movies-# WHERE customers.customer_id = rentals.customer_id; tape_id | rental_date | customer_id | customer_id | customer_name ----------+-------------+-------------+-------------+--------------- AB-12345 | 2001-11-25 | 1 | 1 | Jones, Henry AB-67472 | 2001-11-25 | 3 | 3 | Panky, Henry OW-41221 | 2001-11-25 | 1 | 1 | Jones, Henry MC-68873 | 2001-11-20 | 3 | 3 | Panky, Henry (4 rows)
To execute this query, PostgreSQL could start by creating the cross-join between all the tables involved, producing an intermediate result table. Next, PostgreSQL could throw out all the rows that fail to satisfy the WHERE clause. In practice, this would be a poor strategy: Cross-joins can get very large quickly. Instead, the PostgreSQL query optimizer analyzes the query and plans an execution strategy to minimize execution time. I'll cover query optimization in Chapter 4.
Join Types
We've seen two join types so far: cross-joins and inner-joins. Now we'll look at outer-joins. An outer-join is similar to an inner-join: a relationship between two tables is established by correlating a column from each table.
In an earlier section, you wrote a query that answered the question: "Which customers are currently renting movies?" How would you answer the question: "Who are my customers and which movies are they currently renting?" You might start by trying the following query:
movies=# SELECT customers.*, rentals.tape_id movies-# FROM customers, rentals movies=# WHERE rentals.customer_id = customers.customer_id; customer_id | customer_name | phone | birth_date | balance | tape_id -------------+---------------+----------+------------+---------+---------- 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 | AB-12345 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 | AB-67472 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 | OW-41221 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 | MC-68873 rows)
Well, that didn't work. This query showed you which customers are currently renting movies (and the movies that they are renting). What we really want is a list of all customers and, if a customer is currently renting any movies, all the movies rented. This is an outer-join. An outer-join preserves all the rows in one table (or both tables) regardless of whether a matching row can be found in the second table.
The syntax for an outer-join is a little strange. Here is an example:
movies=# SELECT customers.customer_name, rentals.tape_id movies-# FROM customers LEFT OUTER JOIN rentals movies-# ON customers.customer_id = rentals.customer_id; customer_name | tape_id ----------------------+---------- Jones, Henry | AB-12345 Jones, Henry | OW-41221 Rubin, William | Panky, Henry | AB-67472 Panky, Henry | MC-68873 Wonderland, Alice N. | Funkmaster, Freddy | Gull, Jonathon LC | Grumby, Jonas | rows)
This query is a left outer-join. Why left? Because you will see each row from the left table (the table to the left of the LEFT OUTER JOIN phrase). An inner-join would list only two customers ("Jones, Henry" and "Panky, Henry")the other customers have no rentals.
A RIGHT OUTER JOIN preserves each row from the right table. A FULL OUTER JOIN preserves each row from both tables.
The following query shows a list of all customers, all tapes, and any rentals:
movies=# SELECT customers.customer_name, rentals.tape_id, tapes.title movies-# FROM customers FULL OUTER JOIN rentals movies-# ON customers.customer_id = rentals.customer_id movies-# FULL OUTER JOIN tapes movies-# ON tapes.tape_id = rentals.tape_id; customer_name | tape_id | title ----------------------+----------+---------------------- Jones, Henry | AB-12345 | The Godfather Panky, Henry | AB-67472 | The Godfather | | Rear Window | | American Citizen, An Panky, Henry | MC-68873 | Casablanca Jones, Henry | OW-41221 | Citizen Kane Rubin, William | | Wonderland, Alice N. | | Funkmaster, Freddy | | Gull, Jonathon LC | | Grumby, Jonas | | | | Sly | | Stone (13 rows)