Hack 13. Choose the Right Join Style for Your Relationships
When a relationship between tables is optional, you need an OUTER JOIN. When querying over many changes, if you require an OUTER JOIN you sometimes have to change all the other INNER JOINs into OUTER JOINs.
There are two common patterns of JOINs: the chain and the star, as shown in Figure 2-1 and Figure 2-2. Each is described in the following sections.
Figure 2-1. A JOIN chain
Figure 2-2. A JOIN star
2.7.1. A JOIN Chain
There are two references in this JOIN chain example. The reference from trip, shown in Table 2-7, to budget, shown in Table 2-8, is optionalusers may put a NULL value in the budget field of a trip row. The link from budget to staff (shown in Table 2-9) is mandatory; every row of the budget table must have a value in the budgetHolder field. Therefore, you use an OUTER JOIN when querying across trip and budget and an INNER JOIN when querying across budget and staff.
tripID | description | budget |
---|---|---|
TR01 | Sicily | NULL |
TR02 | Egypt | CTH22 |
budgetId | description | budgetHolder(NOT NULL) |
---|---|---|
CT22 | Officer's mess | ST02 |
staffId | name | rank |
---|---|---|
ST01 | Yossarian | Captain |
ST02 | Milo | Lieutenant |
If you want to list all of the trips with associated budget details, you must use a LEFT OUTER JOIN. Not all trips are associated with a budget; the LEFT OUTER JOIN will include those with and without a budget:
mysql> SELECT tripID, trip.description,budget.description -> FROM trip LEFT OUTER JOIN budget ON (trip.budget=budget.budgetID); +--------+-------------+----------------+ | tripID | description | description | +--------+-------------+----------------+ | TR01 | Sicily | NULL | | TR02 | Egypt | Officer's Mess | +--------+-------------+----------------+
|
If you want to include the name of the budget holder you need to JOIN with the staff table. As NULL is not permitted in the budgetHolder column, you might think that an INNER JOIN (the default) would give the correct results, but that is not so:
mysql> SELECT tripID, trip.description,budget.description, name -> FROM trip LEFT OUTER JOIN budget ON (trip.budget=budget.budgetID) -> INNER JOIN staff ON (budgetHolder=staffID); +--------+-------------+----------------+------+ | tripID | description | description | name | +--------+-------------+----------------+------+ | TR02 | Egypt | Officer's Mess | Milo | +--------+-------------+----------------+------+
The chain of JOINs is calculated from left to right, so the result of the LEFT JOIN in the previous query would be INNER JOINed to the budget table. Because TR02 has a NULL value for the budget, the join condition filters that row out. You could bracket or reorder the JOINs so that the INNER JOIN is calculated first, but you give the optimizer more room to maneuver if you just continue the LEFT OUTER JOIN:
mysql> SELECT tripID, trip.description,budget.description, name -> FROM trip LEFT OUTER JOIN budget ON (trip.budget=budget.budgetID) -> LEFT OUTER JOIN staff ON (budgetHolder=staffID); +--------+-------------+----------------+------+ | tripID | description | description | name | +--------+-------------+----------------+------+ | TR01 | Sicily | NULL | NULL | | TR02 | Egypt | Officer's Mess | Milo | +--------+-------------+----------------+------+
2.7.2. A JOIN Star
The star pattern comprises one central table; the satellite tables are related to rows in the central table. These relationships may be optional or mandatory.
In this example, the central table is flightTicket. All flight tickets originate from an airport but only some of them were sold by an agent and only some of them involve passengers with a frequent flyer account (freqFlyer):
CREATE TABLE flightTicket (tkid CHAR(4) PRIMARY KEY ,agent CHAR(4) NULL ,departFrom CHAR(3) NOT NULL ,freqFlyer CHAR(4) NULL ,FOREIGN KEY (agent) REFERENCES agent(id) ,FOREIGN KEY (departFrom) REFERENCES airport(id) ,FOREIGN KEY (freqFlyer) REFERENCES freqFlyer(id) );
In a star schema, use the LEFT OUTER JOIN only for the tables that need it. The order of the joins does not matter:
mysql> SELECT airport.name AS airport, -> agent.name AS agent, -> freqFlyer.name freqFlyer -> FROM flightTicket LEFT OUTER JOIN agent ON (agent =agent.id) -> INNER JOIN airport ON (departFrom=airport.id) -> LEFT OUTER JOIN freqFlyer ON (freqFlyer =freqFlyer.id); +-----------+------------+-------------+ | airport | agent | freqFlyer | +-----------+------------+-------------+ | Edinburgh | NULL | NULL | | Edinburgh | Smokehouse | NULL | | Heathrow | Smokehouse | Bill Peters | | Heathrow | NULL | John Weak | +-----------+------------+-------------+
Because the tables are linked to each other only through the central table, the central table's requirements dictate the JOINs. A NULL value for agent does not affect how freqFlyer or airport relates to flightTicket, and neither does a NULL value for freqFlyer affect how airport or agent relates to flightTicket.