SQL Performance Tuning
Until June 2001, Oracle didn't support ANSI-style joins, but now that Oracle does, each of the Big Eight can work with either old-style joins, ANSI-style joins with USING, or ANSI-style joins with ON. Here are examples of each type: old-style join: SELECT * FROM Table1, Table2 WHERE Table1.column1 = Table2.column1 /* old style */ ANSI-style join with USING: SELECT * FROM Table1 JOIN Table2 USING (column1) /* ANSI style */ ANSI-style join with ON: SELECT * FROM Table1 JOIN Table2 ON Table1.column1 = Table2.column1 /* ANSI style */
Portability DBMS vendors prefer the second ANSI-style join, with an ON clause. Only MySQL and Oracle also support a join with a USING clause.
The terms "old-style" join and "ANSI-style" join are nobody's official terminology, and we use them only because we needed some names . The fact is that the old style is acceptable in SQL:1999, and the SQL Standard does not deprecate it. ("Deprecate" is a formal word. It means that the SQL Standard foundation document doesn't hint that the old style will be dropped from a future version of the official Standard.) So both styles are acceptable. There remains some debate about which style is "right" between those who prefer the old style (especially Oracle programmers) and those who prefer the ANSI style ( especially Microsoft programmers). The only definite thing is that outer joins should be in ANSI style. Does the style affect the performance? To answer this question, we wrote the same join two different ways, and tried them both out on the Big Eight. Watch the WHERE clause: SELECT * FROM Table1 JOIN Table2 ON Table1.column1 = Table2.column1 WHERE Table2.column2 = 22 /* ANSI style */ SELECT * FROM Table1, Table2 WHERE Table1.column1 = Table2.column1 AND Table2.column2 = 22 /* old style */ GAIN: 1/8 We found that with most DBMSs, these two statements ran equally fast. But with IBM, the old style is sometimes faster. The reason is hinted at in this quote: "When you use the ON clause to specify the join, you can use the WHERE clause as a post-join filter. The database server applies the post-join filter of the WHERE clause to the results of the outer join." IBM Informix Guide to SQL Syntax Documentation In other words, the restrictive expression is processed after the joining clause. That's what's supposed to happen logically. But if the clauses aren't optimized together at the same time, then the DBMS misses the best join plan: Make Table2 the outer table, find rows in the outer table where Table2. column2 = 22 , and then search for Table1.column1 = Table2.column1 in the inner loop. Our conclusion, then, is that old style is never slower and can be faster. While we're on the subject of syntax, we should note that it's traditional to put the joining clauses before the restrictive expression, as we've done in all our examples. However, even with rule-based or primitive optimizers, putting the clauses in a different order will not affect performance. Outer Joins
Only one outer join type is supported by all DBMSs: LEFT joins. And only one syntax is supported by all DBMSs for left joins: ANSI style. Table 5-2 shows the SQL Standard requirements and which options are supported by the Big Eight. Here's an example of an outer join: SELECT Table1.column1 FROM Table1 LEFT JOIN Table2 ON Table1.column1 = Table2.column1 An outer join can be much slower than an inner join because Table1 must be the outer table and Table2 must be the inner tableotherwise, the join won't work. So even if Table1 is smaller and better indexed and has a restrictive expression, it still can't be the inner table. And that means the DBMS can't pick the optimum join plan. Table 5-2. ANSI/DBMS Outer Join Support
If the DBMS applies a restrictive expression after the join is complete, there won't be any weird problems involving [NOT] NULL. For example, suppose Table1 has a column called column1 that contains the values {1 , 3} . Meanwhile, Table2 also has a column1 , and its values are {1 , 2} . A simple left join of these tables produces these result rows (assume all columns have a default value that is NULL):
A less-simple left join on the same tables can be done with this syntax: SELECT Table1.column1, Table2.column1 FROM Table1 LEFT JOIN Table2 ON Table1.column1 = Table2.column1 WHERE Table2.column1 IS NULL The result of this query should be these result rows:
But that's not what Microsoft used to return! The plausible explanation was that Microsoft did the restrictive expression first. Good for performance; too bad it produced the wrong result. Fortunately, the current version of Microsoft produces the correct result. If there's one thing that an outer join is definitely faster than, it's UNION. Before the introduction of outer join syntax, the simple left join had to be phrased as: SELECT Table1.column1, Table2.column1 /* the inner part */ FROM Table1, Table2 WHERE Table1.column1 = Table2.column1 UNION ALL SELECT column1, CAST(NULL AS INTEGER) /* the outer part */ FROM Table1 WHERE Table1.column1 NOT IN (SELECT Table2.column1 FROM Table2) If you see this syntax in some old program, replace it with a true left join. We tried that on the Big Eight and got a gain. Here's the equivalent left join statement: SELECT Table1.column1, Table2.column1 FROM Table1 LEFT JOIN Table2 ON Table1.column1 = Table2.column1 GAIN: 6/7
Portability MySQL doesn't support UNION or subqueries. The gain shown is for only seven DBMSs.
From the old Microsoft fiasco and the poor performance of the UNION, we glean two general rules:
|