Hack 75. Minimize Bandwidth in One-to-Many Joins
When a one-to-many JOIN produces excess "repetition" in a result set, use a UNION.
Suppose you have a web site containing articles and comments. The database for this might have one table containing the articles and another table for the comments on those articles, with a relationship between these tables, linking each comment to its article.
Here is the ARTICLES table:
CREATE TABLE articles ( id INTEGER not null PRIMARY KEY , title VARCHAR(100) not null , article TEXT not null )
Each column is NOT NULL because it is required (for the primary key) or expected by the application (an article must have both a title and article text to be accepted).
|
Here is the COMMENTS table:
CREATE TABLE comments ( id INTEGER not null PRIMARY KEY , article_id INTEGER not null , title VARCHAR(100) null , content TEXT not null , FOREIGN KEY (article_id) REFERENCES articles(id) )
The COMMENTS table has a similar structure, with an additional column being a foreign key to link each comment to the article on which it is commenting. The title is NULL because it is optional for comments. Finally, there is a large string data type column for the comment text, which you might think need not be as large as the column for the article text, but the business logic may allow for comments of arbitrary length.
This is a classical one-to-many relationship. Each article can have one or more comments, and each comment belongs to only one article.
|
9.12.1. Retrieve an Article and Its Comments
To retrieve an article, title, text, and all comments, you can use the following query:
SELECT articles.id , articles.title , articles.article , comments.id , comments.title , comments.content FROM articles LEFT OUTER JOIN comments ON comments.article_id = articles.id WHERE articles.id = 937 ORDER BY articles.id , comments.id
The query retrieves a specific article by ID, plus its associated comments, if any, and it's a left outer join because the article might not have comments. The result set will look something like this:
aid atitle article cid ctitle content 937 Title One day I ... 1013 I am 1st First Post! 937 Title One day I ... 1024 Very Useful! Great article! 937 Title One day I ... 1037 Nothing to add I agree w/2nd commenter. 937 Title One day I ... 1042 Me Too' I like this article too.
Because the design was based on the classical one-to-many relationship, the result set is the classical one-to-many result set. Note that the article text and title are repeated in the result set four times, and they could contain 2 GB of data. The query works, but it wastes bandwidth.
You could perform two separate queries, but refer to "Combine Your Queries" [Hack #70] for why this can be bad. There is another way to combine two queries into one, without the "repetition" of data in the result set that you see with a one-to-many join.
9.12.2. The UNION Query
The following UNION query consists of two SELECTs combined with the UNION ALL operator:
SELECT 'article' AS rowtype , articles.id , articles.title , articles.article FROM articles WHERE articles.id = 937 UNION ALL SELECT 'comment' AS rowtype , comments.id , comments.title , comments.content FROM articles INNER JOIN comments ON comments.article_id = articles.id WHERE articles.id = 937 ORDER BY rowtype
Although there are two SELECTs, there is only one ORDER BY, which sorts the combined results of both SELECTs into one sorted final result set. This is truly a hack; it relies on the fact that the rowtype article comes before comment in the ORDER BY.
The two SELECTs return the article and its comments, respectively. Each SELECT is driven by an identical WHERE clause, to select a specific article or its comments. The second SELECT's join must be an inner join rather than a left outer join, because the first SELECT retrieves the article and the second query retrieves only related comments.
9.12.3. Results
Here is the result of the query:
rowtype id title article article 937 Title One day I ... comment 1013 I am 1st First Post! comment 1024 Very Useful! Great article! comment 1037 Nothing to add I agree w/2nd commenter. comment 1042 Me Too' I like this article too.
Optimal bandwidth is achieved because there is no repetition of any large column value. The rowtype can be a single-byte column if desired; its values are assigned as literals in the SELECTs.
In the second SELECT, is it necessary to do the join at all? Could the second SELECT simply retrieve from the comments table using a similar WHERE condition on articles_id instead? Yes, but it's better to leave the join in place so that, if desired, instead of retrieving a specific article by ID, you could return all articles that meet a certain condition, such as a keyword in their title. This type of condition testing would require using the same WHERE clause in each SELECT.
9.12.4. Does This Always Work?
You cannot take this approach with every one-to-many relationship that you meet. In order for you to write this type of UNION query, both SELECTs must return a union-compatible result set. This means the same number of columns, with each pair of columns having compatible data types (dates with dates, numbers with numbers, and strings with strings). Articles and comments are well suited for this type of UNION because the tables share such a similar design.
Rudy Limeback