Hack 89. Choose Any Three of Five
In a one-to-many relationship, use GROUP BY to search for the existence of multiple specific rows on the many side.
Imagine a database table for storing information about job candidates (as shown here, and in Table 10-17):
create table candidates ( id integer not null primary key , lname varchar(50) not null , fname varchar(50) null , summary text not null ); insert into candidates values ( 1, 'Smith','John' ,'Senior web developer specializing in FrontPage ...' ) ,( 2, 'Jones','Todd' ,'Shy DBA looking for remote cubicle to hide in ... ' );
id | lname | fname | summary |
---|---|---|---|
1 | Smith | John | Senior web developer specializing in FrontPage... |
2 | Jones | Todd | Shy DBA looking for a remote cubicle to hide in ... |
This is a straightforward table, but it's not easy to search methodically. Add another table, this one to identify each candidate's specific skills (see Table 10-18):
create table candidate_skills ( cid integer not null , foreign key cid_fk ( cid ) references candidates ( id ) , skill varchar(37) not null , primary key ( cid , skill ) , rating tinyint default 3 ); insert into candidate_skills values ( 1, 'FrontPage', 5 ) ,( 1, 'HTML', 3 ) ,( 1, 'CSS', 3 ) ,( 1, 'JavaScript', 2 ) ,( 1, 'Usability', 2 ) ,( 1, 'DB design', 2 ) ,( 1, 'MySQL', 2 ) ,( 2, 'SQL Server', 4 ) ,( 2, 'Oracle', 3 ) ,( 2, 'HTML', 3 ) ,( 2, 'DB design', 2 );
cid | skill | rating |
---|---|---|
1 | FrontPage | 5 |
1 | HTML | 3 |
1 | CSS | 3 |
1 | JavaScript | 2 |
1 | Usability | 2 |
1 | DB Design | 2 |
1 | MySQL | 2 |
2 | SQL Server | 4 |
2 | Oracle | 3 |
2 | HTML | 3 |
2 | DB Design | 2 |
Each candidate is represented in one row of the parent candidates table and one or more rows (technically zero or more, but no one would seriously enter a candidate with no skills) in the candidate_skills table. The specific candidate in the candidate_skills table is identified by the foreign key cid, which references the candidates table. This is a classic one-to-many relationship.
The primary key of candidate_skills is a composite key consisting of the cid and the skill together, which means that the same skill cannot be entered for the same candidate more than once. The rating column might be a value from 0 (none) through 5 (guru), indicating skill level.
|
Now consider the following requirement: find all candidates who have both DB Design and SQL Server skills.
10.13.1. A JOIN Solution
One way to solve this requirement is with JOINs for each specific skill:
select C.lname , C.fname from candidates as C inner join candidate_skills as CS1 on CS1.cid = c.id inner join candidate_skills as CS2 on CS2.cid = C.id where CS1.skill = 'DB Design' and CS2.skill like 'SQL Server%' lname fname Jones Todd
This query joins each candidate row to a candidate_skills row, first for one skill, and then to another candidate_skills row for the other skill. Because these are INNER JOINs, both skills are required, and only candidates with both skills are returned.
You can extend this method for as many joins as desired. Need a third skill? Join to the candidate_skills table a third time. When you need to find an exact set of specific rows in a one-to-many relationship, the JOIN approach is practical and efficient, if cumbersome.
Now consider the following requirementfind all candidates who have at least three of the following five skills: HTML, CSS, JavaScript, DB Design, and MySQL.
How in the world are you going to write joins that will accomplish this? They'd have to be LEFT OUTER JOINs, because to use INNER JOINs as we did earlier would result in only candidates with all five skills. Yet with LEFT OUTER JOINs, you would need to write a fairly complicated WHERE clause to see which joined rows actually were present (and don't contain NULLs), and that there were at least three of them.
There is an easier way.
10.13.2. A GROUP BY Solution
Here's an approach that uses GROUP BY to accomplish this:
select C.lname , C.fname from candidates as C inner join candidate_skills as CS on CS.cid = C.id where CS.skill in ( 'HTML' , 'CSS' , 'JavaScript' , 'DB Design' , 'MySQL' ) group by C.lname , C.fname having count(*) >= 3 lname fname Smith John
Look first at the FROM clause. You're joining each candidate row to all of its candidate_skills rows, and in the WHERE clause, you accept a skill only if it's one of the five target skills (all other skills get filtered out).
Then the magic happens. Because of the GROUP BY on the candidate name columns, you get one result row per candidate, and you can now use an aggregate function to determine how many candidate_skills rows were retrieved for the candidate. Note that the query does not return the individual rows; they are retrieved, and then evaluated in the GROUP BY clause, but the query returns only an aggregate row. The aggregate COUNT function counts the retrieved rows of skills that were among the five required skills as determined by the WHERE clause, and the HAVING clause ensures that there were at least three of these during grouping; if there were, an aggregate row for that group (the candidate) is returned.
Sweet, eh?
Rudy Limeback