SQL Cookbook (Cookbooks (OReilly))
Recipe B.3. Answering Questions Involving "at Most"
Questions involving "at most" represent another type of query problem that you'll encounter from time to time. It's easy enough to find rows for which a condition is true, but what if you want to place a limit on the number of such rows? That's what the next next two questions are all about. Question 4
You want to find the students who take at most two courses. Students who do not take any courses should be excluded. Of the students who take courses, only AARON takes more than two and should be excluded from the result set. Ultimately, you want to return the following result set: SNO SNAME AGE --- ---------- ---------- 2 CHUCK 21 3 DOUG 20 4 MAGGIE 19 5 STEVE 22 6 JING 18 MySQL and PostgreSQL
Use the aggregate function COUNT to determine which students take no more than two courses: 1 select s.sno,s.sname,s.age 2 from student s, take t 3 where s.sno = t.sno 4 group by s.sno,s.sname,s.age 5 having count(*) <= 2 DB2, Oracle, and SQL Server
Use the window function COUNT OVER, again to determine which students take no more than two courses: 1 select distinct sno,sname,age 2 from ( 3 select s.sno,s.sname,s.age, 4 count(*) over ( 5 partition by s.sno,s.sname,s.age 6 ) as cnt 7 from student s, take t 8 where s.sno = t.sno 9 ) x 10 where cnt <= 2
Discussion
Both solutions work by simply counting the number of times a particular SNO occurs in table TAKE. The inner join to table TAKE ensures that students who take no courses are excluded from the final result set. Original solution
Rozenshtein used the aggregate solution shown here for MySQL and PostgreSQL in his book along with an alternative solution using multiple self joins, shown here: select distinct s.* from student s, take t where s.sno = t.sno and s.sno not in ( select t1.sno from take t1, take t2, take t3 where t1.sno = t2.sno and t2.sno = t3.sno and t1.cno < t2.cno and t2.cno < t3.cno )
The multiple self-join solution is interesting because it solves the problem without using aggregation. To understand how the solution works, focus on the WHERE clause of the subquery. The inner joins on SNO ensure that you are dealing with the same student across all columns of each row that can potentially be returned by the subquery. The less-than comparisons are what determine whether or not a student is taking more than two courses. The WHERE clause in the subquery can be stated as: "For a particular student, return rows where the first CNO is less than the second CNO and the second CNO is less than the THIRD CNO." If a student has fewer than three courses, that expression can never evaluate to true as there is no third CNO. The job of the subquery is to find students who take three or more courses. The outer query then returns students who take at least one course and are not amongst those returned by the subquery. Question 5
You want to find students who are older than at most two other students. Another way to think about the problem is to find only the students who are older than zero, one, or two other students. The final result set should be: SNO SNAME AGE ---- ---------- --- 6 JING 18 4 MAGGIE 19 1 AARON 20 9 GILLIAN 20 8 KAY 20 3 DOUG 20
MySQL and PostgreSQL
Use the aggregate function COUNT and a correlated subquery to find the students who are older than zero, one, or two other students: 1 select s1.* 2 from student s1 3 where 2 >= ( select count(*) 4 from student s2 5 where s2.age < s1.age ) DB2, Oracle, and SQL Server
Use the window function DENSE_RANK to find the students who are older than zero, one, or two other students: 1 select sno,sname,age 2 from ( 3 select sno,sname,age, 4 dense_rank( )over(order by age) as dr 5 from student 6 ) x 7 where dr <= 3
Discussion
The aggregate solution uses a scalar subquery to find all students who are older than no more than two other students. To see how this works, rewrite the solution to use a scalar subquery. In the following example, the column CNT represents the number of students that are younger than the current student: select s1.*, (select count(*) from student s2 where s2.age < s1.age) as cnt from student s1 order by 4 SNO SNAME AGE CNT --- ---------- ---------- ---------- 6 JING 18 0 4 MAGGIE 19 1 1 AARON 20 2 3 DOUG 20 2 8 KAY 20 2 9 GILLIAN 20 2 2 CHUCK 21 6 7 BRIAN 21 6 10 CHAD 21 6 5 STEVE 22 9
Rewriting the solution this way makes it easy to see that the students in the final result set are those for whom CNT is less than or equal to 2. The solution using the window function DENSE_RANK is similar to the scalar subquery example in that every row is ranked based on how many students are younger than the current student (ties are allowed and there are no gaps). The following query shows the output from the DENSE_RANK function: select sno,sname,age, dense_rank( )over(order by age) as dr from student SNO SNAME AGE DR --- ---------- ---------- ---------- 6 JING 18 1 4 MAGGIE 19 2 1 AARON 20 3 3 DOUG 20 3 8 KAY 20 3 9 GILLIAN 20 3 2 CHUCK 21 4 7 BRIAN 21 4 10 CHAD 21 4 5 STEVE 22 5 The final step is to wrap the query in an inline view and keep only those rows where DR is less than or equal to 3. Original solution
Rozenshtein takes an interesting approach to solving this problem by rephrasing it. Instead of "find the students who are older than at most two students," his approach is to "find the students who are not older than three or more (at least three) students." This approach is brilliant for those of you who want to learn how to problem solve in sets, because it forces you to find the solution in two passes:
The solution is shown below: select * from student where sno not in ( select s1.sno from student s1, student s2, student s3, student s4 where s1.age > s2.age and s2.age > s3.age and s3.age > s4.age ) SNO SNAME AGE --- ---------- --- 6 JING 18 4 MAGGIE 19 1 AARON 20 9 GILLIAN 20 8 KAY 20 3 DOUG 20
If you examine the solution from bottom up, you see that step 1, "find all students who are older than three or more students," is performed first and is shown below (using DISTINCT to reduce the result set size for readability): select distinct s1.* from student s1, student s2, student s3, student s4 where s1.age > s2.age and s2.age > s3.age and s3.age > s4.age SNO SNAME AGE --- ---------- --- 2 CHUCK 21 5 STEVE 22 7 BRIAN 21 10 CHAD 21
If you are getting confused by all the self joins, simply focus on the WHERE clause. S1.AGE is greater than S2.AGE so you know at that point any student who is older than at least one other student is considered. Next, S2.AGE is greater than S3.AGE. At this point any student who is older than two other students is considered. If you are stumbling at this point, try to keep in mind that greater-than comparisons are transitive. If S1.AGE is greater than S2.AGE, and S2.AGE is greater than S3.AGE, then it is also true that S1AGE is greater than S3.AGE. You may find it helpful to strip down the query to one self join and build the query once you understand what is returned by each step. For example, find all students who are older than at least one other student (all students except the youngest, JING, should be returned): select distinct s1.* from student s1, student s2 where s1.age > s2.age SNO SNAME AGE --- ---------- --- 5 STEVE 22 7 BRIAN 21 10 CHAD 21 2 CHUCK 21 1 AARON 20 3 DOUG 20 9 GILLIAN 20 8 KAY 20 4 MAGGIE 19
Next, find all students who are older than two or more students (now, both JING and MAGGIE should be excluded from the result set): select distinct s1.* from student s1, student s2, student s3 where s1.age > s2.age and s2.age > s3.age SNO SNAME AGE --- ---------- --- 1 AARON 20 2 CHUCK 21 3 DOUG 20 5 STEVE 22 7 BRIAN 21 8 KAY 20 9 GILLIAN 20 10 CHAD 21 Finally, find all students who are older than three or more students (only CHUCK, STEVE, BRIAN, and CHAD are in this result set): select distinct s1.* from student s1, student s2, student s3, student s4 where s1.age > s2.age and s2.age > s3.age and s3.age > s4.age SNO SNAME AGE --- ---------- --- 2 CHUCK 21 5 STEVE 22 7 BRIAN 21 10 CHAD 21
Now that you know which students are older than three or more other students, simply return only those students who are not amongst the four students above by using NOT IN with a subquery. |
Категории