Hack 27. Identify Overlapping Ranges

It's easy to build a query that searches for precise values, and you can even look for values that fall in the middle of something with BETWEEN. But things get a bit trickier when you've got two ranges to compare.

Suppose you have a table of job vacancies, and you allow other users to search your vacancies table for job information. Each job has a minimum and a maximum salary range, and each user can specify a minimum and maximum salary. For example, perhaps there's a job with "Salary between $20,000 and $22,000" and a user is looking for a job that pays between $21,500 and $30,000.

You can start with a CROSS JOIN to see all the possible matches between jobs and job-seekers. (in a real example, this will be expensive in terms of CPU usage and bandwidth, but there is only two of each in this example):

mysql> SELECT * FROM jobSeeker required CROSS JOIN jobAd offered -> ORDER BY name, descr; +---------+-------+-------+----------------+-------+-------+ | name | low | high | descr | low | high | +---------+-------+-------+----------------+-------+-------+ | Frank | 21500 | 30000 | Sales Rep | 20000 | 22000 | | Frank | 21500 | 30000 | Window Cleaner | 15000 | 16000 | | Spencer | 10000 | 17000 | Sales Rep | 20000 | 22000 | | Spencer | 10000 | 17000 | Window Cleaner | 15000 | 16000 | +---------+-------+-------+----------------+-------+-------+

Frank will consider the Sales Rep job but not the Window Cleaner job. Spencer's expectations are too low for the Sales Rep job but the Window Cleaner position would be ideal. Filtering out the matches seems complicated because on each row you've got four numbers to compare and six cases to consider.

This situation crops up frequently and is easy to mess up. The trick is to concentrate on the failed matches. There are many ways to succeed in matching, but there are only two ways to fail.

In Figure 5-1, there are only two "deal-breaker" cases in which the query fails to match. In Case 6, the offered range is lower than the minimum requirement.

Figure 5-1. Many overlapping ranges

Just two cases indicate a failure to match. You can see this as an OR in the JOIN condition:

mysql> SELECT * FROM jobSeeker required JOIN jobAd offered -> ON required.high < offered.low OR required.low > offered.high -> ORDER BY name, descr; +---------+-------+-------+----------------+-------+-------+ | name | low | high | descr | low | high | +---------+-------+-------+----------------+-------+-------+ | Frank | 21500 | 30000 | Window Cleaner | 15000 | 16000 | | Spencer | 10000 | 17000 | Sales Rep | 20000 | 22000 | +---------+-------+-------+----------------+-------+-------+

Frank will not consider the job of window cleaner because the low that he requires ($20,000) is more than the high being offered ($16,000).

You can reverse the condition to find the jobs that are suitable:

mysql> SELECT * FROM jobSeeker required JOIN jobAd offered -> ON required.high >= offered.low AND required.low <= offered.high -> ORDER BY name, descr; +---------+-------+-------+----------------+-------+-------+ | name | low | high | descr | low | high | +---------+-------+-------+----------------+-------+-------+ | Frank | 21500 | 30000 | Sales Rep | 20000 | 22000 | | Spencer | 10000 | 17000 | Window Cleaner | 15000 | 16000 | +---------+-------+-------+----------------+-------+-------+

If the unbounded values are represented as NULL you can use COALESCE, which takes in a list, returning the first non-NULL item in the list, or NULL if all the items in the list are NULL. Therefore, you can substitute ridiculously low or high values:

COALESCE(required.high,1E9) >= COALESCE(offered.low,0) AND COALESCE(required.low,0) <= COALESCE(offered.high,1E9))

The value 1E9 is scientific notation. It means one followed by nine zeros, or one billion.

5.4.1. Hacking the Hack

Here's another approach to the range-matching problem. Suppose you are trying to find staff to cover for temporary jobs. You have a job table that includes the start and end dates for each job. You also an availability table that has one row for every continuous period of availability of staff.

It might not be possible to completely cover every job, but you want to be able to see all the possibilities.

Two jobs are available. One job is in Trinidad and is available at the beginning of May, and the other is in Alaska and is available at the end of the month:

mysql> SELECT * FROM job; +-------------+------------+------------+ | id | jobStart | jobEnd | +-------------+------------+------------+ | Trinidad | 2007-05-01 | 2007-05-06 | | Alaska | 2007-05-21 | 2007-05-27 | +-------------+------------+------------+

Frank is available early in May and then again later that same month. Spencer is free from May 19 to the end of May:

mysql> SELECT * FROM availability; +---------+------------+------------+ | worker | availStart | availEnd | +---------+------------+------------+ | Frank | 2007-05-05 | 2007-05-09 | | Frank | 2007-05-16 | 2007-05-26 | | Spencer | 2007-05-19 | 2007-05-31 | +---------+------------+------------+

You can compare every job against all employee availabilities. For each pairing you can calculate the possible assignment start date (the latest of the two start dates) and the possible assignment end date (the earliest of the two end dates). You can display all of this in one query:

mysql> SELECT worker, id, -> GREATEST(jobStart,availStart) AS startDate, -> LEAST(jobEnd,availEnd) AS endDate, -> 1+LEAST(jobEnd,availEnd)-GREATEST(jobStart,availStart) -> AS daysWorked, -> 1+jobEnd-jobStart AS daysRequired -> FROM job CROSS JOIN availability -> WHERE LEAST(jobEnd,availEnd)>=GREATEST(jobStart,availStart); +--------+-----------+------------+------------+------------+--------------+ | worker | id | startDate | endDate | daysWorked | daysRequired | +--------+-----------+------------+------------+------------+--------------+ | Frank | Trinidad | 2007-05-05 | 2007-05-06 | 2 | 6 | | Frank | Alaska | 2007-05-21 | 2007-05-26 | 6 | 7 | | Spencer| Alaska | 2007-05-21 | 2007-05-27 | 7 | 7 | +--------+-----------+------------+------------+------------+--------------+

You can see that only two days of the Trinidad job can be covered. For the Alaska job you can have Frank for six days or Spencer for seven.

The GREATEST and LEAST functions are available in MySQL, Oracle, and PostgreSQL. For other systems you'll find suggestions in "Calculate the Maximum of Two Fields" [Hack #30].

Категории