Sorting Hostnames in Domain Order

6.15.1 Problem

You want to sort hostnames in domain order, with the rightmost parts of the names more significant than the leftmost parts.

6.15.2 Solution

Break apart the names and sort the pieces from right to left.

6.15.3 Discussion

Hostnames are strings and therefore their natural sort order is lexical. However, it's often desirable to sort hostnames in domain order, where the rightmost segments of the hostname values are more significant than the leftmost segments. Suppose you have a table hostname that contains the following names:

mysql> SELECT name FROM hostname ORDER BY name; +--------------------+ | name | +--------------------+ | cvs.php.net | | dbi.perl.org | | jakarta.apache.org | | lists.mysql.com | | mysql.com | | www.kitebird.com | +--------------------+

The preceding query demonstrates the natural lexical sort order of the name values. That differs from domain order, as shown by the following table:

Lexical order

Domain order

cvs.php.net

www.kitebird.com

dbi.perl.org

mysql.com

jakarta.apache.org

lists.mysql.com

lists.mysql.com

cvs.php.net

mysql.com

jakarta.apache.org

www.kitebird.com

dbi.perl.org

Producing domain-ordered output is a substring-sorting problem, where it's necessary to extract each segment of the names so they can be sorted in right-to-left fashion. There is also an additional complication if your values contain different numbers of segments, as our example hostnames do. (Most of them have three segments, but mysql.com has only two.)

To extract the pieces of the hostnames, begin by using SUBSTRING_INDEX( ) in a manner similar to that described previously in Recipe 6.14. The hostname values have a maximum of three segments, from which the pieces can be extracted left to right like this:

SUBSTRING_INDEX(SUBSTRING_INDEX(name,'.',-3),'.',1) SUBSTRING_INDEX(SUBSTRING_INDEX(name,'.',-2),'.',1) SUBSTRING_INDEX(name,'.',-1)

These expressions work properly as long as all the hostnames have three components. But if a name has fewer than three, we don't get the correct result, as the following query demonstrates:

mysql> SELECT name, -> SUBSTRING_INDEX(SUBSTRING_INDEX(name,'.',-3),'.',1) AS leftmost, -> SUBSTRING_INDEX(SUBSTRING_INDEX(name,'.',-2),'.',1) AS middle, -> SUBSTRING_INDEX(name,'.',-1) AS rightmost -> FROM hostname; +--------------------+----------+----------+-----------+ | name | leftmost | middle | rightmost | +--------------------+----------+----------+-----------+ | cvs.php.net | cvs | php | net | | dbi.perl.org | dbi | perl | org | | lists.mysql.com | lists | mysql | com | | mysql.com | mysql | mysql | com | | jakarta.apache.org | jakarta | apache | org | | www.kitebird.com | www | kitebird | com | +--------------------+----------+----------+-----------+

Notice the output for the mysql.com row; it has mysql for the value of the leftmost column, where it should have an empty string. The segment-extraction expressions work by pulling off the rightmost n segments, then returning the leftmost segment of the result. The source of the problem for mysql.com is that if there aren't n segments, the expression simply returns the leftmost segment of however many there are. To fix this problem, prepend a sufficient number of periods to the hostname values to guarantee that they have the requisite number of segments:

mysql> SELECT name, -> SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT('..',name),'.',-3),'.',1) -> AS leftmost, -> SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT('.',name),'.',-2),'.',1) -> AS middle, -> SUBSTRING_INDEX(name,'.',-1) AS rightmost -> FROM hostname; +--------------------+----------+----------+-----------+ | name | leftmost | middle | rightmost | +--------------------+----------+----------+-----------+ | cvs.php.net | cvs | php | net | | dbi.perl.org | dbi | perl | org | | lists.mysql.com | lists | mysql | com | | mysql.com | | mysql | com | | jakarta.apache.org | jakarta | apache | org | | www.kitebird.com | www | kitebird | com | +--------------------+----------+----------+-----------+

That's pretty ugly. But these expressions do serve to extract the substrings that are needed for sorting hostname values correctly in right-to-left fashion:

mysql> SELECT name FROM hostname -> ORDER BY -> SUBSTRING_INDEX(name,'.',-1), -> SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT('.',name),'.',-2),'.',1), -> SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT('..',name),'.',-3),'.',1); +--------------------+ | name | +--------------------+ | www.kitebird.com | | mysql.com | | lists.mysql.com | | cvs.php.net | | jakarta.apache.org | | dbi.perl.org | +--------------------+

If you had hostnames with a maximum of four segments rather than three, you'd need to add to the ORDER BY clause another SUBSTRING_INDEX( ) expression that prepends three dots to the hostname values.

Категории