Hack 18. Sort Your Email

An email address breaks down into a number of components of interest, such as name and domain. You can sort a list of email contacts using these components.

It can be useful to sort a list of email addresses in more than one way. For example, if you sort on domain name and then by account name, you see your contacts grouped by their organization, as shown in Table 3-2.

Table 3-2. Email ordering

Email by account name Email by domain name
Alan.K.Buccannan@rbs.co.uk napier.ac.uk; i.rankin
complaints@sirius-cybernetics.com napier.ac.uk; P.Bhardwaj
i.rankin@napier.ac.uk rbs.co.uk; Alan.K.Buccannan
P.Bhardwaj@napier.ac.uk rbs.co.uk; Scott.Kemmer
Scott.Kemmer@rbs.co.uk sirius-cybernetics.com; complaints

You can extract the domain name with some string functions:

mysql> SELECT SUBSTRING(e FROM POSITION('@' IN e)+1) AS domain -> , SUBSTRING(e FROM 1 FOR POSITION('@' IN e)-1) AS account -> FROM email -> ORDER BY domain, account; +------------------------+------------------+ | domain | account | +------------------------+------------------+ | napier.ac.uk | i.rankin | | napier.ac.uk | P.Bhardwaj | | rbs.co.uk | Alan.K.Buccannan | | rbs.co.uk | Scott.Kemmer | | sirius-cybernetics.com | complaints | +------------------------+------------------+

For the domain name you want to take the substring starting just past the position of the @ character: POSITION('@' IN e)+1. For the account name you take characters starting at 1 until just before the @ character; the number of characters required is POSITION('@' IN e)-1.

3.4.1. Implementation-Specific Variations

This works for MySQL and for PostgreSQL, both of which implement the standard functions SUBSTRING and POSITION, including the words FROM, IN, and FOR, which are used to separate the parameters. In Oracle and SQL Server, these functions have different names.

3.4.1.1. SQL Server: extract the domain name

With SQL Server, you extract the domain name like this:

SELECT SUBSTRING(e,1+CHARINDEX('@',e),50) AS domain ,SUBSTRING(e,1,CHARINDEX('@',e)-1) AS account FROM email ORDER BY domain, account

 

3.4.1.2. Oracle: extract the domain name

Extract the domain name in Oracle like this:

SELECT SUBSTR(e,1+INSTR('@',e)) AS domain ,SUBSTR(e,1,INSTR('@',e)-1) AS account FROM email ORDER BY domain, account;

 

3.4.2. Extract the Top-Level Domain

Sometimes you want to extract the last part of a string. Suppose you needed to get the top-level domain from an email address. That is the string following the final dot in the address. The position function will normally return the first occurrence of a character, which is fine for splitting an email address on @, but is not good for finding the substring after the last dot. The standard SQL string functions are not sufficient for this calculation, so a different approach is required for each system.

3.4.2.1. MySQL: extract the top-level domain

You can use the REVERSE function to determine how far the dot is from the end of the string. The RIGHT function then returns the substring required:

mysql> SELECT RIGHT(e,POSITION('.' IN REVERSE(e))-1),e -> FROM email; +----------------------------------------+---------------------------------+ | RIGHT(e,POSITION('.' IN REVERSE(e))-1) | e | +----------------------------------------+---------------------------------+ | uk | I.Rankin@napier.ac.uk | | uk | P.Bhardwaj@napier.ac.uk | | uk | Scott.Kemmer@rbs.co.uk | | uk | Alan.K.Buccannan@rbs.co.uk | | com | Complaints@sirius-cybernetics.co| +----------------------------------------+---------------------------------+

 

3.4.2.2. SQL Server: extract the top-level domain

You can use REVERSE in SQL Server:

SELECT RIGHT(e,CHARINDEX('.', REVERSE(e))-1),e FROM email

 

3.4.2.3. Oracle: extract the top-level domain

In Oracle, you can give a negative number to the INSTR function to make it search from the rightmost end:

SELECT SUBSTR(e,1+INSTR(e,'.',-1)),e FROM email;

 

3.4.2.4. PostgreSQL: extract the top-level domain

You can specify a pattern in the SUBSTRING function. The double quotes indicate the location of the substring to be returned. The # escapes the quotes:

SELECT SUBSTRING(e FROM '%.#"%#"' FOR '#'), e FROM email;

 

3.4.3. Hacking the Hack

You can create an index on a calculated field like this in Oracle, SQL Server, and Postgres (but not MySQL 5.0). For each system there are restrictions, primarily that the calculation must be deterministic (that is, it must return the same value for the same parameters; unlike Random( ) or Getdate( )).

3.4.3.1. SQL Server: calculated index

The best way to index on a calculation is to put that calculation into a view. You can create an index on a calculated column of a view as long as the view uses schema binding and the column is based on deterministic functions. The schema-binding restriction enforces some sensible policies and ensures that the view refers only to existing entities. You must have the options set as shown before you can use schema binding. Also, you must create a unique clustered index before you create the required index:

1> DROP VIEW esort 2> GO 1> SET ANSI_NULLS ON 2> SET ANSI_PADDING ON 3> SET ANSI_WARNINGS ON 4> SET ARITHABORT ON 5> SET CONCAT_NULL_YIELDS_NULL ON 6> SET QUOTED_IDENTIFIER ON 7> SET NUMERIC_ROUNDABORT OFF 8> GO 1> CREATE VIEW esort WITH SCHEMABINDING AS 2> SELECT e, 3> SUBSTRING(e,1+CHARINDEX('@',e),50) AS domain, 4> SUBSTRING(e,1,CHARINDEX('@',e)) AS account 5> FROM dbo.email 6> GO 1> CREATE UNIQUE CLUSTERED INDEX ucie ON esort(e) 2> GO 1> CREATE INDEX esortidx ON esort(domain) 2> GO

With schema binding on, you must prefix the table name with the owner. In this example, we used dbo in dbo.email, which is an alias for the database owner.

3.4.3.2. Oracle: calculated index

Oracle allows you to define an index on a deterministic expression:

CREATE INDEX esortidx ON email(SUBSTR(e,1+INSTR('@',e)))

 

3.4.3.3. PostgreSQL: calculated index

PostgreSQL restricts a functional index to functions with only one parameter. Your required expression does not take that form. Fortunately, you can get around this restriction by defining your own function. It is called fDomain in this example:

scott=> CREATE FUNCTION fDomain(VARCHAR(50)) RETURNS VARCHAR(50) scott-> AS 'SELECT SUBSTRING($1 ,1+POSITION('@' IN $1 ),50);' scott-> LANGUAGE SQL IMMUTABLE; CREATE FUNCTION scott=> CREATE INDEX domainidx ON email(fDomain(e)); CREATE INDEX

Категории