Inside ColdFusion MX

To make real use of regular expressions, you must be familiar with metacharacters and their meanings within the regular expression language. Metacharacters perform specialized functions in relation to your search criteria. That does not mean these characters can be used only for those functions: if you want to search for a character that the regular expression engine would otherwise treat as a metacharacter, you must escape it. You're already familiar with escaping characters within ColdFusion Markup Language (CFML) code because we often must escape the # character by doubling it (##).
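The difference between a metacharacter and its escaped, literal form can be checked with a short sketch. It uses Python's re module as a stand-in rather than ColdFusion itself, on the assumption that the period and the backslash escape behave the same way in both; the file name index.cfm is only an illustrative string.

```python
import re

# Unescaped, the period is a metacharacter: it stands for any single character,
# so the pattern accepts "X" where a period was probably intended.
unescaped = re.search(r"index.cfm", "indexXcfm")

# Escaped with a backslash, the period matches only a literal period.
escaped_miss = re.search(r"index\.cfm", "indexXcfm")
escaped_hit = re.search(r"index\.cfm", "index.cfm")

print(unescaped is not None)     # wildcard period: match found
print(escaped_miss is not None)  # literal period: no match
print(escaped_hit is not None)   # literal period: match found
```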

Here are the metacharacters with which you'll need to familiarize yourself:

  • Asterisk (*). The asterisk matches zero or more occurrences of the character or bracket expression that immediately precedes it. Because zero occurrences count as a match, [a-z]* evaluates true regardless of the string, even a string that contains no letters at all.

  • Backslash (\). The backslash is used to escape metacharacters so that they can be used as literal characters within your search. For example, the backslash is itself a metacharacter, so to search for the backslash in c:\ you'd have to escape it with a second backslash, like this: c:\\.

  • Caret (^). The caret anchors a match to the beginning of a string. For example, searching for ^Please in a string that starts "Please hand me the…" would return a value of "1" because the regular expression does indeed appear at the beginning of the string. Placed as the first character inside square brackets, however, the caret excludes the listed characters from the match. Using the same example, [^Pl] matches any character other than P or l, so it would return a value of "3": the first two characters are excluded and the match begins at the third character.

  • Curly brackets ({}). The curly brackets specify how many times the preceding item must occur for the regular expression to match. For example, later on you'll see this in a regular expression example: ([a-z]{2,3}). This means that a lowercase alphabetic character must be matched two or three times in a row for the regular expression to evaluate true. By the same principle, {1,10} would signify a range of 1 to 10 occurrences. You can also specify an exact number of occurrences rather than a range by listing only one number, as in ([a-z]{2}).

  • Dollar sign ($). A dollar sign anchors a match to the end of the string. For example, the regular expression "rascals$" would match the end of the string "I love to watch the little rascals" but not the string "I love to watch the little rascals." because of the period that follows the word "rascals".

  • Parentheses (()). Parentheses are used to group segments of the regular expression, dividing the regular expression into subexpressions. Using parentheses within your regular expression can extend the functionality of the regular expression and enable you to search for multiple combinations of characters and character classes.

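    Parentheses also enable you to take advantage of back references, also known as remembered matches. A back reference lets a regular expression refer to a subexpression that has already been matched: \1 stands for whatever the first parenthesized subexpression matched, \2 for the second, and so on. This is most useful when you want to use a matched pattern as part of the rest of the search. A good example is searching a sentence to remove duplicated words. Let's say your string to be searched is "One of the the best things about about regular expressions is". The regular expression that you would use would look like this:

    <cfset NoDupes = REReplaceNoCase("One of the the best things about about regular expressions is", "\b([a-z]+)[ ]+\1\b", "\1", "All")>

    Here ([a-z]+) captures a word, [ ]+ matches the spaces that follow it, and \1 requires the same word to appear again; the replacement string \1 keeps a single copy. The \b word boundaries keep the pattern from matching inside longer words, such as the "t t" in "best things".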

  • Period (.). A period represents any single character within a string except for the newline character. For example, the regular expression "st.p" would match the strings "step" or "stop", but not "steep".

  • Pipe (|). A pipe enables you to match the expression on either side of the pipe, essentially an OR statement. Because the pipe applies to everything on each side of it, use parentheses to limit its scope. For example, "bir(d|dy)" would match "bird" or "birdy".

  • Plus (+). The plus sign matches one or more occurrences of the character or bracket expression that immediately precedes it. For example, [a-z]+ will match "neil" in the search string "neil@codesweeper.com"; the match stops at the @ because it is not a lowercase letter.

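  • Question mark (?). The question mark matches zero or one occurrence of the character or subexpression that immediately precedes it, making that item optional. For example, neil? makes only the "l" optional, so it matches "nei" or "neil". To make a whole group optional, enclose it in parentheses: neil(ross)?@ will match "neil@codesweeper.com" or "neilross@codesweeper.com".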
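The behaviors in this list can be tried out in a short sketch. It is written in Python's re module rather than CFML, on the assumption that these metacharacters behave the same way in both. The refind helper is hypothetical: it only imitates the 1-based position results quoted above. The \b word boundaries in the duplicate-word pattern and the shortened test strings are additions made for this sketch.

```python
import re

def refind(pattern, text):
    """Hypothetical helper: 1-based position of the first match, or 0 if there
    is none, imitating the position-style results quoted in the text."""
    match = re.search(pattern, text)
    return match.start() + 1 if match else 0

sentence = "Please hand me the..."
email = "neil@codesweeper.com"

# Caret: anchor at the start of the string, or negate a bracket expression.
caret_anchor = refind(r"^Please", sentence)   # match begins at position 1
caret_negate = refind(r"[^Pl]", sentence)     # first character that is neither P nor l

# Asterisk: zero occurrences still count, so a digits-only string matches.
star_empty = refind(r"[a-z]*", "12345")

# Dollar sign: a trailing period keeps "rascals" off the end of the string.
dollar_hit = refind(r"rascals$", "I love to watch the little rascals")
dollar_miss = refind(r"rascals$", "I love to watch the little rascals.")

# Curly brackets: two or three letters, no more.
braces_ok = re.fullmatch(r"[a-z]{2,3}", "com") is not None
braces_long = re.fullmatch(r"[a-z]{2,3}", "info") is not None

# Period: exactly one character between "st" and "p".
period_hits = [w for w in ("step", "stop", "steep") if re.fullmatch(r"st.p", w)]

# Pipe: parentheses limit the alternation to the ending of the word.
pipe_hits = [w for w in ("bird", "birdy", "birds") if re.fullmatch(r"bir(d|dy)", w)]

# Plus: one or more letters; the run stops at the first non-letter.
plus_text = re.search(r"[a-z]+", email).group()

# Question mark: the parenthesized group may appear once or not at all.
optional_hits = [
    s for s in ("neil@codesweeper.com", "neilross@codesweeper.com")
    if re.match(r"neil(ross)?@", s)
]

# Back reference: \1 repeats whatever the first group captured.
# The \b word boundaries are an addition for this sketch.
deduped = re.sub(r"\b([a-z]+)[ ]+\1\b", r"\1",
                 "the the best things about about regular expressions",
                 flags=re.IGNORECASE)

# Without boundaries the pattern also fires inside "best things" ("t t").
clipped = re.sub(r"([a-z]+)[ ]+\1", r"\1", "best things", flags=re.IGNORECASE)

print(caret_anchor, caret_negate, star_empty, dollar_hit, dollar_miss)
print(braces_ok, braces_long, period_hits, pipe_hits, plus_text, optional_hits)
print(deduped)
print(clipped)
```

The last two lines show why the boundaries matter: a pattern that only asks for "some letters, spaces, the same letters" is satisfied by the final letter of one word and the first letter of the next.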

Table 10.1 covers the order of precedence for regular expression operators.

Table 10.1. Regular Expressions Order of Precedence

Description                       Operator
Bracket symbols                   [==] [::] [..]
Escaped characters                \<special character>
Bracket expression                [ ]
Subexpressions/back references    \(\) \n
Single character duplication      * \{a,b\}
Concatenation                     _
Anchoring                         ^ $
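A small sketch can show what this ordering means in practice. It uses Python's re module as a stand-in, on the assumption that it resolves escapes, subexpressions, duplication, and concatenation in the same relative order; note that Python writes grouping and repetition counts without the backslashes shown in the table.

```python
import re

# Escaped characters rank above duplication: "\*" is read as a literal
# asterisk before "*" can act as a duplication operator.
literal_star = re.fullmatch(r"a\*", "a*") is not None
not_repeat = re.fullmatch(r"a\*", "aaa") is not None

# Duplication ranks above concatenation: in "ab*" the asterisk applies
# only to "b", not to the sequence "ab".
star_on_b = re.fullmatch(r"ab*", "abbb") is not None
star_not_on_ab = re.fullmatch(r"ab*", "abab") is not None

# Subexpressions rank above duplication: the parentheses turn "ab" into a
# single unit that the asterisk then repeats.
star_on_group = re.fullmatch(r"(ab)*", "abab") is not None

print(literal_star, not_repeat)
print(star_on_b, star_not_on_ab, star_on_group)
```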
