Appendix H Regular Expressions
A regular expression is a pattern of text that consists of ordinary characters (such as the letters a through z) and special characters known as metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression acts as a template for matching a character pattern against the string being searched.
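As a minimal illustration of ordinary characters and metacharacters working together, the sketch below assumes Python's built-in `re` module; the appendix itself does not name a host language, and the sample text and variable names are hypothetical.

```python
import re

# "gr" and "y" are ordinary characters; "[ae]" is a metacharacter
# construct that accepts either "a" or "e" in that position.
pattern = re.compile(r"gr[ae]y")

# Hypothetical body of text to search.
text = "The grey cat sat beside the gray dog."

# findall returns every non-overlapping substring that fits the template.
matches = pattern.findall(text)
print(matches)
```

The single pattern describes two strings, "grey" and "gray", which is the sense in which a regular expression acts as a template rather than a fixed search string.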
The following table contains the complete list of metacharacters and their behavior in the context of a regular expression.
Character | Description |
\ | Marks the next character as either a special character or a literal |
^ | Matches the beginning of input |
$ | Matches the end of input |
* | Matches the preceding character zero or more times |
+ | Matches the preceding character one or more times |
? | Matches the preceding character zero or one time |
. | Matches any single character except a newline character |
(pattern) | Matches pattern and remembers the match. The matched substring can be retrieved from the resulting Matches collection, using Item [0]...[n]. To match the parentheses characters themselves, precede them with a backslash: use "\(" or "\)" |
(?:pattern) | Matches pattern but does not capture the match; that is, it is a noncapturing match that is not stored for possible later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "anomal(?:y|ies)" is a more economical expression than "anomaly|anomalies" |
(?=pattern) | Positive lookahead matches the search string at any point where a string matching pattern begins. This is a noncapturing match; that is, the match is not captured for possible later use. For example, "Windows (?=95|98|NT|2000|XP)" matches "Windows" in "Windows XP" but not "Windows" in "Windows 3.1" |
(?!pattern) | Negative lookahead matches the search string at any point where a string not matching pattern begins. This is a noncapturing match; that is, the match is not captured for possible later use. For example, "Windows (?!95|98|NT|2000|XP)" matches "Windows" in "Windows 3.1" but does not match "Windows" in "Windows XP" |
x|y | Matches either x or y |
{n} | Matches exactly n times (n must always be a nonnegative integer) |
{n,} | Matches at least n times (n must always be a nonnegative integer; note the terminating comma) |
{n,m} | Matches at least n and at most m times (m and n must always be nonnegative integers) |
[xyz] | Matches any one of the enclosed characters (xyz represents a character set) |
[^xyz] | Matches any character not enclosed (^xyz represents a negative character set) |
[a-z] | Matches any character in the specified range (a-z represents a range of characters) |
[^m-z] | Matches any character not in the specified range (^m-z represents a negative range of characters) |
\b | Matches a word boundary, that is, the position between a word and a space |
\B | Matches a nonword boundary |
\d | Matches a digit character. Equivalent to [0-9] |
\D | Matches a nondigit character. Equivalent to [^0-9] |
\f | Matches a form-feed character |
\n | Matches a newline character |
\r | Matches a carriage return character |
\s | Matches any white space including space, tab, form-feed, and so on. Equivalent to "[ \f\n\r\t\v]" |
\S | Matches any nonwhite space character. Equivalent to "[^ \f\n\r\t\v]" |
\t | Matches a tab character |
\v | Matches a vertical tab character |
\w | Matches any word character including underscore. Equivalent to "[A-Za-z0-9_]" |
\W | Matches any nonword character. Equivalent to "[^A-Za-z0-9_]" |
\. | Matches . |
\| | Matches | |
\{ | Matches { |
\} | Matches } |
\\ | Matches \ |
\[ | Matches [ |
\] | Matches ] |
\( | Matches ( |
\) | Matches ) |
$num | Matches num, where num is a positive integer. A reference back to remembered matches (note the $ symbol; this differs from some Microsoft documentation) |
\n | Matches n, where n is an octal escape value. Octal escape values must be 1, 2, or 3 digits long |
\uxxxx | Matches the character whose Unicode value is xxxx, where xxxx is four hexadecimal digits |
\xn | Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long |
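Several of the table entries can be exercised in a short script. The sketch below assumes Python's built-in `re` module, which is not the engine the table documents, so a few details differ: Python retrieves remembered matches with `group(n)` rather than a Matches collection, writes back references as `\1` rather than `$1`, and by default lets `\d` and `\w` match Unicode digits and letters beyond the ASCII ranges listed above. All sample strings and variable names are hypothetical.

```python
import re

# (pattern): capturing groups remember each submatch for later retrieval.
m = re.search(r"(\d{3})-(\d{4})", "Call 555-1234 today")
prefix, line = m.group(1), m.group(2)

# (?:pattern) with x|y: groups the alternatives without storing them.
plural = re.compile(r"anomal(?:y|ies)")
found = plural.findall("one anomaly, two anomalies")

# (?=pattern): match "Windows " only when a listed version follows.
positive = re.compile(r"Windows (?=95|98|NT|2000|XP)")
pos_xp = positive.search("Windows XP") is not None
pos_31 = positive.search("Windows 3.1") is not None

# (?!pattern): match "Windows " only when no listed version follows.
negative = re.compile(r"Windows (?!95|98|NT|2000|XP)")
neg_31 = negative.search("Windows 3.1") is not None
neg_xp = negative.search("Windows XP") is not None

# ^, $, a character set with ranges, and the {n} quantifier.
hex_color = re.compile(r"^#[0-9A-Fa-f]{6}$")
color_ok = hex_color.search("#1A2b3C") is not None
color_bad = hex_color.search("#12345") is not None

# Back reference in a replacement: Python spells it \1, \2 (not $1, $2).
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Doe, Jane")

# Escaped metacharacters: \$ and \. match the literal characters.
prices = re.findall(r"\$\d+\.\d{2}", "Total: $19.99 (was $25.00)")

# \xn and \uxxxx: hexadecimal 41 is "A", Unicode 0042 is "B".
escapes_ok = re.fullmatch(r"\x41\u0042", "AB") is not None

print(prefix, line, found, swapped, prices)
```

Because a lookahead consumes no characters, both lookahead patterns match only "Windows " (with its trailing space); the version string that follows is inspected but left out of the match.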