3.1. Regular Expression Syntax

A regular expression is typically delimited by a pair of slashes; the %r form, which lets you choose other delimiters, can also be used. Table 3.1, "Basic Regular Expressions," gives some simple examples:

Table 3.1. Basic Regular Expressions

Regex | Explanation |
---|---|
`/Ruby/` | Match the text Ruby |
`/[Rr]uby/` | Match Ruby or ruby |
`/^abc/` | Match an abc at beginning of line |
`%r(xyz$)` | Match an xyz at end of line |
`%r\|[0-9]*\|` | Match any sequence of (zero or more) digits |
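The short sketch below runs each pattern from Table 3.1 against a few sample strings, which are invented purely for illustration. It assumes Ruby 2.4 or later (for `String#match?`) and also shows two behaviors the table only hints at: `^` and `$` anchor at line boundaries, not only at the ends of the string, and a "zero or more" pattern can succeed with an empty match.

```ruby
# Sample strings below are made up for illustration only.
line = "I think Ruby is fun"

# /Ruby/ : the literal text "Ruby"; =~ returns the match index or nil
p(line =~ /Ruby/)                          # => 8

# /[Rr]uby/ : the character class accepts either "R" or "r"
p "ruby".match?(/[Rr]uby/)                 # => true
p "RUBY".match?(/[Rr]uby/)                 # => false (only the first letter varies)

# /^abc/ : ^ anchors at the start of any line, not just the string
p "abcdef".match?(/^abc/)                  # => true
p "xabc".match?(/^abc/)                    # => false
p "x\nabc".match?(/^abc/)                  # => true

# %r(xyz$) : %r with parentheses as delimiters; $ anchors at line end
p "wxyz".match?(%r(xyz$))                  # => true
p "xyzw".match?(%r(xyz$))                  # => false

# %r|[0-9]*| : "zero or more" digits also matches the empty string
p "123abc"[%r|[0-9]*|]                     # => "123"
p "abc123"[%r|[0-9]*|]                     # => "" (empty match at index 0)

# %r avoids escaping when the pattern itself contains slashes
p "/usr/local/bin".match?(%r{^/usr/local}) # => true
p "/usr/local/bin".match?(/^\/usr\/local/) # => true
```

The last pattern is a common surprise: because `*` permits zero repetitions, `%r|[0-9]*|` matches at the very first position of any string, so `[0-9]+` is usually what you want when at least one digit is required.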
It is also possible to place a modifier, consisting of a single letter, immediately after a regex. Table 3.2 shows the most common modifiers:

Table 3.2. Regular Expression Modifiers

Modifier | Meaning |
---|---|
`i` | Ignore case in regex |
`o` | Perform expression substitution (`#{}` interpolation) only once |
`m` | Multiline mode (dot matches newline) |
`x` | Extended regex (allow whitespace, comments) |

Others will be covered in Chapter 4.

To complete our introduction to regular expressions, Table 3.3 lists the most common symbols and notations available:

Table 3.3. Common Notations Used in Regular Expressions

Notation | Meaning |
---|---|
`^` | Beginning of a line or string |
`$` | End of a line or string |
`.` | Any character except newline (unless multiline mode, `m`, is on) |
`\w` | Word character (digit, letter, or underscore) |
`\W` | Non-word character |
`\s` | Whitespace character (space, tab, newline, and so on) |
`\S` | Non-whitespace character |
`\d` | Digit (same as `[0-9]`) |
`\D` | Non-digit |
`\A` | Beginning of a string |
`\Z` | End of a string or before newline at the end |
`\z` | End of a string |
`\b` | Word boundary (outside `[ ]` only) |
`\B` | Non-word boundary |
`\b` | Backspace (inside `[ ]` only) |
`[ ]` | Any single character of set |
`*` | 0 or more of previous subexpression |
`*?` | 0 or more of previous subexpression (non-greedy) |
`+` | 1 or more of previous subexpression |
`+?` | 1 or more of previous subexpression (non-greedy) |
`{m,n}` | m to n instances of previous subexpression |
`{m,n}?` | m to n instances of previous subexpression (non-greedy) |
`?` | 0 or 1 of previous subexpression |
`\|` | Alternatives |
`(?= )` | Positive lookahead |
`(?! )` | Negative lookahead |
`( )` | Grouping of subexpressions |
`(?> )` | Atomic (independent) subexpression; no backtracking into it |
`(?: )` | Non-capturing group |
`(?imx-imx)` | Turn options on/off henceforth |
`(?imx-imx:expr)` | Turn options on/off for this expression |
`(?# )` | Comment |
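The sketch below exercises the modifiers from Table 3.2 and a selection of the notations from Table 3.3 on small, made-up strings. It assumes Ruby 2.4 or later (for `String#match?`), and each `p` line is annotated with the value it prints. Note that Ruby's `m` modifier corresponds to what Perl calls single-line (dot-all) mode, since `^` and `$` in Ruby always work per line.

```ruby
# --- Modifiers (Table 3.2) ---

# i: ignore case
p "RUBY".match?(/ruby/i)              # => true

# m: in Ruby, m lets . match a newline
p "a\nb".match?(/a.b/)                # => false
p "a\nb".match?(/a.b/m)               # => true

# x: whitespace in the pattern is ignored and # starts a comment
year_month = /
  (\d{4})   # four-digit year
  -
  (\d{2})   # two-digit month
/x
p "1999-12".match(year_month)[2]      # => "12"

# o: #{} is interpolated only the first time this literal is evaluated
p %w[a b].map { |s| /#{s}/o.source }  # => ["a", "a"]

# --- Notations (Table 3.3) ---

# ^ and $ work per line; \A, \Z, and \z work on the whole string
text = "first\nsecond\n"
p text.match?(/^second$/)             # => true
p text.match?(/\Asecond/)             # => false
p text.match?(/second\Z/)             # => true  (just before the final newline)
p text.match?(/second\z/)             # => false (the newline comes last)

# \b is a word boundary outside brackets, a backspace inside them
p "cat scatter".scan(/\bcat\b/)       # => ["cat"]
p "a\bb".match?(/[\b]/)               # => true

# Greedy versus non-greedy quantifiers
tag = "<b>bold</b>"
p tag[/<.+>/]                         # => "<b>bold</b>"
p tag[/<.+?>/]                        # => "<b>"
p "aaaaa"[/a{2,3}/]                   # => "aaa"
p "aaaaa"[/a{2,3}?/]                  # => "aa"

# Alternation, lookahead, and non-capturing groups
p "cat dog".scan(/cat|dog/)           # => ["cat", "dog"]
p "100USD"[/\d+(?=USD)/]              # => "100"
p "foo1 foo2"[/foo(?!1)\d/]           # => "foo2"
p "ab".match(/(?:a)(b)/)[1]           # => "b"

# Atomic group: once (?>a+) has matched, it never gives characters back
p "aaa".match?(/a+a/)                 # => true
p "aaa".match?(/(?>a+)a/)             # => false

# Inline options and comments
p "Ruby RUBY".scan(/R(?i)uby/)        # => ["Ruby", "RUBY"]
p "RUBY ruby".scan(/(?i:ruby)/)       # => ["RUBY", "ruby"]
p "abc".match?(/a(?# ignored)bc/)     # => true
```

The atomic-group pair is worth studying: `/a+a/` succeeds because `a+` backtracks to release one `a`, while `(?>a+)` refuses to backtrack and so leaves nothing for the final `a` to match.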
An understanding of regex handling greatly benefits the modern programmer. A complete discussion of this topic is far beyond the scope of this book, but if you're interested, see the definitive work Mastering Regular Expressions by Jeffrey Friedl. For additions and extensions to the material in this section, refer to section 3.13, "Ruby and Oniguruma."