Mastering Regular Expressions

8.3. The Pattern.compile() Factory

A regular-expression Pattern object is created with Pattern.compile. The first argument is a string to be interpreted as a regular expression (see page 101). The compile-time options shown in Table 8-3 on page 368 can be provided as a second argument. Here's a snippet that creates a pattern from the string in the variable myRegex, to be applied in a case-insensitive manner:

Pattern pat = Pattern.compile(myRegex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

The predefined pattern constants used to specify the compile options (such as Pattern.CASE_INSENSITIVE) can be a bit unwieldy,[*] so I tend to use inline mode modifiers (see page 110) when practical. Examples include (?x) on page 378, and (?s) and several (?i) on page 399.

[*] Especially when presenting code in a book with a limited page width. Ask me, I know!

However, keep in mind that the same verbosity that makes these predefined constants "unwieldy" can make them easier for the novice to understand. Were I to have unlimited page width, I might use

Pattern.UNIX_LINES | Pattern.CASE_INSENSITIVE

as the second argument to the Pattern.compile on page 384, rather than prepend the possibly unclear (?id) at the start of the regex.
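To make the trade-off concrete, here is a minimal sketch comparing the two styles. The class and method names are hypothetical and chosen only for illustration. It assumes a simple literal regex and shows that a flag constant and the corresponding inline modifier produce the same matching behavior.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: flag constants versus inline mode modifiers.
class InlineModifierDemo {
    // No options at all: matching is case sensitive.
    static boolean matchesPlain(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Case-insensitive matching requested through the second argument.
    static boolean matchesWithFlag(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // The same request expressed inside the regex itself with (?i).
    static boolean matchesWithInline(String regex, String input) {
        return Pattern.compile("(?i)" + regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesPlain("hello", "HeLLo"));
        System.out.println(matchesWithFlag("hello", "HeLLo"));
        System.out.println(matchesWithInline("hello", "HeLLo"));
    }
}
```

The inline form keeps the option next to the regex it affects, while the constant form spells out the intent in words. Which reads better is largely a matter of audience.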

As the method name implies, this is the step in which a regular expression is analyzed and compiled into an internal form. Chapter 6 looks at this in great detail (see page 241), but in short, the pattern compilation can often be the most time-consuming step in the whole check-a-string-with-a-regex process. That's why there are separate compile and apply steps in the first place: it allows you to precompile a regex for later use over and over.

Of course, if a regex is going to be compiled and used only once, it doesn't matter when it's compiled, but when applying a regex many times (for example, to each line read from a file), it makes a lot of sense to precompile into a Pattern object.
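As a rough sketch of the precompile-once idea, the following compiles a pattern a single time and reuses it for every line. The class name, the regex, and the sample lines are assumptions made up for this illustration, and an in-memory list stands in for lines read from a file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class: compile once, apply many times.
class PrecompileDemo {
    // Compiled once, when the class is initialized, then reused for every line.
    private static final Pattern ERROR_WORD =
        Pattern.compile("\\berror\\b", Pattern.CASE_INSENSITIVE);

    // Counts the lines that contain the whole word "error" in any letter case.
    static int countMatchingLines(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            // Only a Matcher is created per line; the regex is not recompiled.
            if (ERROR_WORD.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Error: disk full", "all good", "3 errors found", "fatal ERROR");
        System.out.println(countMatchingLines(lines));
    }
}
```

Calling Pattern.compile inside the loop would give the same count but would repeat the compilation work for every line.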

A call to Pattern.compile can throw two kinds of exceptions: an invalid regular expression throws PatternSyntaxException, and an invalid option value throws IllegalArgumentException.
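One way to handle both exceptions is sketched below. The class name, method name, and returned messages are hypothetical. Because PatternSyntaxException is a subclass of IllegalArgumentException, it has to be caught first. Which out-of-range flag values are actually rejected may vary between Java versions, so the sketch makes no promise on that point.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class: reporting the two failure modes of Pattern.compile.
class CompileErrorDemo {
    // Returns "ok" on success, otherwise a short description of the failure.
    static String describeCompile(String regex, int flags) {
        try {
            Pattern.compile(regex, flags);
            return "ok";
        } catch (PatternSyntaxException e) {
            // Caught first: this is a subclass of IllegalArgumentException.
            return "bad regex: " + e.getDescription();
        } catch (IllegalArgumentException e) {
            // Reached only for problems other than regex syntax, such as bad flags.
            return "bad option value: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeCompile("a+b", 0));
        System.out.println(describeCompile("(unclosed", 0));
    }
}
```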

8.3.1. Pattern's matcher method

Patterns offer some convenience methods that we'll look at in a later section (see page 394), but for the most part, all the work is done through just one method: matcher. It accepts a single argument: the string to search.[*] It doesn't actually apply the regex, but prepares the pattern to be applied to a specific string. The matcher method returns a Matcher object.

[*] Actually, java.util.regex is extremely flexible in that the "string" argument can be any object that implements the CharSequence interface (of which String, StringBuilder, StringBuffer, and CharBuffer are examples).
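The short sketch below illustrates both points: the call to matcher only binds the pattern to a target, and the target can be any CharSequence. The class name, method name, and regex are hypothetical and used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: obtaining a Matcher for any CharSequence.
class MatcherDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in the text, or null if there is none.
    static String firstNumber(CharSequence text) {
        // No matching happens here; the Matcher is merely bound to the text.
        Matcher m = DIGITS.matcher(text);
        // find() is what actually applies the regex.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 42 shipped"));
        System.out.println(firstNumber(new StringBuilder("abc 7 def")));
    }
}
```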
