Mastering Regular Expressions

8.5. Other Matcher Methods

These remaining matcher methods don't fit into the categories already presented:

Matcher reset()

This method reinitializes most aspects of the matcher. It throws away any information about a previously successful match, resets the position in the input to the start of its text, and resets its region (☞ 384) to its "entire text" default. Only the anchoring-bounds and transparent-bounds flags (☞ 388) are left unchanged.

Three matcher methods call reset internally, which has the side effect of resetting the region: replaceAll, replaceFirst, and the one-argument form of find.

This method returns the matcher itself, so it can be used with method chaining (☞ 389).
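A minimal sketch of the behavior just described, assuming Java 7 or later; the class name ResetDemo and the sample text are illustrative only. It shows reset() restoring the full-text region and the starting position, reset() used in a method chain, and replaceFirst ignoring a region set before the call because it resets the matcher first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and sample text are illustrative only.
public class ResetDemo {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    public static void main(String[] args) {
        Matcher m = DIGITS.matcher("abc 123 456");

        // Limit the matcher to "456" (indices 8 up to 11) and match inside that region.
        m.region(8, 11);
        m.find();
        System.out.println(m.group());                              // 456

        // reset() discards the match and restores the "entire text" region.
        m.reset();
        System.out.println(m.regionStart() + "," + m.regionEnd()); // 0,11

        // reset() returns the matcher, so it chains; the search restarts at index 0.
        m.reset().find();
        System.out.println(m.group());                              // 123

        // replaceFirst() resets the matcher first, so the region set here is ignored.
        m.region(8, 11);
        System.out.println(m.replaceFirst("N"));                    // abc N 456
        System.out.println(m.regionStart() + "," + m.regionEnd()); // 0,11
    }
}
```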

Matcher reset( CharSequence text )

This method resets the matcher just as reset() does, but also changes the target text to the new String (or any other object that implements CharSequence).

When you want to apply the same regex to multiple chunks of text (for example, to each line while reading a file), it's more efficient to reset with the new text than to create a new matcher.

This method returns the matcher itself, so it can be used with method chaining (☞ 389).
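The reuse idiom might look like the following sketch. The class ReuseMatcher and the method firstWords are hypothetical names, and in real code the lines would come from a file reader (for example, BufferedReader.readLine()) rather than from a fixed list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; firstWords is an illustrative name, not a library method.
public class ReuseMatcher {
    static final Pattern WORD = Pattern.compile("\\w+");

    // Returns the first word of each line that has one, reusing a single Matcher.
    static List<String> firstWords(List<String> lines) {
        Matcher m = WORD.matcher("");   // created once, with placeholder text
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // reset(text) installs the new target and returns the matcher, so find() chains.
            if (m.reset(line).find()) {
                result.add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "   ", "  42 is the answer");
        System.out.println(firstWords(lines));   // [hello, 42]
    }
}
```

Only one Matcher is allocated no matter how many lines are processed; each reset(line) simply points it at the next chunk of text.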

Pattern pattern()

A matcher's pattern method returns the Pattern object associated with the matcher. To see the regular expression itself, use m.pattern().pattern(), which invokes the Pattern object's (identically named, but quite different) pattern method (☞ 394).
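A short illustration of the two identically named methods; the class name PatternOfMatcher is illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is illustrative only.
public class PatternOfMatcher {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\w+)");
        Matcher m = p.matcher("ABC 123");

        // Matcher.pattern() hands back the Pattern object itself...
        System.out.println(m.pattern() == p);       // true
        // ...whereas Pattern.pattern() returns the regex source as a String.
        System.out.println(m.pattern().pattern());  // (\w+)
    }
}
```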

Matcher usePattern( Pattern p )

Available since Java 1.5, this method replaces the matcher's associated Pattern object with the one provided. This method does not reset the matcher, thereby allowing you to cycle through different patterns looking for a match starting at the "current position" within the matcher's text. See the discussion starting on page 399 for an example of this in action.

This method returns the matcher itself, so it can be used with method chaining (☞ 389).
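The following is a hypothetical tokenizer in the same spirit, not the example from page 399. It assumes every pattern begins with \G, so that a pattern can match only where the previous match ended, and it relies on a failed find() leaving that position in place so that the next pattern can be tried from the same spot. The class name, token labels, and particular patterns are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer; names and token labels are illustrative only.
public class UsePatternTokenizer {
    // Each pattern starts with \G, so it can match only where the last match ended.
    static final Pattern AT_END = Pattern.compile("\\G\\z");
    static final Pattern NUMBER = Pattern.compile("\\G\\d+");
    static final Pattern WORD   = Pattern.compile("\\G[a-zA-Z]+");
    static final Pattern SPACE  = Pattern.compile("\\G\\s+");
    static final Pattern OTHER  = Pattern.compile("(?s)\\G.");   // any single character

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = AT_END.matcher(text);
        // usePattern() does not reset, so each find() resumes at the current position.
        while (!m.usePattern(AT_END).find()) {
            if (m.usePattern(NUMBER).find()) {
                tokens.add("NUM:" + m.group());
            } else if (m.usePattern(WORD).find()) {
                tokens.add("WORD:" + m.group());
            } else if (m.usePattern(SPACE).find()) {
                // whitespace is matched and skipped
            } else if (m.usePattern(OTHER).find()) {
                tokens.add("OTHER:" + m.group());
            } else {
                break;   // defensive guard; OTHER should always match before the end
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x = 42;"));   // [WORD:x, OTHER:=, NUM:42, OTHER:;]
    }
}
```

Because usePattern() leaves the position alone, the loop tries each pattern at the same spot until one succeeds; calling reset() inside the loop would instead send the matcher back to the start of the text.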

String toString()

Also added in Java 1.5, this method returns a string containing some basic information about the matcher, which is useful for debugging. The content and format of the string are subject to change, but as of the Java 1.6 beta release, this snippet:

Matcher m = Pattern.compile("(\\w+)").matcher("ABC 123");
System.out.println(m.toString());
m.find();
System.out.println(m.toString());

results in:

java.util.regex.Matcher[pattern=(\w+) region=0,7 lastmatch=]
java.util.regex.Matcher[pattern=(\w+) region=0,7 lastmatch=ABC]

Java 1.4.2's Matcher class does have a generic toString method inherited from java.lang.Object, but it returns a less useful string along the lines of 'java.util.regex.Matcher@480457'.

8.5.1. Querying a matcher's target text

The Matcher class doesn't provide a method to query the current target text, so here's something that attempts to fill that gap:

// This pattern, used in the function below, is compiled and saved here for efficiency.
static final Pattern pNeverFail = Pattern.compile("^");

// Return the target text associated with a matcher object.
public static String text(Matcher m)
{
    // Remember these items so that we can restore them later.
    int regionStart = m.regionStart();
    int regionEnd = m.regionEnd();
    Pattern pattern = m.pattern();

    // Fetch the string the only way the class allows.
    String text = m.usePattern(pNeverFail).replaceFirst("");

    // Put back what we changed (or might have changed).
    m.usePattern(pattern).region(regionStart, regionEnd);

    // Return the text.
    return text;
}

This query uses replaceFirst with a dummy pattern and an empty replacement string to get an unmodified copy of the target text, as a String. In the process it resets the matcher, discarding any match information and the current position, but it at least takes care to restore the region. It's not an elegant solution (it's not particularly efficient, and it always returns a String object even though the matcher's target text might be of a different class), but it will have to suffice until Sun provides a better one.
