Java Cookbook, Second Edition

Problem

You want to find text regardless of case.

Solution

Compile the Pattern, passing Pattern.CASE_INSENSITIVE as the flags argument to indicate that matching should be case-independent (that is, it should "fold," or ignore, differences in case). If your code might run in different locales (see Chapter 15), add Pattern.UNICODE_CASE, because CASE_INSENSITIVE on its own folds only US-ASCII letters. Without these flags, matching is case-sensitive by default. These flags (and others) are passed to the Pattern.compile( ) method, as in:

    // CaseMatch.java
    Pattern reCaseInsens = Pattern.compile(pattern,
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    // Pattern has no instance matches(input) method; go through a Matcher
    reCaseInsens.matcher(input).matches(); // will match case-insensitively

This flag must be passed when you create the Pattern; as Pattern objects are immutable, they cannot be changed once constructed.

The full source code for this example is online as CaseMatch.java.
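As a rough illustration of the difference the two flags make, here is a minimal sketch. It is not the book's CaseMatch.java; the class name CaseFoldSketch and its two helper methods are hypothetical, written only for this example. It assumes nothing beyond the standard java.util.regex package.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not the CaseMatch.java from the book's source archive.
public class CaseFoldSketch {

    /** True if the whole input matches the regex, ignoring case (Unicode-aware). */
    public static boolean matchesIgnoringCase(String regex, String input) {
        Pattern p = Pattern.compile(regex,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return p.matcher(input).matches();
    }

    /** Same, but with CASE_INSENSITIVE alone, which folds only US-ASCII letters. */
    public static boolean matchesIgnoringAsciiCase(String regex, String input) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        // With no flags, matching is case-sensitive.
        System.out.println(Pattern.compile("java").matcher("JAVA").matches()); // false
        System.out.println(matchesIgnoringCase("java", "JAVA"));               // true

        // U+00E9 is e with acute accent; U+00C9 is its uppercase form.
        System.out.println(matchesIgnoringAsciiCase("caf\u00e9", "CAF\u00c9")); // false
        System.out.println(matchesIgnoringCase("caf\u00e9", "CAF\u00c9"));      // true
    }
}
```

The third line of output is the reason for adding UNICODE_CASE: the ASCII letters c, a, and f fold, but the accented letter does not until the second flag is present.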

Pattern.compile( ) Flags

Several flags can be passed as the second argument to Pattern.compile( ). If more than one is needed, they can be combined using the | bitwise OR operator. In alphabetical order, the flags are:

CANON_EQ

Enables so-called "canonical equivalence": characters are compared by their canonical decomposition rather than by their exact code points, so that the letter e followed by the "combining character mark" for the acute accent ( ´ ) and the composite character é match each other (see Recipe 4.8).

CASE_INSENSITIVE

Turns on case-insensitive matching (see Recipe 4.7).

COMMENTS

Causes whitespace and comments (from # to end-of-line) to be ignored in the pattern.

DOTALL

Allows dot (.) to match any character, including the newline, not just non-newline characters (see Recipe 4.9).

MULTILINE

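Specifies multiline mode, in which ^ and $ match at the start and end of each line rather than only at the start and end of the whole input (see Recipe 4.9).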

UNICODE_CASE

Enables Unicode-aware case folding (see Recipe 4.7).

UNIX_LINES

Makes \n the only valid "newline" sequence recognized by ., ^, and $, including in MULTILINE mode (see Recipe 4.9).
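To make the list above more concrete, the following sketch exercises several of the flags, alone and combined with |. The class FlagSketch, its method names, and the sample strings are all hypothetical, chosen only for illustration; the flags themselves are the java.util.regex.Pattern constants described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class written for this sidebar; not from the book's source archive.
public class FlagSketch {

    /** Does the pattern a.b match the whole input under the given flags? */
    public static boolean dotMatches(String input, int flags) {
        return Pattern.compile("a.b", flags).matcher(input).matches();
    }

    /** Counts the places where a word starts at a position matched by ^. */
    public static int countLineStarts(String text, int flags) {
        Matcher m = Pattern.compile("^\\w+", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** COMMENTS lets the pattern itself carry whitespace and # comments. */
    public static boolean phoneMatches(String input) {
        String regex = "\\d{3}  # exchange\n"
                     + "-       # separator\n"
                     + "\\d{4}  # line number";
        return Pattern.compile(regex, Pattern.COMMENTS).matcher(input).matches();
    }

    /** CANON_EQ: decomposed pattern (e + U+0301) against composed input (U+00E9). */
    public static boolean canonMatches() {
        return Pattern.compile("e\u0301", Pattern.CANON_EQ)
                      .matcher("\u00e9").matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatches("a\nb", 0));                      // false
        System.out.println(dotMatches("a\nb", Pattern.DOTALL));         // true
        System.out.println(dotMatches("a\rb", 0));                      // false
        System.out.println(dotMatches("a\rb", Pattern.UNIX_LINES));     // true

        String text = "one\ntwo\nthree";
        System.out.println(countLineStarts(text, 0));                   // 1
        System.out.println(countLineStarts(text, Pattern.MULTILINE));   // 3

        System.out.println(phoneMatches("555-1212"));                   // true
        System.out.println(canonMatches());                             // true

        // Flags combine with bitwise OR.
        int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
        System.out.println(dotMatches("A\nB", flags));                  // true
    }
}
```

Note that under UNIX_LINES a lone carriage return stops counting as a line terminator, which is why the dot is then allowed to match it.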
