Class StringTokenizer

When you read a sentence, your mind breaks the sentence into tokens: individual words and punctuation marks, each of which conveys meaning to you. Compilers also perform tokenization. They break up statements into individual pieces such as keywords, identifiers, operators and other elements of a programming language. In this section, we study Java's StringTokenizer class (from package java.util), which breaks a string into its component tokens. Tokens are separated from one another by delimiters, typically whitespace characters such as space, tab, newline and carriage return. Other characters can also be used as delimiters to separate tokens. The application in Fig. 29.18 demonstrates class StringTokenizer.

Figure 29.18. StringTokenizer object used to tokenize strings.


 1  // Fig. 29.18: TokenTest.java
 2  // StringTokenizer class.
 3  import java.util.Scanner;
 4  import java.util.StringTokenizer;
 5
 6  public class TokenTest
 7  {
 8     // execute application
 9     public static void main( String args[] )
10     {
11        // get sentence
12        Scanner scanner = new Scanner( System.in );
13        System.out.println( "Enter a sentence and press Enter" );
14        String sentence = scanner.nextLine();
15
16        // process user sentence
17        StringTokenizer tokens = new StringTokenizer( sentence );
18        System.out.printf( "Number of elements: %d\nThe tokens are:\n",
19           tokens.countTokens() );
20
21        while ( tokens.hasMoreTokens() )
22           System.out.println( tokens.nextToken() );
23     } // end main
24  } // end class TokenTest

Enter a sentence and press Enter
This is a sentence with seven tokens
Number of elements: 7
The tokens are:
This
is
a
sentence
with
seven
tokens

When the user presses the Enter key, the input sentence is stored in String variable sentence. Line 17 creates an instance of class StringTokenizer using String sentence. This StringTokenizer constructor takes a String argument and creates a StringTokenizer for it, using the default delimiter string " \t\n\r\f" (a space, a tab, a newline, a carriage return and a form feed) for tokenization. There are two other constructors for class StringTokenizer. In the version that takes two String arguments, the second String is the delimiter string. In the version that takes three arguments, the second String is the delimiter string and the third argument (a boolean) determines whether the delimiters are also returned as tokens (only if the argument is true). This is useful if you need to know what the delimiters are.
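The two- and three-argument constructors described above can be sketched as follows. This example is not part of Fig. 29.18; the class name DelimiterDemo, the helper method tokenize and the sample string are hypothetical and chosen only for illustration. The helper always calls the three-argument constructor, since passing false as the third argument behaves like the two-argument version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterDemo
{
   // collect every token produced for the given delimiter string;
   // returnDelims = false behaves like the two-argument constructor
   static List<String> tokenize( String text, String delimiters,
      boolean returnDelims )
   {
      StringTokenizer tokens =
         new StringTokenizer( text, delimiters, returnDelims );
      List<String> result = new ArrayList<String>();

      while ( tokens.hasMoreTokens() )
         result.add( tokens.nextToken() );

      return result;
   }

   public static void main( String[] args )
   {
      String data = "10+20-30"; // hypothetical sample input

      // delimiters are discarded: prints [10, 20, 30]
      System.out.println( tokenize( data, "+-", false ) );

      // delimiters are returned as one-character tokens:
      // prints [10, +, 20, -, 30]
      System.out.println( tokenize( data, "+-", true ) );
   }
}
```

Note that each character in the delimiter string is a separate delimiter, so "+-" means "split on + or on -", not on the two-character sequence.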

Line 19 uses StringTokenizer method countTokens to determine the number of tokens in the string to be tokenized. The condition in the while statement at lines 21-22 uses StringTokenizer method hasMoreTokens to determine whether there are more tokens in the string being tokenized. If so, line 22 prints the next token in the String. The next token is obtained with a call to StringTokenizer method nextToken, which returns a String. The token is output using println, so subsequent tokens appear on separate lines.
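One detail worth keeping in mind is that countTokens reports how many tokens remain from the tokenizer's current position, so its result shrinks as nextToken is called. The following minimal sketch illustrates this with the sample sentence from the figure's output; the class name CountTokensDemo and the helper remainingAfter are hypothetical names used only here.

```java
import java.util.StringTokenizer;

class CountTokensDemo
{
   // consume up to 'consumed' tokens, then report how many remain
   static int remainingAfter( String text, int consumed )
   {
      StringTokenizer tokens = new StringTokenizer( text );

      for ( int i = 0; i < consumed && tokens.hasMoreTokens(); i++ )
         tokens.nextToken();

      return tokens.countTokens(); // does not advance the position
   }

   public static void main( String[] args )
   {
      String sentence = "This is a sentence with seven tokens";

      System.out.println( remainingAfter( sentence, 0 ) ); // prints 7
      System.out.println( remainingAfter( sentence, 2 ) ); // prints 5
      System.out.println( remainingAfter( sentence, 7 ) ); // prints 0
   }
}
```

This is why Fig. 29.18 calls countTokens before the while loop: called after the loop, it would return 0.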

If you would like to change the delimiter string while tokenizing a string, you may do so by specifying a new delimiter string in a nextToken call as follows:

tokens.nextToken( newDelimiterString );

This feature is not demonstrated in Fig. 29.18.
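A minimal sketch of changing the delimiter string partway through tokenization might look like the following. The class name ChangeDelimiterDemo, the helper splitWithSwitch and the sample string are hypothetical. Two behaviors are relied on here: the new delimiter string stays in effect for later nextToken calls, and the delimiter that ended the previous token has not yet been consumed, so the new delimiter string in this sketch includes the space as well as the semicolon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ChangeDelimiterDemo
{
   // read the first token with the default delimiters, then switch
   // to newDelimiterString for the rest of the string
   static List<String> splitWithSwitch( String text,
      String newDelimiterString )
   {
      StringTokenizer tokens = new StringTokenizer( text );
      List<String> result = new ArrayList<String>();

      if ( tokens.hasMoreTokens() )
         result.add( tokens.nextToken() ); // default whitespace delimiters

      if ( tokens.hasMoreTokens() )
         result.add( tokens.nextToken( newDelimiterString ) ); // switch

      while ( tokens.hasMoreTokens() )
         result.add( tokens.nextToken() ); // new delimiters still apply

      return result;
   }

   public static void main( String[] args )
   {
      // hypothetical sample input mixing a space and semicolons
      String text = "one two;three;four";

      // prints [one, two, three, four]
      System.out.println( splitWithSwitch( text, " ;" ) );
   }
}
```

With the default delimiters alone, the same string would yield only two tokens, "one" and "two;three;four".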
