Core Java(TM) 2, Volume I--Fundamentals (7th Edition) (Core Series)
Conceptually, Java strings are sequences of Unicode characters. For example, the string "Java\u2122" consists of the five Unicode characters J, a, v, a, and ™. Java does not have a built-in string type. Instead, the standard Java library contains a predefined class called, naturally enough, String. Each quoted string is an instance of the String class:

    String e = ""; // an empty string
    String greeting = "Hello";
Code Points and Code Units
Java strings are implemented as sequences of char values. As we discussed on page 41, the char data type is a code unit for representing Unicode code points in the UTF-16 encoding. The most commonly used Unicode characters can be represented with a single code unit. The supplementary characters require a pair of code units. The length method yields the number of code units required for a given string in the UTF-16 encoding. For example:

    String greeting = "Hello";
    int n = greeting.length(); // n is 5
To get the true length, that is, the number of code points, call

    int cpCount = greeting.codePointCount(0, greeting.length());

The call s.charAt(n) returns the code unit at position n, where n is between 0 and s.length() - 1. For example,

    char first = greeting.charAt(0); // first is 'H'
    char last = greeting.charAt(4); // last is 'o'

To get at the ith code point, use the statements

    int index = greeting.offsetByCodePoints(0, i);
    int cp = greeting.codePointAt(index);
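The difference between code units and code points only shows up once a string contains a supplementary character. The following is a minimal sketch, not part of the original text: the class name CodeUnitDemo, its helper methods, and the sample character U+1D546 are all assumptions chosen for illustration.

```java
class CodeUnitDemo {
    // Number of UTF-16 code units in s (what length() reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // The i-th code point of s, counting from 0.
    static int codePointNumber(String s, int i) {
        int index = s.offsetByCodePoints(0, i);
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        System.out.println(codeUnits(greeting));  // 5
        System.out.println(codePoints(greeting)); // 5

        // Assumed sample: U+1D546 lies outside the basic multilingual
        // plane, so it occupies two code units in UTF-16.
        String sample = "x" + new String(Character.toChars(0x1D546)) + "y";
        System.out.println(codeUnits(sample));    // 4
        System.out.println(codePoints(sample));   // 3
        System.out.println(Integer.toHexString(codePointNumber(sample, 1))); // 1d546
    }
}
```

For strings that contain only commonly used characters, such as "Hello", the two counts agree, which is why the distinction is easy to overlook.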
NOTE
Why are we making a fuss about code units? Consider the sentence is the set of integers
The first character of this sentence requires two code units in the UTF-16 encoding. Calling

    char ch = sentence.charAt(1)

doesn't return a space but the second code unit of that character. To avoid this problem, you should not use the char type. It is too low-level.

If your code traverses a string, and you want to look at each code point in turn, use these statements:

    int cp = sentence.codePointAt(i);
    if (Character.isSupplementaryCodePoint(cp)) i += 2;
    else i++;

Moving backwards takes a little more care. The codePointAt method returns a complete supplementary code point only when i is the position of the first code unit of the pair; at the second code unit it returns just that code unit. To visit the code point that ends just before position i, use the codePointBefore method instead:

    int cp = sentence.codePointBefore(i);
    i -= Character.charCount(cp);
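To make the traversal concrete, here is a small sketch that collects the code points of a string in both directions. The class name CodePointTraversal, the method names, and the sample character U+1D546 are assumptions made for this illustration; the backward loop relies on codePointBefore and Character.charCount so that it never stops in the middle of a surrogate pair.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointTraversal {
    // Visits the code points of s from first to last.
    static List<Integer> forward(String s) {
        List<Integer> result = new ArrayList<Integer>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            result.add(cp);
            // A supplementary code point occupies two code units.
            if (Character.isSupplementaryCodePoint(cp)) i += 2;
            else i++;
        }
        return result;
    }

    // Visits the code points of s from last to first.
    static List<Integer> backward(String s) {
        List<Integer> result = new ArrayList<Integer>();
        int i = s.length();
        while (i > 0) {
            // Code point that ends just before position i.
            int cp = s.codePointBefore(i);
            result.add(cp);
            i -= Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample: 'a', the supplementary character U+1D546, 'b'.
        String sample = "a" + new String(Character.toChars(0x1D546)) + "b";
        System.out.println(forward(sample));  // [97, 120134, 98]
        System.out.println(backward(sample)); // [98, 120134, 97]
    }
}
```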
Substrings
You extract a substring from a larger string with the substring method of the String class. For example,

    String greeting = "Hello";
    String s = greeting.substring(0, 3);

creates a string consisting of the characters "Hel". The second parameter of substring is the position of the first code unit that you do not want to copy. In our case, we want to copy the code units in positions 0, 1, and 2 (from position 0 to position 2 inclusive). As substring counts it, this means from position 0 inclusive to position 3 exclusive.

There is one advantage to the way substring works: Computing the number of code units in the substring is easy. The string s.substring(a, b) always has b - a code units. For example, the substring "Hel" has 3 - 0 = 3 code units.

String Editing
The String class provides no methods that let you change a character in an existing string. If you want to turn greeting into "Help!", you cannot directly change the last two positions of greeting into 'p' and '!'. If you are a C programmer, this will make you feel pretty helpless. How are you going to modify the string? In Java, it is quite easy: concatenate the substring that you want to keep with the characters that you want to replace.

    greeting = greeting.substring(0, 3) + "p!";
This assignment changes the current value of the greeting variable to "Help!". Because you cannot change the individual characters in a Java string, the documentation refers to the objects of the String class as being immutable. Just as the number 3 is always 3, the string "Hello" will always contain the code unit sequence describing the characters H, e, l, l, o. You cannot change these values. You can, however, as you just saw, change the contents of the string variable greeting and make it refer to a different string, just as you can make a numeric variable currently holding the value 3 hold the value 4.

Isn't that a lot less efficient? It would seem simpler to change the code units than to build up a whole new string from scratch. Well, yes and no. Indeed, it isn't efficient to generate a new string that holds the concatenation of "Hel" and "p!". But immutable strings have one great advantage: the compiler can arrange that strings are shared.

To understand how this works, think of the various strings as sitting in a common pool. String variables then point to locations in the pool. If you copy a string variable, both the original and the copy share the same characters. Overall, the designers of Java decided that the efficiency of sharing outweighs the inefficiency of string editing by extracting substrings and concatenating.

Look at your own programs; we suspect that most of the time, you don't change strings, you just compare them. Of course, in some cases, direct manipulation of strings is more efficient. (One example is assembling strings from individual characters that come from a file or the keyboard.) For these situations, Java provides a separate StringBuilder class that we describe in Chapter 12. If you are not concerned with the efficiency of string handling, you can ignore StringBuilder and just use String.
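The following sketch pulls the substring and editing ideas together. It is only an illustration: the class SubstringDemo and the helper replaceTail are hypothetical names, not part of the Java library. Note that the variable original keeps referring to the unchanged "Hello" object after greeting has been reassigned.

```java
class SubstringDemo {
    // Keeps the first `keep` code units of s and appends suffix.
    // A new string is built; s itself is never modified.
    static String replaceTail(String s, int keep, String suffix) {
        return s.substring(0, keep) + suffix;
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        String original = greeting; // second reference to the same string

        String s = greeting.substring(0, 3);
        System.out.println(s);          // Hel
        System.out.println(s.length()); // 3, that is, 3 - 0

        // "Editing" means making the variable refer to a new string.
        greeting = replaceTail(greeting, 3, "p!");
        System.out.println(greeting);   // Help!
        System.out.println(original);   // Hello
    }
}
```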
Concatenation
Java, like most programming languages, allows you to use the + sign to join (concatenate) two strings.

    String expletive = "Expletive";
    String PG13 = "deleted";
    String message = expletive + PG13;

The above code sets the variable message to the string "Expletivedeleted". (Note the lack of a space between the words: the + sign joins two strings in the order received, exactly as they are given.)

When you concatenate a string with a value that is not a string, the latter is converted to a string. (As you will see in Chapter 5, every Java object can be converted to a string.) For example:

    int age = 13;
    String rating = "PG" + age;

sets rating to the string "PG13". This feature is commonly used in output statements. For example,

    System.out.println("The answer is " + answer);

is perfectly acceptable and will print what one would want (and with the correct spacing because of the space after the word "is").

Testing Strings for Equality
To test whether two strings are equal, use the equals method. The expression

    s.equals(t)

returns true if the strings s and t are equal, false otherwise. Note that s and t can be string variables or string constants. For example, the expression

    "Hello".equals(greeting)

is perfectly legal. To test whether two strings are identical except for the upper/lowercase letter distinction, use the equalsIgnoreCase method.

    "Hello".equalsIgnoreCase("hello")
Do not use the == operator to test whether two strings are equal! It only determines whether or not the strings are stored in the same location. Sure, if strings are in the same location, they must be equal. But it is entirely possible to store multiple copies of identical strings in different places.

    String greeting = "Hello"; // initialize greeting to a string
    if (greeting == "Hello") . . .
        // probably true
    if (greeting.substring(0, 3) == "Hel") . . .
        // probably false

If the virtual machine always arranged for equal strings to be shared, then you could use the == operator for testing equality. But only string constants are shared, not strings that are the result of operations like + or substring. Therefore, never use == to compare strings lest you end up with a program with the worst kind of bug: an intermittent one that seems to occur randomly.
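A short sketch can show the difference between comparing contents and comparing locations. The class StringEqualityDemo and its two helper methods are hypothetical names used only for this illustration. The string built is produced by a + operation at run time, so it is a separate object from the constant "Hello" even though the two have the same contents.

```java
class StringEqualityDemo {
    // True if s and t contain the same sequence of code units.
    static boolean sameContent(String s, String t) {
        return s.equals(t);
    }

    // True only if s and t refer to the very same String object.
    static boolean sameObject(String s, String t) {
        return s == t;
    }

    public static void main(String[] args) {
        String prefix = "Hel";
        String built = prefix + "lo"; // concatenated at run time

        System.out.println(sameContent(built, "Hello"));      // true
        System.out.println(sameObject(built, "Hello"));       // false
        System.out.println("hello".equalsIgnoreCase(built));  // true

        // Non-string operands are converted during concatenation.
        int age = 13;
        String rating = "PG" + age;
        System.out.println(sameContent(rating, "PG13"));      // true
    }
}
```

Because sameObject can report false for strings that look identical when printed, comparisons written with == may appear to work in simple tests and then fail once a string comes from concatenation, substring, or input.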
The String class in Java contains more than 50 methods. A surprisingly large number of them are sufficiently useful that we can imagine using them frequently. The following API note summarizes the ones we found most useful.
java.lang.String 1.0
Reading the On-Line API Documentation
As you just saw, the String class has lots of methods. Furthermore, there are thousands of classes in the standard libraries, with many more methods. It is plainly impossible to remember all useful classes and methods. Therefore, it is essential that you become familiar with the on-line API documentation that lets you look up all classes and methods in the standard library. The API documentation is part of the JDK. It is in HTML format. Point your web browser to the file docs/api/index.html in your JDK installation. You will see a screen like that in Figure 3-2.

Figure 3-2. The three panes of the API documentation
The screen is organized into three frames. A small frame on the top left shows all available packages. Below it, a larger frame lists all classes. Click on any class name, and the API documentation for the class is displayed in the large frame to the right (see Figure 3-3). For example, to get more information on the methods of the String class, scroll the second frame until you see the String link, then click on it.

Figure 3-3. Class description for the String class
Then scroll the frame on the right until you reach a summary of all methods, sorted in alphabetical order (see Figure 3-4). Click on any method name for a detailed description of that method (see Figure 3-5). For example, if you click on the compareToIgnoreCase link, you get the description of the compareToIgnoreCase method.

Figure 3-4. Method summary of the String class
Figure 3-5. Detailed description of a String method