Interacting with Files
The built-in java.io package contains dozens of classes for reading and writing data in memory, on disk, and over networks. These relatively simple components can be strung together in a huge variety of ways. The intent of this design is to give experienced programmers a good deal of control over input and output. Unfortunately, the vast collection of classes is confusing and intimidating to the new programmer. In this section, we will concentrate on a few particularly useful classes, demonstrating how to read and write text and data.
Text Files
We begin with the simple task of writing text to a file. We can do this using the java.io.PrintWriter class (Figure 17-1).
Figure 17-1. This program almost works.
import java.io.*;

/** Class to demonstrate text file output. */
public class Ozymandias {

    /** Print a string to a file. */
    public static void main(String[] args) throws IOException {
        PrintWriter out = new PrintWriter("ozymandias.txt");
        out.println("Look on my works, ye mighty, and despair!");
    }

}
The PrintWriter class has a constructor which accepts a file name as a String. Invoking this constructor might throw an IOException (specifically, a FileNotFoundException) if the file in question does not exist and cannot be created. There's nothing our program can do if such an exception occurs, so we pass it on by declaring that main() might throw an IOException.
Once we have a PrintWriter, we can invoke the methods print() and println(), which work just like the ones provided by System.out.
When we run this program, it almost works. It creates a file called ozymandias.txt in the current directory (the directory from which we ran the program). If we examine the file, however, we discover that it is empty! What happened?
The problem is that the PrintWriter is buffered. In other words, it doesn't write to the file every time we invoke println(). Instead, it saves up its output until it has a large amount, then writes all of it to the disk. This is a reasonable thing to do, because accessing the disk is an extremely time-consuming operation. Because a disk involves physical moving parts, a disk access can take roughly a million times as long as a memory access. On the other hand, once the disk's read/write head is in the right place, reading or writing more data at the same place is practically free. Thus, it is more efficient to make a few large disk accesses than many small ones.
In order to get our program to work properly, we must flush the buffer, that is, tell the PrintWriter, "You can really send that stuff to the disk now." We could do this by invoking the flush() method, but since we're done with the file, we might as well close the file completely (Figure 17-2).
Figure 17-2. Closing the PrintWriter flushes the buffer.
/** Print a string to a file. */
public static void main(String[] args) throws IOException {
    PrintWriter out = new PrintWriter("ozymandias.txt");
    out.println("Look on my works, ye mighty, and despair!");
    out.close();
}
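As a rough sketch of the difference between flush() and close(), the following program checks the length of the file just before and just after flushing. The class name FlushDemo, the method lengthsAroundFlush(), and the file name flush-demo.txt are made up for this illustration. The sketch also uses a try-with-resources statement (available since Java 7), which closes the PrintWriter automatically, even if an exception is thrown.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;

/** Sketch: buffered output reaches the disk only after a flush or close. */
class FlushDemo {

    /** Return the file's length before and after flushing one short line. */
    static long[] lengthsAroundFlush(File file) throws IOException {
        // try-with-resources closes (and therefore flushes) out automatically
        try (PrintWriter out = new PrintWriter(file)) {
            out.println("Look on my works, ye mighty, and despair!");
            long before = file.length();  // text is normally still in the buffer
            out.flush();                  // push the buffered text to the file
            long after = file.length();
            return new long[] { before, after };
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File("flush-demo.txt");  // hypothetical file name
        long[] lengths = lengthsAroundFlush(file);
        System.out.println("Before flush: " + lengths[0] + " bytes");
        System.out.println("After flush:  " + lengths[1] + " bytes");
    }
}
```

With a line this short, we would expect the first length to be zero, because nothing has left the buffer yet.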
A more elaborate way to write this program would be to explicitly create an instance of the File class, then attach a PrintWriter to that file (Figure 17-3). The File object handles interaction with the disk, while the PrintWriter object provides the print() and println() methods.
Figure 17-3. Explicitly creating a File object.
/** Print a string to a file. */
public static void main(String[] args) throws IOException {
    File file = new File("ozymandias.txt");
    PrintWriter out = new PrintWriter(file);
    out.println("Look on my works, ye mighty, and despair!");
    out.close();
}
There is no particular reason to use this more complicated version here, because PrintWriter has a constructor that lets us specify the file name directly. This example does demonstrate the philosophy of the java.io package: provide many simple components which can be combined to produce the desired behavior. Later, we'll attach a different "filter" to a File to write something other than text.
Before moving on, let's look at some of the classes involved in what we've done so far (Figure 17-4).
Figure 17-4. Classes involved in writing text to a file. The shaded classes have not been discussed previously.
A PrintWriter both is a Writer and contains one. Specifically, when we hook one up to a File, it contains a FileWriter (Figure 17-5). Fortunately, we don't have to keep track of all of these intermediate objects; the constructors in PrintWriter do the work for us.
Figure 17-5. A PrintWriter can contain a FileWriter, which in turn contains a File. Each object in this chain offers some additional functionality.
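One situation where it may be worth building this chain by hand is appending to a file: the PrintWriter constructors that take a file name or a File always start the file afresh. The sketch below is one possible way to do this, not the only one; the class name AppendDemo and the method appendLine() are invented for the example. It relies on the FileWriter constructor that takes a second, boolean argument requesting append mode.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

/** Sketch: building the PrintWriter/FileWriter/File chain explicitly. */
class AppendDemo {

    /** Add one line to the end of file, keeping whatever it already holds. */
    static void appendLine(File file, String line) throws IOException {
        // The second argument (true) asks FileWriter to append, not overwrite
        FileWriter writer = new FileWriter(file, true);
        try (PrintWriter out = new PrintWriter(writer)) {
            out.println(line);
        }  // closing the PrintWriter also closes the FileWriter inside it
    }

    public static void main(String[] args) throws IOException {
        File file = new File("ozymandias.txt");
        appendLine(file, "My name is Ozymandias, king of kings:");
        appendLine(file, "Look on my works, ye mighty, and despair!");
    }
}
```

Each call to appendLine() adds to the file, so running the program repeatedly makes ozymandias.txt grow by two lines each time.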
The object System.out, incidentally, is an instance of the class PrintStream.
We would also like to read text in from files. Oddly enough, there is no such thing as a PrintReader. Instead, we can use the java.util.Scanner class. As we have seen, this class has several constructors. One constructor takes an InputStream, such as System.in. Another, used in the Pick Up Sticks program in Figure 15-29, takes a String. Yet another takes a File.
As an example, let's write a program that reads a Java program and prints the lines containing the substring "public" (Figure 17-6).
Figure 17-6. This program prints all lines of a file containing the substring "public".
 1 import java.io.*;
 2 import java.util.Scanner;
 3
 4 /** Print all lines containing the substring "public". */
 5 public class PrintPublicMembers {
 6
 7     /** Run on the file specified as args[0]. */
 8     public static void main(String[] args) throws IOException {
 9         File file = new File(args[0]);
10         Scanner in = new Scanner(file);
11         while (in.hasNextLine()) {
12             String line = in.nextLine();
13             if (line.contains("public")) {
14                 System.out.println(line);
15             }
16         }
17     }
18
19 }
Running this program allows us to see all of the public methods of a class. For example, if we run our program with the command
java PrintPublicMembers ArrayList.java
we get:
public class ArrayList implements List {
public ArrayList() {
public void add(E target) {
public boolean contains(E target) {
public boolean isEmpty() {
public java.util.Iterator iterator() {
public E get(int index) {
public E remove(int index) {
public boolean remove(E target) {
public void set(int index, E target) {
public int size() {
public String toString() {
Incidentally, this program touches on one of the deepest ideas in computer science: it is possible to write programs that treat other programs as data. We have been dealing with one such program throughout this book: the Java compiler. It reads a program (as Java source code) from a file and writes another program (as compiled Java byte code) to another file.
As with text output, there's more going on behind the scenes than is apparent in the code we've written (Figure 17-7).
Figure 17-7. Classes and interfaces involved in reading text from a file. The shaded classes have not been discussed previously.
A Scanner is associated with an instance of some class implementing the Readable interface. Different things happen, depending on which Scanner constructor we use:
- If the argument is a String, the Scanner contains a StringReader, which in turn contains the String.
- If the argument is a File, the Scanner contains a FileReader, which in turn contains that File.
- If the argument is an InputStream (such as System.in, which is typically a BufferedInputStream), the Scanner contains an InputStreamReader, which in turn contains that InputStream.
Once again, we are thankful that we don't have to keep track of all of this!
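To see the three cases side by side, here is a minimal sketch in which the same counting method is handed Scanners built from a String, a File, and an InputStream. The class name ScannerSources, the method countLines(), and the file name scanner-demo.txt are assumptions made for this example; a ByteArrayInputStream stands in for System.in so that the program does not wait for keyboard input.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.util.Scanner;

/** Sketch: the same counting code works for any source a Scanner can read. */
class ScannerSources {

    /** Count the lines remaining in a Scanner, then close it. */
    static int countLines(Scanner in) {
        int count = 0;
        while (in.hasNextLine()) {
            in.nextLine();
            count++;
        }
        in.close();
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A String argument is scanned as text, not treated as a file name
        System.out.println(countLines(new Scanner("one\ntwo\nthree")));

        // A File argument makes the Scanner read the file's contents
        File file = new File("scanner-demo.txt");  // hypothetical file name
        try (PrintWriter out = new PrintWriter(file)) {
            out.println("first line");
            out.println("second line");
        }
        System.out.println(countLines(new Scanner(file)));

        // Any InputStream works; this one stands in for System.in
        InputStream stream = new ByteArrayInputStream("just one line".getBytes());
        System.out.println(countLines(new Scanner(stream)));
    }
}
```

The program prints 3, 2, and 1. The code that consumes the Scanner never needs to know which kind of source lies underneath.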
Data Files
Information in files need not be stored in text. If it is not necessary for humans to read files directly, text is somewhat inefficient. For example, suppose we want to store nine-digit Social Security numbers. As text, each digit is a character, occupying one byte in the ASCII encoding. (Java actually uses the more comprehensive Unicode encoding. The way Unicode characters are represented on disk is complicated; we ignore these details.) A nine-digit number can also be stored in a four-byte int, using less than half the space. If hundreds of thousands of such numbers are being stored, this can be a significant savings.
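The arithmetic can be checked with a small sketch that measures both representations in memory. The class name SizeComparison and the sample value are invented for the example. It uses DataOutputStream, a java.io class not otherwise discussed in this section, because it writes primitive values as raw bytes with nothing added.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Sketch: bytes needed to store a nine-digit number as text and as an int. */
class SizeComparison {

    /** Number of bytes used when the number is written as ASCII digits. */
    static int textSize(int number) {
        return Integer.toString(number).getBytes(StandardCharsets.US_ASCII).length;
    }

    /** Number of bytes used when the number is written as a binary int. */
    static int binarySize(int number) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(number);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        int number = 123456789;  // made-up nine-digit value
        System.out.println("As text:   " + textSize(number) + " bytes");
        System.out.println("As an int: " + binarySize(number) + " bytes");
    }
}
```

For this value the text form takes 9 bytes and the binary form takes 4.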
To interact with data stored in this binary format, we use the classes ObjectInputStream and ObjectOutputStream. These are subclasses of InputStream and OutputStream, respectively (Figure 17-8).
Figure 17-8. Classes involved in reading from and writing to a binary data file.
As a simple example, the program in Figure 17-9 takes an optional command-line argument. If an argument is provided, it is stored in a file (lines 13-16). Otherwise, the first int in the file is read and printed to the screen (lines 18-20).
Figure 17-9. Reading and writing data in binary format.
 1 import java.io.*;
 2
 3 /** Example of storing data in binary format. */
 4 public class DataFileExample {
 5
 6     /**
 7      * If an int is provided on the command line, store it in
 8      * number.data. Otherwise, read an int from number.data
 9      * and print it.
10      */
11     public static void main(String[] args) throws IOException {
12         File file = new File("number.data");
13         if (args.length > 0) {
14             ObjectOutputStream out
15                 = new ObjectOutputStream(new FileOutputStream(file));
16             out.writeInt(Integer.parseInt(args[0]));
17             out.close();
18         } else {
19             ObjectInputStream in
20                 = new ObjectInputStream(new FileInputStream(file));
21             System.out.println(in.readInt());
22         }
23     }
24
25 }
In addition to readInt() and writeInt(), there are corresponding methods for all of the other primitive types.
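As an illustration, the following sketch writes an int, a double, and a boolean and then reads them back. The class name PrimitiveRoundTrip, its two methods, and the file name primitives.data are made up for the example. The point to notice is that a binary file carries no labels: the values must be read in the same order, and with the same types, as they were written.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

/** Sketch: writing several primitive values and reading them back. */
class PrimitiveRoundTrip {

    /** Write an int, a double, and a boolean to file, in that order. */
    static void write(File file, int i, double d, boolean b) throws IOException {
        try (ObjectOutputStream out
                 = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeInt(i);
            out.writeDouble(d);
            out.writeBoolean(b);
        }
    }

    /** Read the three values back in the order they were written. */
    static String read(File file) throws IOException {
        try (ObjectInputStream in
                 = new ObjectInputStream(new FileInputStream(file))) {
            int i = in.readInt();
            double d = in.readDouble();
            boolean b = in.readBoolean();
            return i + " " + d + " " + b;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File("primitives.data");  // hypothetical file name
        write(file, 42, 2.5, true);
        System.out.println(read(file));  // prints "42 2.5 true"
    }
}
```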
There are also methods readObject() and writeObject(). When we write an object to an ObjectOutputStream, Java automatically writes the contents of the object's fields. If these are references to other objects, those objects are written as well. The observant reader, recalling Chapter 16, will see a problem here. We need to write all of the objects reachable from the original root object. Each object must be written exactly once: we don't want to miss any, and we don't want to write two copies of an object just because there are two references to it.
Handily, this is exactly the problem solved by the copying garbage collector described in Section 16.2. Java uses this algorithm to convert a directed graph of objects into a linear file, a process called serialization. All we have to do is make every object we are saving serializable. We do this by implementing the java.io.Serializable interface. This interface has no methods; we merely have to state that we are implementing it. Many built-in classes, including String and all of the wrapper classes, are Serializable.
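The "exactly once" property can be observed directly. In the sketch below, a root node holds two references to the same child; after a round trip through serialization, the copy's two references still lead to a single object. The classes SharedReferenceDemo and Node are hypothetical, and the bytes are kept in memory rather than in a file to keep the example short. The serialVersionUID field is optional; it is a conventional version stamp for serializable classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Sketch: a shared object is written once and is still shared when read. */
class SharedReferenceDemo {

    /** Hypothetical node class with two outgoing references. */
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Node left;
        Node right;

        Node(String name) {
            this.name = name;
        }
    }

    /** Serialize root into memory, then build a copy from those bytes. */
    static Node roundTrip(Node root) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Node) in.readObject();
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException {
        Node shared = new Node("shared");
        Node root = new Node("root");
        root.left = shared;   // two references...
        root.right = shared;  // ...to one object
        Node copy = roundTrip(root);
        System.out.println(copy.left == copy.right);  // true: still one object
        System.out.println(copy.left == shared);      // false: it is a new copy
    }
}
```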
As an example, recall the game of Questions from Section 10.1. A major drawback of the program was that, when we quit the program, the decision tree was lost. It would be much better to store the tree in a file. We can accomplish this by adding Serializable to the list of interfaces implemented by the BinaryNode and Questions classes, and by updating the main() method for Questions, as shown in Figure 17-10.
Figure 17-10. Once BinaryNode and Questions implement Serializable, we can store the decision tree in a file. Questions must also import java.io.*.
/** Create and repeatedly play the game. */
public static void main(String[] args) throws IOException {
    Questions game = new Questions();
    System.out.println("Welcome to Questions.");
    do {
        System.out.println();
        game.play();
        System.out.print("Play again (yes or no)? ");
    } while (INPUT.nextLine().equals("yes"));
    // Save knowledge to a file
    ObjectOutputStream out
        = new ObjectOutputStream(new FileOutputStream("questions.data"));
    out.writeObject(game);
    out.close();
}
Running this program once produces a file questions.data, but we have no way to read it. We have to modify the main() method so that, when it starts, we read the tree from the file (Figure 17-11).
The program now works properly, but there's something fishy going on. The original decision tree, containing just the leaf node "a giraffe", does not appear anywhere in the program. That knowledge exists only in the data file. If we were ever to lose the data file, we would have no way to start over.
We want to start over only if the file does not already exist. The next subsection discusses how to determine this.
Figure 17-11. Reading the Questions data from a file.
/** Create and repeatedly play the game. */
public static void main(String[] args)
    throws ClassNotFoundException, IOException {
    // Read knowledge from a file
    ObjectInputStream in
        = new ObjectInputStream(new FileInputStream("questions.data"));
    Questions game = (Questions)(in.readObject());
    // Play the game
    System.out.println("Welcome to Questions.");
    do {
        System.out.println();
        game.play();
        System.out.print("Play again (yes or no)? ");
    } while (INPUT.nextLine().equals("yes"));
    // Save knowledge to a file
    ObjectOutputStream out
        = new ObjectOutputStream(new FileOutputStream
                                 ("questions.data"));
    out.writeObject(game);
    out.close();
}
Directories
We begin with the anticlimactic answer to the question posed in the previous paragraph. To determine whether a file exists, we use the exists() method of the corresponding File object (Figure 17-12).
Figure 17-12. Improved version of the main() method from the Questions class. If there is no data file to read from, it creates a new instance of Questions.
 1 /** Create and repeatedly play the game. */
 2 public static void main(String[] args)
 3     throws ClassNotFoundException, IOException {
 4     // Read knowledge from a file
 5     Questions game;
 6     File file = new File("questions.data");
 7     if (file.exists()) {
 8         ObjectInputStream in
 9             = new ObjectInputStream(new FileInputStream(file));
10         game = (Questions)(in.readObject());
11     } else {
12         game = new Questions();
13     }
14     // Play the game
15     System.out.println("Welcome to Questions.");
16     do {
17         System.out.println();
18         game.play();
19         System.out.print("Play again (yes or no)? ");
20     } while (INPUT.nextLine().equals("yes"));
21     // Save knowledge to a file
22     ObjectOutputStream out
23         = new ObjectOutputStream(new FileOutputStream(file));
24     out.writeObject(game);
25     out.close();
26 }
The exists() method seems strange. Doesn't every object exist? Yes, the object file, which is an instance of class File, does exist. It contains the name of a hypothetical file on disk. The question answered by exists() is whether there really is a file with that name.
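The distinction between the File object and the file on disk can be seen in a short sketch. The class name ExistsDemo, the method describe(), and the file name exists-demo.txt are assumptions made for illustration; the program assumes it is allowed to create and delete a file in the current directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;

/** Sketch: a File object is only a name until something creates the file. */
class ExistsDemo {

    /** Report whether there is a file on disk with the given File's name. */
    static String describe(File file) {
        return file.getName() + (file.exists() ? " exists" : " does not exist");
    }

    public static void main(String[] args) throws IOException {
        File file = new File("exists-demo.txt");  // hypothetical file name
        file.delete();  // start clean; returns false if nothing was there
        System.out.println(describe(file));  // the object exists, the file does not
        try (PrintWriter out = new PrintWriter(file)) {
            out.println("Now there is something on disk.");
        }
        System.out.println(describe(file));  // writing created the file
        file.delete();
        System.out.println(describe(file));  // gone again; the object remains
    }
}
```

The same File object is used throughout; only the state of the disk changes between the three calls.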
We don't have to put all files in the current directory. A File object can specify the entire path of a file. For example, if the current directory has a subdirectory lib, then we could have a file corresponding to lib/questions.data. If we simply replace the name of the file in line 6 in Figure 17-12, however, we'll have a couple of problems.
One problem is that not all operating systems use "/" to separate directories in a path. Specifically, Windows uses "\" instead. For platform independence, instead of
"lib/questions.data"
we should use:
"lib" + File.separator + "questions.data"
A second problem is that, if the lib directory doesn't exist, opening lib/questions.data for output won't create it. A directory must be explicitly created using the mkdir() method. In this case:
new File("lib").mkdir();
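These two fixes can be combined in a small helper. The following is a sketch under our own naming (the class PathDemo and the method fileInDirectory() are not part of the Questions program). It uses the File constructor that takes a parent directory and a child name, which inserts the platform's separator for us, as an alternative to concatenating File.separator by hand.

```java
import java.io.File;

/** Sketch: building a platform-independent path and creating its directory. */
class PathDemo {

    /** Return a File for name inside directory dir, creating dir if needed. */
    static File fileInDirectory(String dir, String name) {
        File directory = new File(dir);
        if (!directory.exists()) {
            // mkdir() returns false if the directory could not be created
            if (!directory.mkdir()) {
                throw new IllegalStateException("Could not create " + dir);
            }
        }
        // This constructor inserts the correct separator for us
        return new File(directory, name);
    }

    public static void main(String[] args) {
        File file = fileInDirectory("lib", "questions.data");
        // Both lines print the same path on any one platform
        System.out.println(file.getPath());
        System.out.println("lib" + File.separator + "questions.data");
    }
}
```

Note that mkdir() creates only the last directory in a path. If several levels might be missing, the File class also provides mkdirs(), which creates any missing parent directories as well.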
A third problem is that lib is a subdirectory of the current working directory. It would be better to use an absolute path, so that the location of the data file is independent of the directory from which Questions is invoked. This is particularly important in modern graphic development environments and operating systems. If a program is invoked by selecting it from a menu or clicking on an icon, it may not be clear what the current working directory is.
We can find the name of the directory containing the Questions program with the following arcane incantation, which we make no attempt to explain:
Questions.class.getProtectionDomain().getCodeSource()
    .getLocation().getFile();
The final version of main() is given in Figure 17-13.
Figure 17-13. The file questions.data now lives in the lib subdirectory of the directory containing the Questions program.
/** Create and repeatedly play the game. */
public static void main(String[] args)
    throws ClassNotFoundException, IOException {
    // Read knowledge from a file
    Questions game;
    String home = Questions.class.getProtectionDomain().getCodeSource()
        .getLocation().getFile() + File.separator;
    File file = new File(home + "lib"
                         + File.separator + "questions.data");
    if (file.exists()) {
        ObjectInputStream in
            = new ObjectInputStream(new FileInputStream(file));
        game = (Questions)(in.readObject());
    } else {
        game = new Questions();
        new File(home + "lib").mkdir();
    }
    // See lines 14-25 of Figure 17-12
}
Exercises
17.1 What is printed if we run the PrintPublicMembers program on itself?
17.2 Write a program that prints its own source code.
17.3 Explain why, in Figure 17-6, we couldn't replace lines 9-10 with: Scanner in = new Scanner(args[0]);
17.4 Look up the API for the File class. How can we determine if a file refers to a directory?
Compression