Copying Files with Buffers
I'm going to begin with a simple example, copying one file to another file. The basic interface to the program looks like this:
$ java FileCopier original copy
Obviously this program could be written in a traditional way with streams, and that's going to be true of almost all the programs you use the new I/O (NIO) model to write. NIO doesn't make anything possible that was impossible before. However, if the files are large and the local operating system is sophisticated enough, the NIO version of FileCopier might just be faster than the traditional version.
The rough outline of the program is typical:
import java.io.*;
import java.nio.*;

public class NIOCopier {

    public static void main(String[] args) throws IOException {
        FileInputStream inFile = new FileInputStream(args[0]);
        FileOutputStream outFile = new FileOutputStream(args[1]);
        // copy files here...
        inFile.close();
        outFile.close();
    }
}
However, rather than merely reading from the input stream and writing to the output stream, I'm going to do something a little different. First, I open channels to both files using the getChannel() methods in FileInputStream and FileOutputStream:

FileChannel inChannel = inFile.getChannel();
FileChannel outChannel = outFile.getChannel();
Next, I create a one-megabyte buffer with the static factory method ByteBuffer.allocate():

ByteBuffer buffer = ByteBuffer.allocate(1024*1024);
The input channel will fill this buffer with data from the original file and the output channel will drain data out of this buffer to store into the copy.
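Before the first read, a freshly allocated buffer is already in a fillable state: its position is zero and its limit equals its capacity. This standalone sketch (the class name is invented for illustration) confirms that:

```java
import java.nio.ByteBuffer;

public class AllocateDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(1024 * 1024);
        // A fresh buffer is ready for filling:
        // position starts at 0 and limit equals capacity
        System.out.println(buffer.capacity()); // 1048576
        System.out.println(buffer.position()); // 0
        System.out.println(buffer.limit());    // 1048576
    }
}
```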
To read data, you pass the buffer to the input channel's read() method, much as you'd pass a byte array to an input stream's read() method:

inChannel.read(buffer);
The read() method returns the number of bytes it read. As with input streams, there's no guarantee that the read() method completely fills the buffer. It may read fewer bytes, or no bytes at all. When the input data is exhausted, the read() method returns -1. Thus, you normally do something like this:

int bytesRead = inChannel.read(buffer);
if (bytesRead == -1) break;
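The -1 at end of stream is easy to verify with a throwaway file. This standalone sketch (the class name and temp-file prefix are invented) writes three bytes, reads them back through a channel, and then hits end of input:

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ReadDemo {
    public static void main(String[] args) throws IOException {
        // Create a small scratch file to read back
        File temp = File.createTempFile("readdemo", ".bin");
        temp.deleteOnExit();
        FileOutputStream out = new FileOutputStream(temp);
        out.write(new byte[] {10, 20, 30});
        out.close();

        FileInputStream in = new FileInputStream(temp);
        FileChannel channel = in.getChannel();
        ByteBuffer buffer = ByteBuffer.allocate(64);
        int bytesRead = channel.read(buffer); // 3: the whole file fits
        int atEnd = channel.read(buffer);     // -1: input exhausted
        System.out.println(bytesRead + " " + atEnd); // prints "3 -1"
        channel.close();
        in.close();
    }
}
```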
Now the output channel needs to write the data in the buffer into the copy. Before it can do that, though, the buffer must be flipped:
buffer.flip();

Flipping converts the buffer from a state ready for filling to a state ready for draining: the limit is set to the current position, and the position is reset to zero.
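This standalone sketch (the class name is invented) shows exactly what flip() does to the buffer's indices after a few bytes have been put into it:

```java
import java.nio.ByteBuffer;

public class FlipDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        buffer.put((byte) 1).put((byte) 2).put((byte) 3);
        // After filling: position is 3, limit is still 8
        System.out.println(buffer.position()); // 3
        buffer.flip();
        // After flipping: limit moves to the old position,
        // and position resets to 0, ready for draining
        System.out.println(buffer.position()); // 0
        System.out.println(buffer.limit());    // 3
    }
}
```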
To write the data, you pass the buffer to the output channel's write() method:

outChannel.write(buffer);
However, this is not like an output stream's write(byte[]) method. That method is guaranteed to write every byte in the array to the target or throw an IOException if it can't. The output channel's write() method is more like the read() method: it writes some bytes, perhaps not all, and perhaps even none, and it returns the number of bytes written. You could loop repeatedly until all the bytes are written, like this:

int bytesWritten = 0;
while (bytesWritten < bytesRead) {
    bytesWritten += outChannel.write(buffer);
}
However, there's a simpler way. The buffer object itself knows whether all its data has been written; the hasRemaining() method checks this:

while (buffer.hasRemaining()) outChannel.write(buffer);
This code reads and writes at most one megabyte. To copy larger files, we have to wrap all this up in a loop:

while (true) {
    ByteBuffer buffer = ByteBuffer.allocate(1024*1024);
    int bytesRead = inChannel.read(buffer);
    if (bytesRead == -1) break;
    buffer.flip();
    while (buffer.hasRemaining()) outChannel.write(buffer);
}
Allocating a new buffer for each read is wasteful and inefficient; we should reuse the same buffer. Before each reuse, though, we must restore the buffer to a fresh state by invoking its clear() method:

ByteBuffer buffer = ByteBuffer.allocate(1024*1024);
while (true) {
    int bytesRead = inChannel.read(buffer);
    if (bytesRead == -1) break;
    buffer.flip();
    while (buffer.hasRemaining()) outChannel.write(buffer);
    buffer.clear();
}
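Note that clear() only resets the buffer's indices for refilling; it does not erase the bytes already stored. This standalone sketch (the class name is invented) makes the distinction concrete:

```java
import java.nio.ByteBuffer;

public class ClearDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        buffer.put((byte) 42);
        buffer.flip();
        buffer.get(); // drain the one byte
        buffer.clear();
        // clear() resets position to 0 and limit to capacity...
        System.out.println(buffer.position()); // 0
        System.out.println(buffer.limit());    // 8
        // ...but the old data is still physically there
        System.out.println(buffer.get(0));     // 42
    }
}
```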
Finally, both the input and output channels should be closed to release any native resources the channel objects may be holding:

inChannel.close();
outChannel.close();
Example 14-1 shows the complete program, with a couple of small, common shortcuts. Compare it to the equivalent stream-based copying program in Example 4-2.
Example 14-1. Copying files using NIO
import java.io.*;
import java.nio.*;
import java.nio.channels.*;

public class NIOCopier {

    public static void main(String[] args) throws IOException {
        FileInputStream inFile = new FileInputStream(args[0]);
        FileOutputStream outFile = new FileOutputStream(args[1]);
        FileChannel inChannel = inFile.getChannel();
        FileChannel outChannel = outFile.getChannel();
        for (ByteBuffer buffer = ByteBuffer.allocate(1024*1024);
             inChannel.read(buffer) != -1;
             buffer.clear()) {
            buffer.flip();
            while (buffer.hasRemaining()) outChannel.write(buffer);
        }
        inChannel.close();
        outChannel.close();
    }
}
In a very unscientific test, copying one large (4.3-GB) file on one platform (a dual 2.5-GHz PowerMac G5 running Mac OS X 10.4.1) using traditional I/O with buffered streams and an 8192-byte buffer took 305 seconds. Expanding or reducing the buffer size didn't shift the overall numbers by more than 5%, and if anything tended to increase the copy time. (Using a one-megabyte buffer like Example 14-1's actually increased the time to over 23 minutes.) Using new I/O as implemented in Example 14-1 was about 16% faster, at 255 seconds. A straight Finder copy took 197 seconds. Using the Unix cp command actually took 312 seconds, so the Finder is doing some surprising optimizations under the hood.
What this suggests is that new I/O doesn't help a great deal for traditional file operations that move through the file from beginning to end. The new I/O API is clearly not a panacea for all I/O performance issues. You can expect to see the biggest improvements in two other areas:
- Network servers that talk to many clients simultaneously
- Repeated random access to parts of large files