Buffers and Channels

Streams are reasonably fast as long as an application has to read from or write to only one at a time. In fact, the bottleneck is more likely to be the disk or network you're reading from or writing to than the Java program itself. The situation is a little dicier when a program needs to read from or write to many different streams simultaneously. This is a common situation in web servers, for example, where a single process may be communicating with hundreds or even thousands of different clients simultaneously.

At any given time, a stream may block. That is, it may simply stop accepting further requests temporarily while it waits for the actual hardware it's writing to or reading from to catch up. This can happen on disks, and it's a major issue on network connections. Clearly, you don't want to stop sending data to 999 clients just because one of them is experiencing network congestion. The traditional solution to this problem prior to Java 1.4 was to put each connection in a separate thread. Five hundred clients requires 500 threads. Each thread can run independently of the others so that one slow connection doesn't slow down everyone.

However, threads are not without overhead of their own. Creating and managing threads takes a lot of work, and few virtual machines can handle more than a thousand or so threads without serious performance degradation. Spawning several thousand threads can crash even the toughest virtual machine. Nonetheless, big servers need to be able to communicate with thousands of clients simultaneously.

The solution invented in Java 1.4 was nonblocking I/O. In nonblocking I/O, streams are relegated mostly to a supporting role while the real work is done by channels and buffers. Input buffers are filled with data from the channel and then drained of data by the application. Output buffers work in reverse: the application fills them with data that is subsequently drained out by the target. The design is such that the writer and reader don't always have to operate in lockstep with each other. Most importantly, the client application can queue reads and writes to each channel. It does not have to stop processing simply because the other end of the channel isn't quite ready. This enables one thread to service many different channels simultaneously, dramatically reducing the load on the virtual machine.

Channels and buffers are also used to enable memory-mapped I/O. In memory-mapped I/O, files are treated as large blocks of memory, essentially as big byte arrays. Particular parts of a mapped file can be read with statements such as int x = file.getInt(1067) and written with statements such as file.putInt(x, 1067). The data is stored directly to disk at the right location without having to read or write all the data that precedes or follows the section of interest.

Channels and buffers are a little more complex than streams and bytes. However, for certain kinds of I/O-bound applications, the performance gains are dramatic and worth the added complexity.

Категории