Memory-Mapped I/O


About 12 years ago, I was a grad student working at the National Solar Observatory in Sunspot, New Mexico. The program I was working on involved three-dimensional Fourier transforms on four-dimensional data. We had taken snapshots of wind speeds in a particular three-dimensional chunk of the sun's photosphere, and my job was to try to make sense out of them. Our measurements covered an area of roughly 200 points by 100 points at 10 different depths over 100 time increments. This doesn't sound like a lot, but if you multiply it out, and figure we had a 4-byte float at each point, that's about 75 MB of data. To do the transforms we needed double that amount of memory. Every time I started running my naïve code to transform this monstrous set, everyone else's terminals slowed to a crawl as the disks began to thrash madly. Solaris was swapping everything in and out to try to find enough space to run my program. Within a couple of minutes, our normally friendly sysadmin would run down the hall yelling, "Rusty, what are you doing now?!"

In 2006, 150 MB of working set is no big deal, but back then it was a lot. Of course, the telescopes, spectrometers, charge-coupled detectors, and other tools have grown to match the capacity of today's computers, and grad students are now manipulating datasets of gigabytes or more, still outpacing the growth of memory capacity.

Fortunately for the sysadmin's sanity and my continued employment, I soon discovered the magic of memory-mapped I/O. Instead of loading the arrays into memory, I just flipped one little switch in my program that told IDL (the programming language I was using at the time) to keep those particular arrays on disk and treat the disk as if it were a block of memory. This wasn't quite as fast as using real memory, but in this case the real memory wasn't there to be used anyway. It was all going out to disk sooner or later, and the only question was whether or not it went through Solaris's virtual memory system first. Memory-mapped I/O was like magic. My programs ran. In fact, they ran faster than they had before because the disks stopped thrashing, and the sysadmin stopped yelling at me because I was no longer overloading the server. Everyone was happy.

Memory-mapped I/O is not a solution to all problems. It really applies only when you have datasets that equal or exceed the available memory. However, in that event, it's a godsend. Programmers working in C, IDL, Fortran, and many other environments have been able to rely on memory-mapped I/O for a long time, and finally (as of Java 1.4) Java programmers can too.

14.13.1. Creating Mapped Byte Buffers

The MappedByteBuffer class maps a file directly into a ByteBuffer. You operate on it with the same put( ), get( ), putInt( ), getInt( ), and other methods you'd use on any other ByteBuffer. However, the puts and gets go straight to the file itself, without the usual copying of data between the file and buffers in RAM.

Mapped byte buffers are created using the map( ) method in the FileChannel class:

public abstract MappedByteBuffer map(FileChannel.MapMode mode, long position, long size) throws IOException

Memory mapping can operate in three modes:

 

FileChannel.MapMode.READ_ONLY

You can get data from the buffer but cannot change the data in the buffer.

 

FileChannel.MapMode.READ_WRITE

You can both get data from and put data in the buffer.

 

FileChannel.MapMode.PRIVATE

You can get data from and put data in the buffer. However, data you put is visible only through this buffer. The file itself is not changed.

For example, this code fragment maps the file test.png in read/write mode:

RandomAccessFile file = new RandomAccessFile("test.png", "rw");
FileChannel channel = file.getChannel( );
ByteBuffer buffer = channel.map(
    FileChannel.MapMode.READ_WRITE, 0, file.length( ));

The position of the buffer is initially 0, and the limit is initially equal to the buffer's capacity. The capacity is whatever value was passed as the third argument. These are not necessarily the same as position zero in the file or the file's full length. You can, and often do, memory map just a portion of a large file if you don't need the whole thing. For instance, this code fragment maps the portion of a PNG file that follows the initial 8-byte signature:

ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 8, file.length( )-8);
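
A related sketch (not part of the book's example): before skipping the signature, you could map just those first 8 bytes in read-only mode and compare them to the fixed PNG signature values. This reuses the channel opened in the earlier fragment:

MappedByteBuffer signature = channel.map(FileChannel.MapMode.READ_ONLY, 0, 8);
// Every PNG file begins with these eight bytes
byte[] expected = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
boolean isPNG = true;
for (int i = 0; i < expected.length; i++) {
  if (signature.get(i) != expected[i]) isPNG = false;
}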

The initial position cannot be negative; that is, it cannot precede the beginning of the file. However, if the file is open for writing, the requested size can exceed the file's current length. In that case, the file is expanded to the requested length.
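
For instance, here's a sketch (with a hypothetical file name, and assuming the usual java.io and java.nio imports) that grows a file by one kilobyte simply by mapping past its current end and writing into the new region:

RandomAccessFile file = new RandomAccessFile("data.bin", "rw");
FileChannel channel = file.getChannel( );
long oldLength = file.length( );
MappedByteBuffer buffer
 = channel.map(FileChannel.MapMode.READ_WRITE, 0, oldLength + 1024);
buffer.position((int) oldLength); // jump to the start of the newly added space
buffer.put((byte) 42);            // the file is now oldLength + 1024 bytes long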

The available modes depend on the underlying object from which the FileChannel was created. Random access files can be mapped in read-only or read/write modes. File input streams can be mapped in read-only mode. File output streams cannot be mapped at all. Normally, a RandomAccessFile is the source.
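
For example, a channel obtained from a FileInputStream can be mapped only in read-only mode; requesting READ_WRITE throws a NonWritableChannelException. A brief sketch, with the same assumptions as above:

FileInputStream in = new FileInputStream("test.png");
FileChannel readChannel = in.getChannel( );
// READ_ONLY is the only mode this channel supports
MappedByteBuffer readOnly
 = readChannel.map(FileChannel.MapMode.READ_ONLY, 0, readChannel.size( ));
// readChannel.map(FileChannel.MapMode.READ_WRITE, 0, readChannel.size( ))
//   would throw java.nio.channels.NonWritableChannelException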

Java does not specify what happens if another process or even another part of the same program changes the file while it's mapped. The ByteBuffer object may or may not show the changes. If it does reflect those changes, the reflection may or may not be immediate. This will vary from one platform to the next.
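
By contrast, changes you make through a PRIVATE mapping are guaranteed not to reach the file at all. A minimal sketch of this copy-on-write behavior (assuming the usual imports):

RandomAccessFile file = new RandomAccessFile("test.png", "rw");
FileChannel channel = file.getChannel( );
MappedByteBuffer privateCopy
 = channel.map(FileChannel.MapMode.PRIVATE, 0, file.length( ));
privateCopy.put(0, (byte) 0); // visible through privateCopy only
// Reading byte 0 through the file or another mapping still returns the original value.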

14.13.2. MappedByteBuffer Methods

Besides the methods common to all byte buffers, MappedByteBuffer has three methods of its own: force( ), load( ), and isLoaded( ).

The load( ) method attempts to load the entire buffer into main memory:

public final MappedByteBuffer load( )

This may make access to the buffer faster, but then again it may not. If the mapped data is larger than the available physical memory, loading it is likely to cause page faults and disk thrashing. The isLoaded( ) method tells you whether a buffer is loaded:

public final boolean isLoaded( )
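
Here's a sketch of how you might use these two methods as a hint before a long run of random accesses (assuming a FileChannel named channel is already open for reading):

MappedByteBuffer buffer
 = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size( ));
buffer.load( ); // hint: try to bring the mapped pages into physical memory
if (buffer.isLoaded( )) {
  // pages are probably resident; random access should rarely touch the disk
}
else {
  // the OS declined, or pages were evicted; access still works, just more slowly
}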

Finally, if you've put data in a MappedByteBuffer, you should flush the buffer when you're done with it, just like an OutputStream. However, instead of a flush( ) method, you use the force( ) method:

public final MappedByteBuffer force( )

As with flushing, this may not always be necessary. Data will eventually be written out from the buffer into the underlying file if the program doesn't crash. However, the force( ) method enables you to control when this happens and to make sure it does, at least for local filesystems. Java can't always immediately force network-mounted disks.
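
For instance, here's a sketch of an in-place update that must reach the disk before the program continues. The four-byte version field at the start of the file is a made-up layout, and an open read/write FileChannel named channel is assumed:

MappedByteBuffer buffer
 = channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size( ));
buffer.putInt(0, 2); // e.g., bump a version number stored in the first four bytes
buffer.force( );     // don't go on until the change has reached the local disk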

As a final example, let's consider how one might securely overwrite a file. The U.S. Department of Defense National Industrial Security Program Operating Manual (DoD 5220.22-M, page 8-3-6) requires that the erasure of secret data be accomplished by overwriting each location with a 0 byte (0x00), its complement (0xFF), and then a random byte.

Top-secret data requires a more secure approach, with at least seven passes, including some overwriting with particular bit patterns. The truly paranoid use 35 passes in random order. However, this example suffices to demonstrate the points relevant to NIO.

Beyond performing multiple passes over the data, improved security also requires carefully erasing the file's name and other metadata, as well as any virtual memory or other locations where copies of the file's contents may reside.

Example 14-4 maps the entire file to be erased into memory. It then overwrites the file with zeros, then with ones (0xFF bytes), and finally with random data produced by a java.security.SecureRandom object. After each pass, the buffer is forced to make sure the data is actually written to the disk. Otherwise, only the final pass might ever be committed, and it is precisely the intermediate overwrites that are meant to scrub any residual magnetic traces an adversary could analyze.

Example 14-4. Erasing a file with a MappedByteBuffer

import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.security.SecureRandom;

public class SecureDelete {

  public static void main(String[] args) throws IOException {

    File file = new File(args[0]);
    if (file.exists( )) {
      SecureRandom random = new SecureRandom( );
      RandomAccessFile raf = new RandomAccessFile(file, "rw");
      FileChannel channel = raf.getChannel( );
      MappedByteBuffer buffer
       = channel.map(FileChannel.MapMode.READ_WRITE, 0, raf.length( ));

      // overwrite with zeros
      while (buffer.hasRemaining( )) {
        buffer.put((byte) 0);
      }
      buffer.force( );
      buffer.rewind( );

      // overwrite with ones
      while (buffer.hasRemaining( )) {
        buffer.put((byte) 0xFF);
      }
      buffer.force( );
      buffer.rewind( );

      // overwrite with random data; one byte at a time
      byte[] data = new byte[1];
      while (buffer.hasRemaining( )) {
        random.nextBytes(data);
        buffer.put(data[0]);
      }
      buffer.force( );

      // close the file before deleting it so the open handle doesn't block the delete
      raf.close( );
      file.delete( );
    }
  }
}

This program is not especially fast. On fairly impressive hardware, it could erase a little over 100K per second. You could speed it up by overwriting more than one byte at a time, but if you do, be careful that the final write doesn't run past the buffer's limit and cause a BufferOverflowException.
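
For instance, here's a sketch of a chunked version of the random pass, reusing the buffer and random variables from Example 14-4; sizing the last chunk to buffer.remaining( ) is what keeps it from overflowing:

// overwrite with random data, 8 KB at a time
byte[] chunk = new byte[8192];
while (buffer.hasRemaining( )) {
  random.nextBytes(chunk);
  int length = Math.min(chunk.length, buffer.remaining( ));
  buffer.put(chunk, 0, length);
}
buffer.force( );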
