Digest Streams

The MessageDigest class isn't particularly hard to use, as I hope Example 12-1 and Example 12-2 demonstrated. It's flexible and can calculate a digest for anything that can be converted into a byte array, such as a string, an array of floating-point numbers, or the contents of a text area. Nonetheless, the input data almost always comes from streams. Therefore, the java.security package contains an input stream and an output stream class that use MessageDigest to calculate a digest for the stream as it is read or written. These are DigestInputStream and DigestOutputStream.

12.3.1. DigestInputStream

The DigestInputStream class is a subclass of FilterInputStream:

public class DigestInputStream extends FilterInputStream

DigestInputStream has all the usual methods of any input stream, like read( ), skip( ), and close( ). It overrides two read( ) methods to do its filtering. Clients use these methods exactly as they use the read( ) methods of other input streams.

DigestInputStream does not change the data it reads in any way. However, as each byte or group of bytes is read, it is fed as input to a MessageDigest object stored in the class as the protected digest field:

protected MessageDigest digest;

The digest field is normally set in the constructor:

public DigestInputStream(InputStream stream, MessageDigest digest)

For example:

URL u = new URL("http://java.sun.com");
DigestInputStream din = new DigestInputStream(u.openStream( ),
    MessageDigest.getInstance("SHA-256"));

The digest is not cloned inside the class. Only a reference to it is stored. Therefore, the message digest used inside the stream should only be used by the stream. Simultaneous or interleaved use by other objects will corrupt the digest.
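The effect of sharing the reference can be seen in a small experiment. The following is only a sketch: the class name SharedDigestDemo, the method digestThroughStream( ), and its interfere flag are made up for illustration, and an in-memory byte array stands in for a real stream. When the flag is set, a single stray update( ) from outside the stream is enough to change the final digest.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; not part of java.security.
public class SharedDigestDemo {

  // Reads all of data through a DigestInputStream. If interfere is true,
  // the same MessageDigest object is also updated from outside the stream.
  public static byte[] digestThroughStream(byte[] data, boolean interfere)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-256");
    DigestInputStream din
        = new DigestInputStream(new ByteArrayInputStream(data), sha);
    if (interfere) {
      sha.update((byte) 0); // outside use of the shared object
    }
    byte[] buffer = new byte[128];
    while (din.read(buffer) >= 0) {
      // Reading is enough; the stream feeds each chunk to the digest.
    }
    din.close();
    // The stream holds the very same object that was passed in.
    return din.getMessageDigest().digest();
  }

  public static void main(String[] args) throws Exception {
    byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
    byte[] clean = digestThroughStream(data, false);
    byte[] mixed = digestThroughStream(data, true);
    // Prints false: the extra update changed the result.
    System.out.println(MessageDigest.isEqual(clean, mixed));
  }
}
```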

The setMessageDigest( ) method changes the MessageDigest object used by the stream:

public void setMessageDigest(MessageDigest digest)

You can retrieve the message digest at any time by calling getMessageDigest( ):

public MessageDigest getMessageDigest( )

After you invoke getMessageDigest( ), the digest field of the stream has received all the data read by the stream up to that point. However, it has not been finished. It is still necessary to invoke digest( ) to complete the calculation. For example:

MessageDigest md = din.getMessageDigest( );
byte[] result = md.digest( );

On rare occasions, you may only want to digest part of a stream. You can turn digesting off at any point by passing false to the on( ) method:

public void on(boolean on)

You can turn digesting back on by passing true to on( ). When digest streams are created, they are on by default.
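As a rough illustration of partial digesting, the sketch below switches the digest off while a fixed-length header is read and back on for the rest of the data. The class PartialInputDigest, the method digestAfterHeader( ), and the idea of a fixed-length header are assumptions made for this example only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class; the fixed-length header is an assumption.
public class PartialInputDigest {

  // Returns the SHA-256 digest of everything after the first
  // headerLength bytes of data.
  public static byte[] digestAfterHeader(byte[] data, int headerLength)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-256");
    DigestInputStream din
        = new DigestInputStream(new ByteArrayInputStream(data), sha);

    din.on(false); // header bytes are read but not digested
    for (int i = 0; i < headerLength; i++) {
      if (din.read() < 0) break; // data was shorter than the header
    }

    din.on(true); // everything read from here on is digested
    byte[] buffer = new byte[128];
    while (din.read(buffer) >= 0) {
      // Nothing to do; reading drives the digest.
    }
    din.close();
    return din.getMessageDigest().digest();
  }
}
```

Calling digestAfterHeader( ) on the bytes of "HEADbody" with a header length of 4 should give the same result as digesting "body" directly.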

Finally, there's a toString( ) method, which is a little unusual in input streams. It simply returns "[Digest Input Stream]" plus the string representation of the digest.

public String toString( )

The body of Example 12-1 could be rewritten to make use of a DigestInputStream like this:

URL u = new URL(args[0]);
InputStream in = u.openStream( );
MessageDigest sha = MessageDigest.getInstance("SHA");
DigestInputStream din = new DigestInputStream(in, sha);
byte[] data = new byte[128];
while (true) {
  int bytesRead = din.read(data);
  if (bytesRead < 0) break;
}
MessageDigest md = din.getMessageDigest( );
byte[] result = md.digest( );
for (int i = 0; i < result.length; i++) {
  System.out.println(result[i]);
}

The main purpose of DigestInputStream is to be one of a chain of filters. Otherwise, it doesn't really make your work any easier. You still need to construct the MessageDigest object by invoking getInstance( ), pass it to the DigestInputStream( ) constructor, retrieve the MessageDigest object from the input stream, invoke its digest( ) method, and retrieve the digest data from that object. I would prefer the DigestInputStream to completely hide the MessageDigest object. You could pass the name of the digest algorithm to the constructor as a string rather than as an actual MessageDigest object. The digest would be made available only after the stream was closed, and then only through its data, not through the actual object.
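To make that preference concrete, here is one way such a class might look. This is purely a sketch of the alternative design described above, not anything in java.security: the class name HiddenDigestInputStream and its getDigest( ) method are invented for the example. The constructor takes an algorithm name, the MessageDigest object never leaves the class, and the digest bytes become available only after close( ).

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class sketching the design described in the text.
public class HiddenDigestInputStream extends FilterInputStream {

  private final MessageDigest md; // never exposed to clients
  private byte[] result;          // null until the stream is closed

  public HiddenDigestInputStream(InputStream in, String algorithm)
      throws NoSuchAlgorithmException {
    super(in);
    this.md = MessageDigest.getInstance(algorithm);
  }

  @Override
  public int read() throws IOException {
    int b = in.read();
    if (b >= 0) md.update((byte) b);
    return b;
  }

  // FilterInputStream.read(byte[]) delegates here, so it needs no override.
  @Override
  public int read(byte[] data, int offset, int length) throws IOException {
    int n = in.read(data, offset, length);
    if (n > 0) md.update(data, offset, n);
    return n;
  }

  // Rereading marked data would digest it twice, so marking is disabled.
  @Override
  public boolean markSupported() {
    return false;
  }

  @Override
  public void close() throws IOException {
    try {
      super.close();
    } finally {
      if (result == null) result = md.digest(); // finish exactly once
    }
  }

  // Returns a copy of the finished digest; fails if the stream is still open.
  public byte[] getDigest() {
    if (result == null) {
      throw new IllegalStateException("Stream has not been closed");
    }
    return result.clone();
  }
}
```

A client would read the stream to the end, close it, and then call getDigest( ). The trade-off is flexibility: with no access to the MessageDigest, there is no equivalent of on( ) or setMessageDigest( ).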

12.3.2. DigestOutputStream

The DigestOutputStream class is a subclass of FilterOutputStream that maintains a digest of all the bytes it has written:

public class DigestOutputStream extends FilterOutputStream

DigestOutputStream has all the usual methods of any output stream, like write( ), flush( ), and close( ). It overrides two write( ) methods to do its filtering, but they are used as they would be for any other output stream. DigestOutputStream does not change the data it writes in any way. However, as each byte or group of bytes is written, it is fed as input to a MessageDigest object stored in the class as the protected digest field:

protected MessageDigest digest;

This field is normally set in the constructor:

public DigestOutputStream(OutputStream out, MessageDigest digest)

For example:

FileOutputStream fout = new FileOutputStream("data.txt");
DigestOutputStream dout = new DigestOutputStream(fout,
    MessageDigest.getInstance("SHA"));

The constructor does not copy the MessageDigest object; it just stores a reference to it. Therefore, the message digest stored inside the stream should only be used by the stream. Interleaved use by other objects or simultaneous use by other threads will corrupt the digest. You can change the MessageDigest object used by the stream with the setMessageDigest( ) method:

public void setMessageDigest(MessageDigest digest)

You can retrieve the message digest at any time by calling getMessageDigest( ):

public MessageDigest getMessageDigest( )

After you invoke getMessageDigest( ), the digest field contains the digest of all the data written by the stream up to that point. However, it has not been finished. It is still necessary to invoke digest( ) to complete the calculation. For example:

MessageDigest md = dout.getMessageDigest( );
md.digest( );

On rare occasions, you may want to digest only part of a stream. For instance, you might want to calculate the digest of the body of an email message while ignoring the headers. You can turn digesting off at any point by passing false to the on( ) method:

public void on(boolean on)

You can turn digesting back on by passing true to on( ). When digest output streams are created, they are on by default.
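The email case might be sketched as follows. The class BodyDigest and the method writeMessage( ) are hypothetical, and the message is deliberately simplified to ASCII header text, a blank line, and a body. All of it is written to the underlying stream, but only the body is digested.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class; the message layout is a simplification.
public class BodyDigest {

  // Writes headers, a blank line, and body to out, and returns the
  // SHA-256 digest of the body bytes alone.
  public static byte[] writeMessage(OutputStream out, String headers,
      String body) throws IOException, NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-256");
    DigestOutputStream dout = new DigestOutputStream(out, sha);

    dout.on(false); // headers are written but not digested
    dout.write(headers.getBytes(StandardCharsets.US_ASCII));
    dout.write("\r\n".getBytes(StandardCharsets.US_ASCII)); // blank line

    dout.on(true); // the body is both written and digested
    dout.write(body.getBytes(StandardCharsets.US_ASCII));
    dout.flush();

    return dout.getMessageDigest().digest();
  }
}
```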

Finally, there's a toString( ) method, which is a little unusual in output streams. It simply returns "[Digest Output Stream]" plus the string representation of the digest.

public String toString( )

Example 12-3 is a FileDigest class that reads data from a specified URL and copies it into a file on the local system. As the file is written, its SHA-512 digest is calculated. Once all the data has been written and flushed, the digest is printed.

Example 12-3. FileDigest

import java.net.*;
import java.io.*;
import java.security.*;

public class FileDigest {

  public static void main(String[] args)
      throws IOException, NoSuchAlgorithmException {
    if (args.length != 2) {
      System.err.println("Usage: java FileDigest url filename");
      return;
    }
    URL u = new URL(args[0]);
    FileOutputStream out = new FileOutputStream(args[1]);
    copyFileWithDigest(u.openStream( ), out);
    out.close( );
  }

  public static void copyFileWithDigest(InputStream in, OutputStream out)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-512");
    DigestOutputStream dout = new DigestOutputStream(out, sha);
    byte[] data = new byte[128];
    while (true) {
      int bytesRead = in.read(data);
      if (bytesRead < 0) break;
      dout.write(data, 0, bytesRead);
    }
    dout.flush( );
    byte[] result = dout.getMessageDigest().digest( );
    for (int i = 0; i < result.length; i++) {
      System.out.print(result[i] + " ");
    }
    System.out.println( );
  }
}

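A sample run looks like this. (The 20 values shown are the length of a 160-bit SHA-1 digest; with SHA-512, as in the listing, the program prints 64 values.)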

% java FileDigest http://www.oreilly.com/ oreilly.html
10 -10 103 -27 -110 3 -2 -115 8 -112 13 19 25 76 -120 31 51 116 -94 -58
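Signed decimal bytes are awkward to compare by eye, and digests are more commonly shown in hexadecimal. The helper below is a small sketch of that conversion; the class name HexDigest and the method toHex( ) are invented for this example and are not part of Example 12-3.

```java
// Hypothetical helper for displaying a digest; not part of Example 12-3.
public class HexDigest {

  // Converts each byte to two lowercase hexadecimal digits.
  public static String toHex(byte[] digest) {
    StringBuilder sb = new StringBuilder(digest.length * 2);
    for (byte b : digest) {
      // Mask with 0xff so negative bytes map to 128 through 255.
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // The first three bytes of the sample run; prints 0af667
    System.out.println(toHex(new byte[] {10, -10, 103}));
  }
}
```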

DigestOutputStream is useful when you need a digest in the middle of a chain of filter streams. For instance, you could write data onto a data output stream chained to a gzip output stream chained to a file output stream. When you had finished writing the data onto the data output stream, you could calculate the digest and write that directly onto the file output stream. When the data was read back in, you could use a digest input stream chained to a data input stream to check that the file had not been corrupted in the meantime. If the digest calculated by the digest input stream matched the digest stored in the file, you'd know the data was OK.
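A stripped-down version of that round trip might look like the sketch below. Several things here are assumptions made to keep it short: the class DigestRoundTrip and its two methods are hypothetical, a byte array stands in for the file, the gzip layer is left out, and the record consists of just one int and one string. The digest is written straight to the underlying stream, bypassing the digest stream, and is read back the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.DigestInputStream;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example; a byte array stands in for the file and
// the gzip layer mentioned in the text is omitted.
public class DigestRoundTrip {

  // Writes count and label through a digest stream, then appends
  // the digest of those bytes directly to the underlying stream.
  public static byte[] write(int count, String label)
      throws IOException, NoSuchAlgorithmException {
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    MessageDigest sha = MessageDigest.getInstance("SHA-256");
    DataOutputStream dataOut
        = new DataOutputStream(new DigestOutputStream(file, sha));
    dataOut.writeInt(count);
    dataOut.writeUTF(label);
    dataOut.flush();
    file.write(sha.digest()); // not digested: bypasses the filter chain
    return file.toByteArray();
  }

  // Rereads the record through a digest stream and compares the
  // computed digest with the one stored after the data.
  public static boolean verify(byte[] stored)
      throws IOException, NoSuchAlgorithmException {
    ByteArrayInputStream file = new ByteArrayInputStream(stored);
    MessageDigest sha = MessageDigest.getInstance("SHA-256");
    DataInputStream dataIn
        = new DataInputStream(new DigestInputStream(file, sha));
    dataIn.readInt();
    dataIn.readUTF();
    byte[] computed = sha.digest();

    // Neither filter buffers, so the stored digest is next in the file.
    byte[] expected = new byte[computed.length];
    new DataInputStream(file).readFully(expected);
    return MessageDigest.isEqual(computed, expected);
  }
}
```

This works only because neither DataInputStream nor DigestInputStream reads ahead. With a buffering filter in the chain, the trailing digest bytes could be consumed before they were wanted.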
