Filtered Readers and Writers

The java.io.FilterReader and java.io.FilterWriter classes are abstract classes that read or write characters and filter them in some way before passing the text along. You can imagine a FilterReader that converts all characters to uppercase.

public abstract class FilterReader extends Reader
public abstract class FilterWriter extends Writer

Although FilterReader and FilterWriter are modeled after java.io.FilterInputStream and java.io.FilterOutputStream, they are much less commonly used than those classes. There are no concrete subclasses of FilterWriter in the java packages and only one concrete subclass of FilterReader (PushbackReader). These classes exist so you can write your own filters.

20.12.1. The FilterReader Class

FilterReader has a single constructor, which is protected:

protected FilterReader(Reader in)

The in argument is the Reader to which this filter is chained. This reference is stored in a protected field called in, from which text for this filter is read.

protected Reader in

Since FilterReader is an abstract class, only subclasses can be instantiated. Therefore, it doesn't matter that the constructor is protected since it may only be invoked from subclass constructors.

FilterReader provides the usual collection of read( ), skip( ), ready( ), markSupported( ), mark( ), reset( ), and close( ) methods. These all simply invoke the equivalent method in the in field with the same arguments. For example, the skip( ) method works like this:

public long skip(long n) throws IOException {
  return in.skip(n);
}
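Because FilterReader declares no abstract methods of its own, a subclass that overrides nothing is a pure pass-through. The sketch below (class and method names are hypothetical) uses an anonymous subclass, which is allowed to call the protected constructor, around an assumed StringReader source, and shows skip( ) and read( ) being forwarded to it:

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class PassThroughDemo {

    // Reads everything from r into a String, one character at a time.
    static String drain(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Wraps the text in a FilterReader that overrides nothing, so skip()
    // and read() are simply forwarded to the underlying StringReader.
    static String skipThenDrain(String text, long toSkip) throws IOException {
        Reader filter = new FilterReader(new StringReader(text)) { };
        long skipped = filter.skip(toSkip); // forwarded to StringReader.skip()
        String rest = drain(filter);        // forwarded to StringReader.read()
        filter.close();                     // forwarded to StringReader.close()
        return skipped + ":" + rest;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(skipThenDrain("Hello, world", 7)); // prints 7:world
    }
}
```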

Each subclass usually overrides at least these two read( ) methods to perform the filtering:

public int read( ) throws IOException
public int read(char[] text, int offset, int length) throws IOException

In FilterReader, neither method invokes the other. You must override each of them, even if it's only to call the other one.
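The uppercase filter imagined at the start of this section could be sketched as follows; UpperCaseReader is a hypothetical name, not a class in java.io. Because FilterReader's bulk read( ) forwards straight to in, overriding only the single-character read( ) would let the bulk path return unfiltered text:

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical example filter: converts every character to upper case.
class UpperCaseReader extends FilterReader {

    UpperCaseReader(Reader in) {
        super(in);
    }

    // Single-character path: filter the one char returned by the source.
    @Override
    public int read() throws IOException {
        int c = in.read();
        return c == -1 ? -1 : Character.toUpperCase((char) c);
    }

    // Bulk path: let the source fill the array, then filter it in place.
    // If the source is at end of stream, n is -1 and the loop does nothing.
    @Override
    public int read(char[] text, int offset, int length) throws IOException {
        int n = in.read(text, offset, length);
        for (int i = offset; i < offset + n; i++) {
            text[i] = Character.toUpperCase(text[i]);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Reader r = new UpperCaseReader(new StringReader("Hello, world"));
        char[] buf = new char[5];
        int n = r.read(buf, 0, buf.length);
        System.out.println(new String(buf, 0, n)); // prints HELLO
        System.out.println((char) r.read());       // prints ,
    }
}
```

Converting one char at a time is a simplification: it cannot handle case mappings that change length, such as the German ß becoming SS.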

Java source code can include Unicode escapes for characters not available in the current character set. An escape sequence is a \u followed by the four-hexadecimal-digit equivalent of the Unicode character. As an example, I'll write a FilterReader subclass that reads a \u-escaped file and converts it to pure Unicode. This is a much trickier problem than it first appears. First, there's not a fixed ratio between the number of characters read and the number of characters returned. Most of the time one input char is one output char, but some of the time six input chars (the backslash, the u, and four hex digits) are one output char. The second difficulty is ensuring that \u09EF is recognized as a Unicode escape while \\u09EF is not. In other words, only a u preceded by an odd number of backslashes is a valid Unicode escape. A u preceded by an even number of backslashes should be passed along unchanged. Example 20-7 shows a solution.
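The parity rule can be checked in isolation. This hypothetical helper, startsEscape( ), is not part of Example 20-7; it simply counts the contiguous backslashes in front of a u. Note that the Java string literals in it must themselves double each backslash:

```java
// Hypothetical helper illustrating the parity rule: the 'u' at index i begins
// a Unicode escape only if it is immediately preceded by an odd number of
// contiguous backslashes.
class EscapeRule {

    static boolean startsEscape(CharSequence text, int i) {
        if (text.charAt(i) != 'u') return false;
        int slashes = 0;
        for (int j = i - 1; j >= 0 && text.charAt(j) == '\\'; j--) {
            slashes++;
        }
        return slashes % 2 == 1;
    }

    public static void main(String[] args) {
        String one = "\\u09EF";   // one backslash, then u and four hex digits
        String two = "\\\\u09EF"; // two backslashes, then u and four hex digits
        System.out.println(startsEscape(one, 1)); // true
        System.out.println(startsEscape(two, 2)); // false
    }
}
```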

Example 20-7. SourceReader

package com.elharo.io;

import java.io.*;

public class SourceReader extends FilterReader {

  // A character read ahead but not yet returned; -1 if there is none
  private int buffer = -1;

  public SourceReader(Reader in) {
    super(in);
  }

  public int read() throws IOException {

    // A buffered character is returned as is. If it is a backslash, it was
    // the second of a pair, so it cannot begin an escape.
    if (this.buffer != -1) {
      int c = this.buffer;
      this.buffer = -1;
      return c;
    }

    int c = in.read();
    if (c != '\\') return c;

    int next = in.read();
    if (next != 'u') {
      // This is not a Unicode escape; save next for the following call
      this.buffer = next;
      return c;
    }

    // Read the next four hex digits. If they aren't valid hex digits,
    // this is not a valid .java file.
    char[] digits = new char[4];
    for (int i = 0; i < digits.length; i++) {
      int h = in.read();
      if (h == -1) {
        throw new EOFException("Unexpected end of stream in Unicode escape");
      }
      digits[i] = (char) h;
    }
    String hex = new String(digits);
    if (!hex.matches("[0-9A-Fa-f]{4}")) {
      throw new IOException("Bad Unicode escape: \\u" + hex);
    }
    return Integer.parseInt(hex, 16);
  }

  public int read(char[] text, int offset, int length) throws IOException {
    if (length == 0) return 0;
    int numRead = 0;
    for (int i = offset; i < offset + length; i++) {
      int temp = this.read();
      if (temp == -1) break;
      text[i] = (char) temp;
      numRead++;
    }
    // Report end of stream only when no characters at all were read
    return numRead == 0 ? -1 : numRead;
  }

  public long skip(long n) throws IOException {
    if (n < 0) throw new IllegalArgumentException("skip value is negative");
    // Count decoded characters, not raw characters from the source
    long numSkipped = 0;
    while (numSkipped < n && this.read() != -1) numSkipped++;
    return numSkipped;
  }

  // The look-ahead buffer is invisible to the underlying reader's
  // mark and reset, so don't advertise support for them.
  public boolean markSupported() {
    return false;
  }
}

20.12.2. The FilterWriter Class

The FilterWriter class has a single constructor and no other unique methods:

protected FilterWriter(Writer out)

The out argument is the writer to which this filter is chained. This reference is stored in a protected field called out to which text sent through this filter is written.

protected Writer out

Since FilterWriter is an abstract class, only subclasses may be instantiated. Therefore, it doesn't matter that the constructor is protected since it may only be invoked from subclass constructors anyway. FilterWriter provides the usual collection of write( ), close( ), and flush( ) methods. These all simply invoke the equivalent method in the out field with the same arguments. For example, the close( ) method works like this:

public void close( ) throws IOException {
  out.close( );
}

Each subclass has to override at least these three write( ) methods to perform the filtering:

public void write(int c) throws IOException
public void write(char[] text, int offset, int length) throws IOException
public void write(String s, int offset, int length) throws IOException

In FilterWriter, these methods do not invoke each other. You must override each of them, even if it's only to call one of the other two.
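To see why every method must be overridden, consider a hypothetical CountingWriter that tallies the characters passing through it. Unlike a filter that funnels everything through write(int), this sketch forwards each bulk write to out in a single call, which avoids per-character overhead:

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical filter that counts the characters passing through it.
// Each of the three write() methods is overridden, because FilterWriter's
// versions forward straight to out without calling one another.
class CountingWriter extends FilterWriter {

    private long count = 0;

    CountingWriter(Writer out) {
        super(out);
    }

    @Override
    public void write(int c) throws IOException {
        out.write(c);
        count++;
    }

    @Override
    public void write(char[] text, int offset, int length) throws IOException {
        out.write(text, offset, length); // forward the whole run in one call
        count += length;
    }

    @Override
    public void write(String s, int offset, int length) throws IOException {
        out.write(s, offset, length);    // forward the whole run in one call
        count += length;
    }

    long getCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        CountingWriter cw = new CountingWriter(sw);
        cw.write("Hello");                // Writer.write(String) calls write(String, int, int)
        cw.write(',');
        cw.write(" world".toCharArray()); // Writer.write(char[]) calls write(char[], int, int)
        cw.flush();
        System.out.println(sw.toString() + " (" + cw.getCount() + " chars)"); // prints Hello, world (12 chars)
    }
}
```

Writer's inherited one-argument write(String) and write(char[]) convenience methods call the three-argument versions, so they are counted as well.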

There are no subclasses of FilterWriter in the core API. Example 20-8, SourceWriter, is an example of a FilterWriter that converts Unicode text to \u-escaped ASCII. The big question is what to do if the input text contains an unescaped backslash. The simplest and most robust solution is to replace it with \u005C, the Unicode escape for the backslash itself.

Example 20-8. SourceWriter

package com.elharo.io;

import java.io.*;

public class SourceWriter extends FilterWriter {

  public SourceWriter(Writer out) {
    super(out);
  }

  public void write(char[] text, int offset, int length) throws IOException {
    for (int i = offset; i < offset + length; i++) {
      this.write(text[i]);
    }
  }

  public void write(String s, int offset, int length) throws IOException {
    for (int i = offset; i < offset + length; i++) {
      this.write(s.charAt(i));
    }
  }

  public void write(int c) throws IOException {
    // Only the low 16 bits of c are significant, as in Writer.write(int)
    c &= 0xFFFF;
    // The backslashes in the literals below must themselves be escaped,
    // or the compiler would treat them as the start of an escape.
    if (c == '\\') out.write("\\u005C");
    else if (c < 128) out.write(c);
    else {
      String s = Integer.toHexString(c);
      // Pad with leading zeroes if necessary.
      if (c < 256) s = "00" + s;
      else if (c < 4096) s = "0" + s;
      out.write("\\u");
      out.write(s);
    }
  }
}
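The zero padding in write(int) can also be expressed with a format string. The following hypothetical helper, escape( ), isolates just the per-character rule so it can be tried on its own; it is intended to produce the same text as SourceWriter's write(int) for every char value:

```java
// Hypothetical stand-alone version of SourceWriter's per-character rule,
// using String.format for the zero padding instead of concatenation.
class EscapeDemo {

    static String escape(int c) {
        c &= 0xFFFF;                                // keep only the low 16 bits
        if (c == '\\') return "\\u005C";            // escape the backslash itself
        if (c < 128) return String.valueOf((char) c); // ASCII passes through
        return String.format("\\u%04x", c);         // four lowercase hex digits
    }

    public static void main(String[] args) {
        // "na", i with diaeresis, "ve ", a backslash, " 20", the euro sign
        String text = "na" + (char) 0xEF + "ve \\ 20" + (char) 0x20AC;
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) {
            sb.append(escape(ch));
        }
        System.out.println(sb);
    }
}
```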

20.12.3. PushbackReader

The PushbackReader class is a filter that provides a pushback buffer around a given reader. This allows a program to "unread" the last character it read. It's similar to PushbackInputStream discussed in Chapter 6, but instead of pushing back bytes, it pushes back chars. Both PushbackReader and BufferedReader use buffers, but only PushbackReader allows unreading and only BufferedReader allows marking and resetting. The first difference is that pushing back characters allows you to unread characters after the fact. Marking and resetting requires you to mark in advance the location you want to reset to. The second difference is that you can push back a character that was never on the stream in the first place. Marking and resetting only allows you to reread the same characters, not add new characters to the stream.
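The tokenizer below is a hypothetical sketch of the classic use for unreading: a number ends only when a non-digit has already been read, so that character is pushed back for the next caller. It also pushes back a character that never appeared in the stream, something marking and resetting cannot do:

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

class PushbackDemo {

    // Reads a run of ASCII decimal digits. The terminating character has
    // already been consumed when the loop ends, so it is unread and left
    // for the next caller. At end of stream there is nothing to push back.
    static int readNumber(PushbackReader in) throws IOException {
        int value = 0;
        int c;
        while ((c = in.read()) >= '0' && c <= '9') {
            value = value * 10 + (c - '0');
        }
        if (c != -1) in.unread(c);
        return value;
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("42+7"));
        int a = readNumber(in);     // 42; the '+' is pushed back
        char op = (char) in.read(); // '+', read again from the pushback buffer
        int b = readNumber(in);     // 7
        System.out.println(a + " " + op + " " + b); // prints 42 + 7

        in.unread('9');             // a character that was never in the stream
        System.out.println(readNumber(in));         // prints 9
    }
}
```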

PushbackReader has two constructors, both of which take an underlying reader as an argument. The first uses a one-character pushback buffer; the second sets the pushback buffer to a specified size:

public PushbackReader(Reader in)
public PushbackReader(Reader in, int size)

The PushbackReader class has the usual collection of read( ) methods. These methods first try to read the requested characters from the pushback buffer and only read from the underlying reader if the pushback buffer is empty or has too few characters.

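PushbackReader also has ready( ), markSupported( ), and close( ) methods. The ready( ) method returns true if the pushback buffer holds any characters or if the underlying reader is ready. The close( ) method closes the underlying reader and discards the pushback buffer. The markSupported( ) method returns false; pushback readers do not support marking and resetting, and their mark( ) and reset( ) methods throw an IOException.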

Three unread( ) methods push back specific characters. The first pushes back the character c, the second pushes back the text array, and the third pushes back the subarray of text beginning at offset and continuing for length chars.

public void unread(int c) throws IOException
public void unread(char[] text) throws IOException
public void unread(char[] text, int offset, int length) throws IOException
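A small sketch (UnreadOrderDemo is a hypothetical name) showing that each unread( ) call places its characters in front of anything already in the pushback buffer, and that an array is pushed back so that its first character is read first. The 16-character buffer size is an arbitrary choice large enough for the example:

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

class UnreadOrderDemo {

    // Reads everything from r into a String, one character at a time.
    static String drain(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c = r.read(); c != -1; c = r.read()) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    static String build() throws IOException {
        // The underlying stream holds only "!"; everything else is unread.
        PushbackReader in = new PushbackReader(new StringReader("!"), 16);
        in.unread("world".toCharArray());      // whole array
        in.unread("[, ]".toCharArray(), 1, 2); // subarray ", " goes in front
        in.unread('o');                        // single character in front of that
        in.unread("Hell".toCharArray());       // and "Hell" in front of everything
        return drain(in);                      // pushback buffer first, then "!"
    }

    public static void main(String[] args) throws IOException {
        System.out.println(build()); // prints Hello, world!
    }
}
```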

The unread characters aren't necessarily the same as the characters that were read. The client programmer can insert text as the stream is read. The number of characters you can push back onto the stream is limited by the size of the buffer set in the constructor. Attempts to unread more characters than can fit in the buffer throw an IOException. An IOException is also thrown if you try to unread a closed reader; once a PushbackReader has been closed, it can be neither read nor unread.
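The two failure cases can be checked directly. In this hypothetical sketch, overflows( ) reports whether pushing back n characters into a buffer of the given size throws an IOException, and unreadAfterCloseFails( ) tries to unread after close( ):

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

class PushbackLimitsDemo {

    // True if unreading n characters into an empty pushback buffer of the
    // given size throws an IOException.
    static boolean overflows(int size, int n) {
        try (PushbackReader in = new PushbackReader(new StringReader("abc"), size)) {
            in.unread(new char[n]);
            return false;
        } catch (IOException ex) {
            return true; // more characters than the buffer has room for
        }
    }

    // True if unread() on an already-closed reader throws an IOException.
    static boolean unreadAfterCloseFails() {
        try {
            PushbackReader in = new PushbackReader(new StringReader("abc"));
            in.close();
            in.unread('x');
            return false;
        } catch (IOException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows(2, 2));          // false
        System.out.println(overflows(2, 3));          // true
        System.out.println(unreadAfterCloseFails());  // true
    }
}
```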
