The InputStreamReader Class
The most important concrete subclass of Reader is InputStreamReader:
public class InputStreamReader extends Reader
The constructors connect a character reader to an underlying input stream:
public InputStreamReader(InputStream in)
public InputStreamReader(InputStream in, String encoding)
 throws UnsupportedEncodingException
The first constructor uses the platform's default encoding, as given by the system property file.encoding. The second one uses the specified encoding. For example, to attach an InputStreamReader to System.in with the default encoding:
InputStreamReader isr = new InputStreamReader(System.in);
If you want to read a file encoded in Latin-5 (ASCII plus Turkish, as specified by ISO 8859-9), you might do this:
FileInputStream fin = new FileInputStream("turkish.txt");
InputStreamReader isr = new InputStreamReader(fin, "8859_9");
In Java 1.4 and later, you can specify the encoding as a Charset or CharsetDecoder object instead:
public InputStreamReader(InputStream in, Charset encoding) // Java 1.4
public InputStreamReader(InputStream in, CharsetDecoder decoder) // Java 1.4
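As a rough sketch of the Charset-based constructor, the following class decodes a few Latin-5 bytes held in memory. The class name CharsetReaderDemo and the method readLatin5 are made up for this illustration, and the code assumes the running JVM supports the ISO-8859-9 charset (Charset.forName( ) throws UnsupportedCharsetException if it does not). It also uses try-with-resources, which requires Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical demo class; not part of the chapter's examples.
class CharsetReaderDemo {

    // Decodes the given bytes as ISO-8859-9 (Latin-5) text,
    // one character at a time.
    static String readLatin5(byte[] bytes) {
        // Assumption: this JVM ships the ISO-8859-9 charset.
        Charset latin5 = Charset.forName("ISO-8859-9");
        StringBuilder sb = new StringBuilder();
        try (InputStreamReader isr =
                new InputStreamReader(new ByteArrayInputStream(bytes), latin5)) {
            int c;
            while ((c = isr.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException ex) {
            // Not expected for an in-memory stream
            throw new UncheckedIOException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In Latin-5, byte 0xFD is the dotless i (U+0131)
        byte[] data = { 'k', (byte) 0xFD, 'z' };
        String text = readLatin5(data);
        System.out.println((int) text.charAt(1)); // code point of the decoded byte
    }
}
```

Passing a Charset object rather than a name means the constructor itself cannot throw UnsupportedEncodingException; an unknown name fails earlier, in Charset.forName( ).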
The read( ) methods read bytes from an underlying input stream and convert those bytes to characters according to the specified encoding:
public int read( ) throws IOException
public int read(char[] text, int offset, int length) throws IOException
public int read(CharBuffer target) // Java 5
 throws IOException, NullPointerException, ReadOnlyBufferException
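The array-based read( ) method is usually used in a loop, since a single call may return fewer characters than requested. Here is a minimal sketch of that loop; the class BulkReadDemo, the method decode, and the 1,024-character buffer size are all arbitrary choices for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the chapter's examples.
class BulkReadDemo {

    // Decodes bytes in the named encoding using the bulk read method.
    static String decode(byte[] bytes, String encoding) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024]; // arbitrary buffer size
        try (InputStreamReader isr =
                new InputStreamReader(new ByteArrayInputStream(bytes), encoding)) {
            int n;
            // read returns the number of chars stored, or -1 at end of stream
            while ((n = isr.read(buffer, 0, buffer.length)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException ex) {
            // Also covers UnsupportedEncodingException for an unknown name
            throw new UncheckedIOException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the two-byte UTF-8 form of U+00E9 (e with acute accent)
        byte[] data = { 'h', (byte) 0xC3, (byte) 0xA9 };
        System.out.println(decode(data, "UTF-8").length()); // three bytes, two chars
    }
}
```

Note that the count returned by read( ) is in characters, not bytes, so it generally does not match the number of bytes consumed from the underlying stream.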
The getEncoding( ) method returns a string containing the name of the encoding used by this reader:
public String getEncoding( )
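One detail worth knowing is that the returned name is not necessarily the name you passed to the constructor. The API documentation says the method returns the encoding's historical name where one exists, and null once the reader has been closed. The sketch below probes both cases; EncodingNameDemo and namesBeforeAndAfterClose are names invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the chapter's examples.
class EncodingNameDemo {

    // Returns { name while open, name after close } for the given encoding.
    static String[] namesBeforeAndAfterClose(String encoding) {
        try {
            InputStreamReader isr = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), encoding);
            String whileOpen = isr.getEncoding();
            isr.close();
            String afterClose = isr.getEncoding(); // documented to be null
            return new String[] { whileOpen, afterClose };
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        String[] names = namesBeforeAndAfterClose("UTF-8");
        // May print a historical name such as UTF8 rather than UTF-8
        System.out.println(names[0]);
        System.out.println(names[1]);
    }
}
```

If you need the canonical charset name for comparison or display, it is probably safer to keep your own Charset object than to parse the string getEncoding( ) returns.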
The remaining two methods just override methods from java.io.Reader but behave identically from the perspective of the programmer:
public boolean ready( ) throws IOException
public void close( ) throws IOException
The close( ) method does close the underlying input stream.
InputStreamReader does not itself support marking and resetting, though it can be chained to a reader that does.
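For instance, wrapping the InputStreamReader in a BufferedReader adds mark( ) and reset( ). The following is a small sketch of that chaining; the class MarkResetDemo, its method names, and the read-ahead limit of 16 characters are assumptions made for this illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the chapter's examples.
class MarkResetDemo {

    // A bare InputStreamReader inherits Reader's markSupported(), which is false.
    static boolean plainReaderSupportsMark() {
        return new InputStreamReader(
            new ByteArrayInputStream(new byte[0]), StandardCharsets.US_ASCII)
            .markSupported();
    }

    // Peeks at up to three characters, rewinds, then reads the whole first line.
    static String peekThenRead(byte[] bytes) {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.US_ASCII))) {
            br.mark(16); // arbitrary read-ahead limit, in characters
            char[] peek = new char[3];
            int n = br.read(peek, 0, peek.length);
            br.reset(); // back to the marked position
            String head = (n > 0) ? new String(peek, 0, n) : "";
            return head + "|" + br.readLine();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(plainReaderSupportsMark());
        System.out.println(peekThenRead(new byte[] { 'h', 'e', 'l', 'l', 'o' }));
    }
}
```

The mark lives in the BufferedReader's character buffer, so resetting never has to rewind the byte stream or re-run the decoder.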
Example 20-2 uses an InputStreamReader to read a file in a user-specified encoding. StreamRecoder takes four command-line arguments, in this order: the input encoding, the output encoding, the name of the input file, and the name of the output file. Characters that are not available in the output character set are replaced by the substitution character.
Example 20-2. StreamRecoder
import java.io.*;

public class StreamRecoder {

  public static void main(String[] args) {

    // All four arguments are required
    if (args.length < 4) {
      System.err.println(
       "Usage: java StreamRecoder "
       + "infile_encoding outfile_encoding infile outfile");
      return;
    }

    InputStreamReader isr = null;
    OutputStreamWriter osw = null;
    try {
      File infile = new File(args[2]);
      File outfile = new File(args[3]);

      // Opening the output file truncates it, so refuse to
      // convert a file onto itself
      if (outfile.exists( )
       && infile.getCanonicalPath( ).equals(outfile.getCanonicalPath( ))) {
        System.err.println("Can't convert file in place");
        return;
      }

      FileInputStream fin = new FileInputStream(infile);
      FileOutputStream fout = new FileOutputStream(outfile);
      isr = new InputStreamReader(fin, args[0]);
      osw = new OutputStreamWriter(fout, args[1]);

      // Copy one character at a time; the reader decodes
      // and the writer encodes
      while (true) {
        int c = isr.read( );
        if (c == -1) break; // end of stream
        osw.write(c);
      }
      osw.close( );
      isr.close( );
    }
    catch (IOException ex) {
      System.err.println(ex);
      ex.printStackTrace( );
    }
    finally {
      // Closing an already closed stream has no effect, so these
      // calls are safe even after the normal-path closes above
      if (isr != null) {
        try {
          isr.close( );
        }
        catch (IOException ex) {
          ex.printStackTrace( );
        }
      }
      if (osw != null) {
        try {
          osw.close( );
        }
        catch (IOException ex) {
          ex.printStackTrace( );
        }
      }
    }
  }
}
Since this is just a simple example, I haven't put a lot of effort into the user interface. A more realistic command-line interface would provide a set of flags and sensible defaults. Even better would be a graphical user interface. I'll demonstrate that at the end of the chapter, when we return to the file viewer program.
Example 20-2 is very similar to the Recoder class in Example 19-3 in the previous chapter. However, that class accessed the CharsetEncoder and CharsetDecoder more directly. This is a higher-level approach that hides a lot of the implementation detail, which makes it much simpler and easier to understand. Most of the time in streaming situations, it's going to be a lot easier to use InputStreamReader and/or OutputStreamWriter than Charset or CharsetEncoder/CharsetDecoder. Charset, CharsetEncoder, and CharsetDecoder fit better when you have one large block of text or bytes to encode or decode rather than an ongoing stream. They also offer a few more configuration options, especially for handling encoding errors in the input data. However, the way InputStreamReader and OutputStreamWriter handle this (replacing each malformed byte with the default substitution character) is usually fine.
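When silent replacement is not fine, one option is to hand InputStreamReader a CharsetDecoder configured to report errors instead. The sketch below contrasts the two behaviors on a UTF-8 stream containing a stray 0xFF byte. The class MalformedInputDemo and its method names are invented for this illustration, and StandardCharsets requires Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the chapter's examples.
class MalformedInputDemo {

    // Default behavior: malformed input becomes the replacement character.
    static String decodeLeniently(byte[] bytes) {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
            return sb.toString();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Strict behavior: the decoder reports errors, so read() throws.
    static boolean isWellFormedUtf8(byte[] bytes) {
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), strict)) {
            while (r.read() != -1) { /* discard decoded characters */ }
            return true;
        } catch (CharacterCodingException ex) {
            return false; // malformed or unmappable input was found
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] bad = { 'A', (byte) 0xFF, 'B' }; // 0xFF never occurs in UTF-8
        System.out.println(decodeLeniently(bad).length());
        System.out.println(isWellFormedUtf8(bad));
    }
}
```

This keeps the streaming convenience of InputStreamReader while borrowing just the error-handling configuration from CharsetDecoder.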