The OutputStreamWriter Class
java.io.Writer is an abstract class. Its most basic concrete subclass is OutputStreamWriter:
public class OutputStreamWriter extends Writer
Its constructors connect a character writer to an underlying output stream:
public OutputStreamWriter(OutputStream out)
public OutputStreamWriter(OutputStream out, String encoding)
  throws UnsupportedEncodingException
The first constructor configures the writer to encode text in the platform's default encoding. The second constructor specifies an encoding. For example, this code attaches an OutputStreamWriter to System.out with the default encoding:
OutputStreamWriter osw = new OutputStreamWriter(System.out);
On U.S. and Western European systems, the default encoding is usually Cp1252 on Windows, ISO 8859-1 (Latin-1) on Unix and Linux, and MacRoman on Macs. More recent Linuxes may use UTF-8 everywhere. Whatever the default is, you can read it from the system property file.encoding:
String defaultEncoding = System.getProperty("file.encoding");
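As a small sketch of the same lookup, the default can also be read as a Charset object through java.nio.charset.Charset.defaultCharset(), which reports the charset the virtual machine actually uses even if the system property is missing. The class and method names here (DefaultEncodingDemo, defaultCharsetName) are hypothetical and exist only for illustration.

```java
import java.nio.charset.Charset;

public class DefaultEncodingDemo {

    // Returns the canonical name of the JVM's default charset.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The property may be null on some runtimes; the Charset lookup is not.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

The two values usually agree, but they are printed separately because nothing in this sketch guarantees that they do on every platform.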
On the other hand, if you want to write a file encoded in ISO 8859-7 (ASCII plus Greek) you'd have to do this:
FileOutputStream fos = new FileOutputStream("greek.txt");
OutputStreamWriter greekWriter = new OutputStreamWriter(fos, "8859_7");
You should almost never use the default encoding. It's likely to cause problems as files are moved between platforms and countries, especially if the document format contains no means of indicating the encoding. If the file format does not specify a different encoding, choose UTF-8:
FileOutputStream fos = new FileOutputStream("data.txt");
OutputStreamWriter utfWriter = new OutputStreamWriter(fos, "UTF-8");
There are reasons to pick other encodings, especially when dealing with legacy software and formats that mandate something else. However, unless specified otherwise, you should choose UTF-8. It has the best mix of interoperability, robustness, compactness, and script support.
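One possible variation on the UTF-8 example, assuming Java 7 or later: passing a Charset constant instead of a name avoids the checked UnsupportedEncodingException, and try-with-resources closes the writer automatically. The class name Utf8WriterDemo and the helper encodeUtf8 are hypothetical, and a ByteArrayOutputStream stands in for the file so the result can be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8WriterDemo {

    // Encodes the text as UTF-8 through an OutputStreamWriter and
    // returns the bytes that reached the underlying stream.
    static byte[] encodeUtf8(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write(text);
        } // closing the writer flushes any buffered characters
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // ASCII letters take one byte each; the Greek alpha takes two.
        System.out.println(encodeUtf8("abc").length);
        System.out.println(encodeUtf8("\u03B1").length);
    }
}
```

The same pattern applies to a FileOutputStream; only the underlying stream changes.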
The write( ) methods convert characters to bytes according to the specified character encoding and write those bytes onto the underlying output stream:
public void write(int c) throws IOException
public void write(char[] text, int offset, int length) throws IOException
public void write(String s, int offset, int length) throws IOException
Once the Writer is constructed, writing the characters is easy:
String arete = "\u03B1\u03C1\u03B5\u03C4\u03B7";
greekWriter.write(arete, 0, arete.length( ));
The String variable arete contains the Unicode-escaped encoding of the Greek word αρετη.
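To make the three write( ) overloads concrete, here is a minimal sketch that sends the same five Greek letters through each of them and collects the encoded bytes in memory. The class name WriteOverloadsDemo and the helper encodeGreek are hypothetical, and the sketch assumes the runtime supports the ISO-8859-7 charset, which is not one of the charsets every Java platform is required to provide.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

public class WriteOverloadsDemo {

    // Writes the five letters through each write() overload and returns
    // the ISO 8859-7 bytes that reach the underlying stream.
    static byte[] encodeGreek() throws IOException {
        String arete = "\u03B1\u03C1\u03B5\u03C4\u03B7";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStreamWriter greekWriter = new OutputStreamWriter(bytes, "ISO-8859-7");
        try {
            greekWriter.write(arete.charAt(0));           // write(int c)
            char[] middle = arete.substring(1, 3).toCharArray();
            greekWriter.write(middle, 0, middle.length);  // write(char[], int, int)
            greekWriter.write(arete, 3, 2);               // write(String, int, int)
        } finally {
            greekWriter.close(); // flushes buffered characters to the stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Print each encoded byte in hexadecimal
        for (byte b : encodeGreek()) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Because ISO 8859-7 is a single-byte encoding, the five characters come out as exactly five bytes, whichever overload wrote them.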
import java.io.*;

public class UnicodeBMPTable {

  public static void main(String[] args) throws IOException {

    // Use platform default with a fallback to Latin-1 if necessary
    String encoding = System.getProperty("file.encoding", "ISO-8859-1");
    String lineSeparator = System.getProperty("line.separator", "\r\n");

    // Write to System.out unless a file name is given on the command line
    OutputStream target = System.out;
    if (args.length > 0) target = new FileOutputStream(args[0]);
    if (args.length > 1) encoding = args[1];

    OutputStreamWriter out = null;
    try {
      out = new OutputStreamWriter(target, encoding);
    }
    catch (UnsupportedEncodingException ex) {
      // use platform default encoding
      out = new OutputStreamWriter(target);
    }

    try {
      for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
        // Skip undefined code points; these are not characters
        if (!Character.isDefined(i)) continue;
        char c = (char) i;
        // Surrogates are not full characters so skip them;
        // this requires Java 5
        if (Character.isHighSurrogate(c) || Character.isLowSurrogate(c)) continue;
        out.write(i + ": " + c + lineSeparator);
      }
    }
    finally {
      out.close( );
    }
  }
}
Here's a sample of the file this program writes when the MacRoman encoding is specified:
213: Õ
214: Ö
215: ?
216: Ø
217: Ù
218: Ú
219: Û
220: Ü
221: ?
222: ?
223: ß
224: à
MacRoman is a one-byte encoding, so it can represent at most 256 different characters. Every character it cannot represent is replaced by the substitution character, a question mark. Unicode characters 215, 221, and 222 just don't exist in this character set.
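The substitution behavior can be checked directly. The following sketch uses US-ASCII rather than MacRoman, since US-ASCII is guaranteed to be available on every Java platform while MacRoman may not be. The class name SubstitutionDemo and its helpers are hypothetical. It also shows CharsetEncoder.canEncode( ), which lets a program test for an unmappable character before writing it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class SubstitutionDemo {

    // Encodes text in the named charset through an OutputStreamWriter.
    static byte[] encode(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStreamWriter out = new OutputStreamWriter(bytes, charsetName);
        try {
            out.write(text);
        } finally {
            out.close();
        }
        return bytes.toByteArray();
    }

    // Reports whether the named charset can represent the character.
    static boolean canEncode(char c, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) throws IOException {
        // 'A' is in US-ASCII; the multiplication sign (character 215) is not.
        byte[] result = encode("A\u00D7", "US-ASCII");
        for (byte b : result) {
            System.out.print((char) b);
        }
        System.out.println();
        System.out.println(canEncode('\u00D7', "US-ASCII"));
    }
}
```

Checking canEncode( ) first is one way to avoid silently losing data, at the cost of an extra test per character.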