Other Encodings

2017-11-03 09:05:01

Unicode support is growing, but there will doubtless be legacy data in other encodings that must be read for centuries to come. Such encodings include ASCII and Latin-1, as well as less common encoding schemes such as EBCDIC and MacRoman. There are multiple encodings in use for Arabic, Turkish, Hebrew, Greek, Cyrillic, Chinese, Japanese, Korean, and many other languages and scripts. The Reader and Writer classes allow you to read and write data in these different character sets. The String class also has a number of methods that convert between different encodings (though a String object itself is always represented in Unicode).
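As a rough illustration of the two routes just mentioned, the sketch below decodes a few Latin-1 bytes through a Reader and then re-encodes the same data as UTF-8 using String methods. The class name EncodingDemo and its helper methods are made up for this example, and it assumes Java 8 or later (for StandardCharsets, try-with-resources, and UncheckedIOException); it is not the only way to do this.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative only.
final class EncodingDemo {

    // Decodes bytes through a Reader, as you would for a file or socket stream.
    static String readAll(byte[] data, Charset charset) {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[256];
            int n;
            while ((n = in.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    // Re-encodes data from one encoding to another by way of a Unicode String.
    static byte[] transcode(byte[] data, Charset from, Charset to) {
        String text = new String(data, from); // bytes -> Unicode String
        return text.getBytes(to);             // Unicode String -> bytes
    }

    public static void main(String[] args) {
        // The word "cafe" with an acute accent on the e, encoded in ISO-8859-1.
        byte[] latin1 = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9};
        String text = readAll(latin1, StandardCharsets.ISO_8859_1);
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(text.length() + " chars, " + latin1.length
                + " bytes in Latin-1, " + utf8.length + " bytes in UTF-8");
    }
}
```

In both helpers the String in the middle is plain Unicode; the encoding only matters at the point where bytes are turned into characters or back again.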

Modern desktop and server Java environments are pretty well guaranteed to have these six character sets available:

US-ASCII

ISO-8859-1

UTF-8

UTF-16BE

UTF-16LE

UTF-16

All other encodings are optional and may not be supported in any given VM. Most VMs will have many more encodings as well, but only these six are almost certain to be present. These six are also likely to be the most interoperable, not just with Java but with programs written in other languages. Some VMs, especially on Windows, omit some of the more obscure or larger encodings to save space. J2ME VMs will likely include far fewer, and they don't have the java.nio.charset package at all.
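Because optional encodings may be missing, it can help to ask the VM what it supports before relying on one. The following is a minimal sketch using the Charset class from java.nio.charset (so it applies to desktop and server VMs, not J2ME). The class name CharsetCheck, the helper charsetOrDefault, and the sample charset names are assumptions chosen for illustration; whether names such as x-MacRoman are present depends on the VM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative only.
final class CharsetCheck {

    // Returns the named charset if this VM supports it, otherwise the fallback.
    static Charset charsetOrDefault(String name, Charset fallback) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : fallback;
        } catch (IllegalArgumentException e) {
            // Thrown (as IllegalCharsetNameException) when the name is not even legal.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Two of the six standard names plus two optional ones that may be absent.
        String[] names = {"US-ASCII", "UTF-8", "x-MacRoman", "Shift_JIS"};
        for (String name : names) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
        System.out.println(Charset.availableCharsets().size() + " charsets available");

        // Fall back to UTF-8 when an optional encoding is missing.
        Charset chosen = charsetOrDefault("x-MacRoman", StandardCharsets.UTF_8);
        System.out.println("Using " + chosen.name());
    }
}
```

Falling back silently is only one possible policy; for legacy data whose encoding is known, failing loudly when the charset is unavailable may be the safer choice, since decoding with the wrong encoding corrupts the text.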

If you've installed Sun's JRE/JDK, a basic set of encodings is included in the standard rt.jar file along with all the other classes from the Java class library. There may also be a charsets.jar file that includes several dozen additional encodings, such as MacRoman and SJIS.
