
E.6 Character Ranges

The Unicode Standard assigns code values, which range from U+0000 (in the Basic Latin block) to U+E007F (in the Tags block), to the written characters of the world. Currently, there are code values for 94,140 characters. To simplify the search for a character and its associated code value, the Unicode Standard generally groups code values by script and function (e.g., Latin characters are grouped in one block, mathematical operators in another). As a rule, a script is a single writing system that is used for multiple languages (e.g., the Latin script is used for English, French and Spanish, among others). The Code Charts page on the Unicode Consortium Web site lists all the defined blocks and their respective code values. Figure E.4 lists some blocks (scripts) from the Web site and their ranges of code values.

Figure E.4. Some character ranges.

Script                                  Range of Code Values
Arabic                                  U+0600–U+06FF
Basic Latin                             U+0000–U+007F
Bengali (India)                         U+0980–U+09FF
Cherokee (Native America)               U+13A0–U+13FF
CJK Unified Ideographs (East Asia)      U+4E00–U+9FAF
Cyrillic (Russia and Eastern Europe)    U+0400–U+04FF
Ethiopic                                U+1200–U+137F
Greek                                   U+0370–U+03FF
Hangul Jamo (Korea)                     U+1100–U+11FF
Hebrew                                  U+0590–U+05FF
Hiragana (Japan)                        U+3040–U+309F
Khmer (Cambodia)                        U+1780–U+17FF
Lao (Laos)                              U+0E80–U+0EFF
Mongolian                               U+1800–U+18AF
Myanmar                                 U+1000–U+109F
Ogham (Ireland)                         U+1680–U+169F
Runic (Germany and Scandinavia)         U+16A0–U+16FF
Sinhala (Sri Lanka)                     U+0D80–U+0DFF
Telugu (India)                          U+0C00–U+0C7F
Thai                                    U+0E00–U+0E7F
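
Because each block covers a contiguous range of code values, finding the block for a character reduces to a range check. The following Python sketch illustrates the idea using a handful of the ranges from Figure E.4. The names BLOCKS, block_of and describe are hypothetical helpers introduced only for this illustration; they are not part of the Unicode Standard or of any library, and the table here is deliberately incomplete.

```python
# A few (first code value, last code value, block name) entries taken from
# Figure E.4. This is an illustrative subset, not the full list of blocks.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0E00, 0x0E7F, "Thai"),
    (0x3040, 0x309F, "Hiragana"),
    (0x4E00, 0x9FAF, "CJK Unified Ideographs"),
]


def block_of(char):
    """Return the name of the block containing char, or None if the
    character falls outside every range listed in BLOCKS."""
    code_value = ord(char)  # ord() gives the character's code value as an int
    for first, last, name in BLOCKS:
        if first <= code_value <= last:
            return name
    return None


def describe(char):
    """Format a character as 'U+XXXX block-name' (hypothetical helper)."""
    name = block_of(char)
    if name is None:
        name = "not in table"
    # At least four uppercase hexadecimal digits, as in the U+ notation.
    return "U+{:04X} {}".format(ord(char), name)


if __name__ == "__main__":
    # Escapes keep the source ASCII-only: Latin A, Cyrillic ya,
    # a CJK ideograph, and Thai ko kai.
    for ch in ["A", "\u044f", "\u65e5", "\u0e01"]:
        print(describe(ch))
```

A linear scan is adequate for a table this small. With the complete list of blocks, sorting the ranges by their first code value and using a binary search would be a reasonable alternative.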
