E.6 Character Ranges
The Unicode Standard assigns code values to the written characters of the world. These values range from U+0000 (in the Basic Latin block) to U+E007F (in the Tags block). Currently, there are code values for 94,140 characters. To simplify the search for a character and its associated code value, the Unicode Standard generally groups code values by script and function (e.g., Latin characters are grouped in one block, mathematical operators in another). As a rule, a script is a single writing system that is used for multiple languages (e.g., the Latin script is used for English, French and Spanish). The Code Charts page on the Unicode Consortium Web site lists all the defined blocks and their respective code values. Figure E.4 lists some blocks (scripts) from the Web site and their ranges of code values.
| Script | Range of Code Values |
|---|---|
| Arabic | U+0600–U+06FF |
| Basic Latin | U+0000–U+007F |
| Bengali (India) | U+0980–U+09FF |
| Cherokee (Native America) | U+13A0–U+13FF |
| CJK Unified Ideographs (East Asia) | U+4E00–U+9FAF |
| Cyrillic (Russia and Eastern Europe) | U+0400–U+04FF |
| Ethiopic | U+1200–U+137F |
| Greek | U+0370–U+03FF |
| Hangul Jamo (Korea) | U+1100–U+11FF |
| Hebrew | U+0590–U+05FF |
| Hiragana (Japan) | U+3040–U+309F |
| Khmer (Cambodia) | U+1780–U+17FF |
| Lao (Laos) | U+0E80–U+0EFF |
| Mongolian | U+1800–U+18AF |
| Myanmar | U+1000–U+109F |
| Ogham (Ireland) | U+1680–U+169F |
| Runic (Germany and Scandinavia) | U+16A0–U+16FF |
| Sinhala (Sri Lanka) | U+0D80–U+0DFF |
| Telugu (India) | U+0C00–U+0C7F |
| Thai | U+0E00–U+0E7F |
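
Because each block in Figure E.4 is a contiguous range of code values, finding the block for a character reduces to a range check. The following Python sketch is only an illustration: the `BLOCKS` list and the helper names `code_value` and `block_of` are hypothetical, and the list holds just a few rows copied from Figure E.4 rather than the full set of blocks defined by the standard.

```python
# Hypothetical lookup table: a few (name, first, last) rows from Figure E.4.
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Greek", 0x0370, 0x03FF),
    ("Cyrillic", 0x0400, 0x04FF),
    ("Hebrew", 0x0590, 0x05FF),
    ("Arabic", 0x0600, 0x06FF),
    ("Thai", 0x0E00, 0x0E7F),
    ("Hiragana", 0x3040, 0x309F),
]


def code_value(ch):
    """Format a character's code value in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def block_of(ch):
    """Return the name of the listed block containing ch, or None."""
    code = ord(ch)
    for name, first, last in BLOCKS:
        if first <= code <= last:
            return name
    return None  # not in this partial table


if __name__ == "__main__":
    for ch in ("A", "\u03a9", "\u3042"):
        print(code_value(ch), block_of(ch))
```

For example, `block_of("A")` returns `"Basic Latin"` because U+0041 falls between U+0000 and U+007F. A character outside the listed rows yields `None`, which here means only that the partial table does not cover it.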