Hardcoding a Unicode String

Problem

You have to hardcode a Unicode, i.e., wide-character, string in a source file.

Solution

Prefix the string literal with L, then either type the character directly into your source editor, as you would any other character, or use the \uXXXX escape with the hexadecimal number of the Unicode character you're after. Example 13-1 shows how to do it both ways.

Example 13-1. Hardcoding a Unicode string

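#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main( ) {
   // Create some strings with Unicode characters
   wstring ws1 = L"Infinity: \u221E";   // Universal character name
   wstring ws2 = L"Euro: €";            // Character typed directly

   wchar_t w[] = L"Infinity: \u221E";   // Works for wide arrays, too

   // The backslash in the path must be escaped
   wofstream out("tmp\\unicode.txt");
   out << ws2 << endl;

   wcout << ws2 << endl;
}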

Discussion

Hardcoding a Unicode string is mostly a matter of deciding how you want to enter the string in your source editor. C++ provides a wide-character type, wchar_t, which can store Unicode characters. The size and encoding of wchar_t are implementation defined: it is often 32 bits wide and holds UTF-32, but some platforms, notably Windows, use a 16-bit wchar_t that holds UTF-16. The class wstring, defined in <string>, is a sequence of wchar_ts, just like the string class is a sequence of chars. (Strictly speaking, of course, wstring is a typedef for basic_string<wchar_t>.)

The easiest way to enter Unicode characters is to use the L prefix to a string literal, as in Example 13-1:

wstring ws1 = L"Infinity: \u221E"; // Use the code itself
wstring ws2 = L"Euro: €";          // Or just type it in

Now, you can write these wide-character strings to a wide-character stream, like this:

wcout << ws1 << endl; // wcout is the wide char version of cout

This goes for files, too:

wofstream out("tmp\\unicode.txt");
out << ws2 << endl;

The trickiest part of dealing with different character encodings isn't embedding the right characters in your source files; it's knowing what kind of character data you are getting back from a database, HTTP request, user input, and so on, and this is beyond the realm of the C++ standard. The C++ standard does not require a particular encoding for source files. The encoding your operating system uses to store them can be anything, as long as it supports at least the 96 characters used by the C++ language. Characters outside this set, called the basic source character set, must be available through the \uXXXX or \UXXXXXXXX escape sequences, where each X is a hexadecimal digit.
