The Assembly Programming Master Book

Operating systems of the Windows NT family, starting from Windows 2000, have fully migrated to Unicode. However, most programmers haven't even noticed this event, because all operations with Unicode are internal to Windows. As for input parameters, such as the strings passed to the MessageBox function, Windows continues to accept them in ANSI encoding. When such a function is called, the operating system converts the input ANSI string into a string of two-byte characters and then works with the result. If the function must return a string value, that string is converted from Unicode back to ANSI. Additionally, every function that accepts or returns strings has a "twin" with the same name complemented by a trailing W, such as MessageBoxW or CharToOemW. These functions operate on Unicode strings directly. The resources, which will be covered later, are also stored in the Unicode format. Consequently, all functions that store and retrieve resource information also convert it before accomplishing their tasks.

One of the most interesting functions is IsTextUnicode, which checks the contents of a buffer and determines whether they are in the Unicode format. The function is statistical, which means that it determines whether the given text is in the Unicode format only with a certain level of probability. Consider this function in more detail:

BOOL IsTextUnicode ( CONST VOID* pBuffer, int cb, LPINT lpi )

The function returns a nonzero value if it recognizes the Unicode format; otherwise, it returns zero.

Other values of the constants can be found in the WINDOWS.INC file supplied with the MASM32 product. The most important point here is that the constants occupy bits that don't overlap, which means that the memory area to which the third argument of this function points can contain any combination of these constants. An example illustrating the use of this function will be provided later.
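Until then, a call might be sketched as follows. This is only an illustration under stated assumptions: the two constant values are taken from the Windows SDK headers, while the labels TEXTW, LENW, and TESTS and the c:\masm32\lib paths are hypothetical names chosen for this sketch.

```asm
.586P
.MODEL FLAT, stdcall
; Test flags (values as defined in the Windows SDK headers)
IS_TEXT_UNICODE_ASCII16    equ 1
IS_TEXT_UNICODE_STATISTICS equ 2
EXTERN IsTextUnicode@12:NEAR
EXTERN ExitProcess@4:NEAR
; Library paths are an assumption about the MASM32 installation
includelib c:\masm32\lib\advapi32.lib
includelib c:\masm32\lib\kernel32.lib
_DATA SEGMENT
; The buffer to examine: eight two-byte characters
TEXTW DW 'M','A','S','M',' ','7','.','0'
LENW  EQU $ - TEXTW            ; Buffer length in bytes
; On input: the tests to apply. On output: the tests that passed.
TESTS DD IS_TEXT_UNICODE_ASCII16 OR IS_TEXT_UNICODE_STATISTICS
_DATA ENDS
_TEXT SEGMENT
START:
    PUSH OFFSET TESTS          ; lpi - address of the test flags
    PUSH LENW                  ; cb - buffer size in bytes
    PUSH OFFSET TEXTW          ; pBuffer - buffer address
    CALL IsTextUnicode@12
; EAX is nonzero if the buffer passed the requested tests.
; TESTS now contains a bit for each requested test that passed.
    PUSH EAX                   ; Use the result as the exit code
    CALL ExitProcess@4
_TEXT ENDS
END START
```

Because the flag bits don't overlap, the program can examine TESTS afterward with the TEST instruction to find out which individual checks succeeded.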

Now, consider how conversions from ANSI to Unicode and from Unicode to ANSI are carried out. Two functions are intended for this purpose: MultiByteToWideChar and WideCharToMultiByte. Consider these functions in more detail.

The MultiByteToWideChar function is used for converting ANSI strings to Unicode strings:

int MultiByteToWideChar( UINT CodePage, DWORD dwFlags, LPCSTR lpMultiByteStr, int cbMultiByte, LPWSTR lpWideCharStr, int cchWideChar )

If the function completes successfully, it returns the number of two-byte characters written to the output buffer; if it fails, it returns 0.

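If the sixth parameter is set to 0, the function won't carry out the conversion. Instead, it will return the buffer size, in two-byte characters, required to store the converted string.

The WideCharToMultiByte function carries out the reverse conversion, from Unicode strings to ANSI strings: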

int WideCharToMultiByte( UINT CodePage, DWORD dwFlags, LPCWSTR lpWideCharStr, int cchWideChar, LPSTR lpMultiByteStr, int cbMultiByte, LPCSTR lpDefaultChar, LPBOOL lpUsedDefaultChar )
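For illustration, a call to WideCharToMultiByte might look like the following sketch. The labels STRW and BUFA, the 100-byte buffer size, and the c:\masm32\lib path are assumptions made for this example; the last two parameters are passed as 0, so the system default replacement character is used and its use is not reported.

```asm
.586P
.MODEL FLAT, stdcall
CP_ACP equ 0
EXTERN WideCharToMultiByte@32:NEAR
EXTERN ExitProcess@4:NEAR
; Library path is an assumption about the MASM32 installation
includelib c:\masm32\lib\kernel32.lib
_DATA SEGMENT
; The Unicode string to be converted, terminated by a two-byte zero
STRW DW 'M','A','S','M',' ','7','.','0', 0
; The buffer for the resulting ANSI string
BUFA DB 100 DUP(0)
_DATA ENDS
_TEXT SEGMENT
START:
    PUSH 0                     ; lpUsedDefaultChar - not needed
    PUSH 0                     ; lpDefaultChar - use the system default
    PUSH 100                   ; cbMultiByte - buffer size in bytes
    PUSH OFFSET BUFA           ; lpMultiByteStr - output buffer
    PUSH -1                    ; cchWideChar - string is zero-terminated
    PUSH OFFSET STRW           ; lpWideCharStr - source string
    PUSH 0                     ; dwFlags
    PUSH CP_ACP                ; CodePage - ANSI encoding
    CALL WideCharToMultiByte@32
; EAX contains the number of bytes written to BUFA, or 0 on failure.
    PUSH 0
    CALL ExitProcess@4
_TEXT ENDS
END START
```

Note that for this function the output buffer size is measured in bytes, whereas the source string length is measured in two-byte characters.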

Listing 6.1 is the code fragment that converts the source string from ANSI to Unicode.

Listing 6.1: The fragment that carries out ANSI to UNICODE conversion

    .
    ; The string to be converted
    STR1 DB "Console application", 0
    ; The buffer for the converted string (200 two-byte characters)
    BUF DW 200 DUP(0)
    .
    .
    .
    PUSH 200 ; Maximum buffer length, in characters
    PUSH OFFSET BUF ; Buffer address
    PUSH -1 ; Determine the string length automatically
    PUSH OFFSET STR1 ; String address
    PUSH 0 ; Flags
    PUSH 0 ; CP_ACP - ANSI encoding
    CALL MultiByteToWideChar@24
    ; Now it is possible to work with the Unicode string BUF.
    ...

 

Regarding the provided fragment, it should be pointed out that it would be better to proceed as follows:

  1. Call the MultiByteToWideChar function with the sixth parameter set to 0.

    The function will return the size of the buffer required for the converted string. This size is expressed in two-byte characters, so it must be doubled to obtain the size in bytes.

  2. Allocate memory for that buffer.

  3. Convert the string by calling the MultiByteToWideChar function and specifying the sixth parameter.

  4. Work with the string.

  5. Release the memory block allocated to the buffer.
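The five steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes GlobalAlloc with the GMEM_FIXED flag as the allocator (any other allocation function would do), and the labels CCH and PBUF and the c:\masm32\lib path are hypothetical names introduced for this example.

```asm
.586P
.MODEL FLAT, stdcall
CP_ACP     equ 0
GMEM_FIXED equ 0
EXTERN MultiByteToWideChar@24:NEAR
EXTERN GlobalAlloc@8:NEAR
EXTERN GlobalFree@4:NEAR
EXTERN ExitProcess@4:NEAR
; Library path is an assumption about the MASM32 installation
includelib c:\masm32\lib\kernel32.lib
_DATA SEGMENT
STR1 DB "Console application", 0
CCH  DD 0                      ; Required size, in two-byte characters
PBUF DD 0                      ; Address of the allocated buffer
_DATA ENDS
_TEXT SEGMENT
START:
; Step 1: request the required buffer size
    PUSH 0                     ; cchWideChar = 0 - only return the size
    PUSH 0                     ; No output buffer yet
    PUSH -1                    ; Determine the string length automatically
    PUSH OFFSET STR1
    PUSH 0                     ; Flags
    PUSH CP_ACP
    CALL MultiByteToWideChar@24
    TEST EAX, EAX
    JZ   EXIT                  ; 0 means that the call failed
    MOV  CCH, EAX
; Step 2: allocate the buffer (two bytes per character)
    SHL  EAX, 1
    PUSH EAX                   ; Size in bytes
    PUSH GMEM_FIXED            ; The return value is the block address
    CALL GlobalAlloc@8
    TEST EAX, EAX
    JZ   EXIT
    MOV  PBUF, EAX
; Step 3: carry out the conversion
    PUSH CCH                   ; Buffer size in characters
    PUSH PBUF                  ; Buffer address
    PUSH -1
    PUSH OFFSET STR1
    PUSH 0
    PUSH CP_ACP
    CALL MultiByteToWideChar@24
; Step 4: work with the Unicode string at the address in PBUF
; Step 5: release the memory block
    PUSH PBUF
    CALL GlobalFree@4
EXIT:
    PUSH 0
    CALL ExitProcess@4
_TEXT ENDS
END START
```

Because the source length is passed as -1, the size returned in step 1 already includes the terminating zero character.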

Chapter 12 contains an interesting macro that simplifies the conversion of the string from ASCII to Unicode.

Finally, I'd like to point out that there is a convenient technique for specifying a string that will be interpreted as Unicode directly in the program. Instead of specifying the string in the traditional manner, as follows:

STR1 DB "MASM 7.0", 0

you can write it as follows:

STR1 DW 'M','A','S','M',' ','7','.','0', 0

After that, you can comfortably use the MessageBoxW function instead of MessageBoxA.
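A complete program built on this technique might look like the sketch below. The label names TEXTW and CAPW, the caption text, and the c:\masm32\lib paths are assumptions made for this illustration.

```asm
.586P
.MODEL FLAT, stdcall
MB_OK equ 0
EXTERN MessageBoxW@16:NEAR
EXTERN ExitProcess@4:NEAR
; Library paths are an assumption about the MASM32 installation
includelib c:\masm32\lib\user32.lib
includelib c:\masm32\lib\kernel32.lib
_DATA SEGMENT
; Each DW element stores one two-byte character
TEXTW DW 'M','A','S','M',' ','7','.','0', 0
CAPW  DW 'D','e','m','o', 0
_DATA ENDS
_TEXT SEGMENT
START:
    PUSH MB_OK                 ; uType - a single OK button
    PUSH OFFSET CAPW           ; lpCaption - Unicode window title
    PUSH OFFSET TEXTW          ; lpText - Unicode message text
    PUSH 0                     ; hWnd - no owner window
    CALL MessageBoxW@16
    PUSH 0
    CALL ExitProcess@4
_TEXT ENDS
END START
```

No conversion call is needed here, because the strings are already stored in the form that MessageBoxW expects.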
