The Art of Assembly Language

2.10 Sign Extension, Zero Extension, Contraction, and Saturation

Because two's complement format integers have a fixed length, a small problem develops. What happens if you need to convert an 8-bit two's complement value to 16 bits? This problem, and its converse (converting a 16-bit value to 8 bits), can be solved via sign extension and contraction operations.

Consider the value -64. The 8-bit two's complement value for this number is $C0. The 16-bit equivalent of this number is $FFC0. Now consider the value +64. The 8- and 16-bit versions of this value are $40 and $0040, respectively. The difference between the 8- and 16-bit numbers can be described by the rule: "If the number is negative, the H.O. byte of the 16-bit number contains $FF; if the number is positive, the H.O. byte of the 16-bit quantity is zero."

Extending a signed value from some number of bits to a greater number of bits is easy: just copy the sign bit into all the additional bits in the new format. For example, to sign extend an 8-bit number to a 16-bit number, simply copy bit 7 of the 8-bit number into bits 8..15 of the 16-bit number. To sign extend a 16-bit number to a double word, simply copy bit 15 into bits 16..31 of the double word.
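The bit-copying rule above can be sketched in C (used here instead of HLA purely to illustrate the bit manipulation). The function names are hypothetical, and the values are treated as raw bit patterns held in unsigned types.

```c
#include <stdint.h>

/* Sign extend an 8-bit two's complement pattern to 16 bits by copying
   bit 7 into bits 8..15. */
uint16_t sign_extend_8_to_16(uint8_t value)
{
    uint16_t result = value;      /* bits 0..7 copied, bits 8..15 are zero */
    if (value & 0x80u) {          /* bit 7 set: the value is negative */
        result |= 0xFF00u;        /* fill bits 8..15 with ones */
    }
    return result;
}

/* Same idea for 16 to 32 bits: copy bit 15 into bits 16..31. */
uint32_t sign_extend_16_to_32(uint16_t value)
{
    uint32_t result = value;      /* bits 0..15 copied, bits 16..31 are zero */
    if (value & 0x8000u) {        /* bit 15 set: the value is negative */
        result |= 0xFFFF0000u;    /* fill bits 16..31 with ones */
    }
    return result;
}
```

With these definitions, passing $C0 (-64) to sign_extend_8_to_16 produces $FFC0, and passing $40 (+64) produces $0040.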

You must use sign extension when manipulating signed values of varying lengths. Often you'll need to add a byte quantity to a word quantity. You must sign extend the byte quantity to a word before the operation takes place. Other operations (multiplication and division, in particular) may require a sign extension to 32 bits:

Sign Extension:

8 Bits    16 Bits    32 Bits
$80       $FF80      $FFFF_FF80
$28       $0028      $0000_0028
$9A       $FF9A      $FFFF_FF9A
$7F       $007F      $0000_007F
---       $1020      $0000_1020
---       $8086      $FFFF_8086
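To see why the sign extension must happen before a mixed-size addition, consider the following C sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the byte and word are passed as raw bit patterns.

```c
#include <stdint.h>

/* Interpret an 8-bit pattern as signed and widen it to 16 bits.
   (b ^ 0x80) - 0x80 maps $00..$7F to 0..127 and $80..$FF to -128..-1. */
static uint16_t sign_extend_byte(uint8_t b)
{
    int value = (b ^ 0x80) - 0x80;
    return (uint16_t)value;       /* conversion to unsigned wraps modulo 2^16 */
}

/* Correct: sign extend the byte, then add it to the word. */
uint16_t add_signed_byte(uint16_t word, uint8_t byte)
{
    return (uint16_t)(word + sign_extend_byte(byte));
}

/* Incorrect for negative bytes: the byte is effectively zero extended. */
uint16_t add_byte_without_extension(uint16_t word, uint8_t byte)
{
    return (uint16_t)(word + byte);
}
```

Adding the byte $FF (-1) to the word $0100 yields $00FF when the byte is sign extended first, but $01FF when it is not.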

To extend an unsigned value to a larger one you must zero extend the value. Zero extension is very easy: just store a zero into the H.O. byte(s) of the larger operand. For example, to zero extend the 8-bit value $82 to 16 bits, you simply set the H.O. byte to zero, yielding $0082.

Zero Extension:

8 Bits    16 Bits    32 Bits
$80       $0080      $0000_0080
$28       $0028      $0000_0028
$9A       $009A      $0000_009A
$7F       $007F      $0000_007F
---       $1020      $0000_1020
---       $8086      $0000_8086
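For comparison, here is a short C sketch of zero extension (the function names are hypothetical). In C, converting an unsigned value to a wider unsigned type leaves the H.O. bits zero, which is exactly the operation described above.

```c
#include <stdint.h>

/* Zero extend 8 bits to 16 bits: the H.O. byte of the result is zero. */
uint16_t zero_extend_8_to_16(uint8_t value)
{
    return (uint16_t)value;
}

/* Zero extend 16 bits to 32 bits: the H.O. word of the result is zero. */
uint32_t zero_extend_16_to_32(uint16_t value)
{
    return (uint32_t)value;
}
```

Note that the same bit pattern gives different results under the two schemes: $80 zero extends to $0080 but sign extends to $FF80.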

The 80x86 provides several instructions that will let you sign or zero extend a smaller number to a larger number. Table 2-6 lists a group of instructions that will sign extend the AL, AX, or EAX register.

Table 2-6: Instructions for Extending AL, AX, and EAX

Instruction    Explanation

cbw();         // Converts the byte in AL to a word in AX via sign extension.
cwd();         // Converts the word in AX to a double word in DX:AX.
cdq();         // Converts the double word in EAX to the quad word in EDX:EAX.
cwde();        // Converts the word in AX to a double word in EAX.

Note that the cwd (convert word to double word) instruction does not sign extend the word in AX to the double word in EAX. Instead, it stores the H.O. word of the sign extension into the DX register (the notation "DX:AX" tells you that you have a double word value with DX containing the upper 16 bits and AX containing the lower 16 bits of the value). If you want the sign extension of AX to go into EAX, you should use the cwde (convert word to double word, extended) instruction.
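The difference between cwd and cwde can be modeled with a small C sketch. This is not how you would write it in HLA; the dx_ax_pair structure and the function names are hypothetical stand-ins for the registers involved.

```c
#include <stdint.h>

/* Hypothetical model of the DX:AX register pair. */
typedef struct {
    uint16_t dx;   /* H.O. word of the double word */
    uint16_t ax;   /* L.O. word of the double word */
} dx_ax_pair;

/* Model of cwd: AX is unchanged; DX receives 16 copies of AX's sign bit. */
dx_ax_pair model_cwd(uint16_t ax)
{
    dx_ax_pair r;
    r.ax = ax;
    r.dx = (ax & 0x8000u) ? 0xFFFFu : 0x0000u;
    return r;
}

/* Model of cwde: the sign extension lands in the upper half of EAX. */
uint32_t model_cwde(uint16_t ax)
{
    uint32_t eax = ax;
    if (ax & 0x8000u) {
        eax |= 0xFFFF0000u;
    }
    return eax;
}

/* Reassemble the 32-bit value that DX:AX represents. */
uint32_t combine_dx_ax(dx_ax_pair p)
{
    return ((uint32_t)p.dx << 16) | p.ax;
}
```

Both models describe the same 32-bit value; they differ only in where the upper 16 bits end up (DX for cwd, the H.O. word of EAX for cwde).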

The four instructions above are unusual in the sense that these are the first instructions you've seen that do not have any operands. These instructions' operands are implied by the instructions themselves.

Within a few chapters you will discover just how important these instructions are, and why the cwd and cdq instructions involve the DX and EDX registers. However, for simple sign extension operations, these instructions have two major drawbacks: you do not get to specify the source and destination operands, and the operands must be registers.

For general sign extension operations, the 80x86 provides an extension of the mov instruction, movsx (move with sign extension), that copies data and sign extends the data while copying it. The movsx instruction's syntax is very similar to the mov instruction:

movsx( source, dest );

The big difference in syntax between this instruction and the mov instruction is the fact that the destination operand must be larger than the source operand. That is, if the source operand is a byte, the destination operand must be a word or a double word. Likewise, if the source operand is a word, the destination operand must be a double word. Another difference is that the destination operand has to be a register; the source operand, however, can be a memory location.[4]

To zero extend a value, you can use the movzx instruction. It has the same syntax and restrictions as the movsx instruction. Zero extending certain 8-bit registers (AL, BL, CL, and DL) into their corresponding 16-bit registers is easily accomplished without using movzx by loading the complementary H.O. register (AH, BH, CH, or DH) with zero. Obviously, to zero extend AX into DX:AX or EAX into EDX:EAX, all you need to do is load DX or EDX with zero.[5]

The sample program in Listing 2-8 demonstrates the use of the sign extension instructions.

Listing 2-8: Sign Extension Instructions.

program signExtension;
#include( "stdlib.hhf" )

static
    i8:     int8;
    i16:    int16;
    i32:    int32;

begin signExtension;

    stdout.put( "Enter a small negative number: " );
    stdin.get( i8 );

    stdout.put( nl, "Sign extension using CBW and CWDE:", nl, nl );

    mov( i8, al );
    stdout.put( "You entered ", i8, " ($", al, ")", nl );

    cbw();
    mov( ax, i16 );
    stdout.put( "16-bit sign extension: ", i16, " ($", ax, ")", nl );

    cwde();
    mov( eax, i32 );
    stdout.put( "32-bit sign extension: ", i32, " ($", eax, ")", nl );

    stdout.put( nl, "Sign extension using MOVSX:", nl, nl );

    movsx( i8, ax );
    mov( ax, i16 );
    stdout.put( "16-bit sign extension: ", i16, " ($", ax, ")", nl );

    movsx( i8, eax );
    mov( eax, i32 );
    stdout.put( "32-bit sign extension: ", i32, " ($", eax, ")", nl );

end signExtension;

Sign contraction, converting a value with some number of bits to the identical value with fewer bits, is a little more troublesome. Sign extension never fails. Given an m-bit signed value, you can always convert it to an n-bit number (where n > m) using sign extension. Unfortunately, given an n-bit number, you cannot always convert it to an m-bit number if m < n. For example, consider the value -448. As a 16-bit signed number, its hexadecimal representation is $FE40. Unfortunately, the magnitude of this number is too large for an 8-bit value, so you cannot sign contract it to 8 bits. This is an example of an overflow condition that occurs upon conversion.

To properly sign contract one value to another, you must look at the H.O. byte(s) that you want to discard. The H.O. bytes you wish to remove must all contain either zero or $FF. If you encounter any other values, you cannot contract the value without overflow. Finally, the H.O. bit of your resulting value must match every bit you've removed from the number. Examples (16 bits to 8 bits):

$FF80 can be sign contracted to $80.
$0040 can be sign contracted to $40.
$FE40 cannot be sign contracted to 8 bits.
$0100 cannot be sign contracted to 8 bits.
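The two checks described above (the discarded H.O. byte is $00 or $FF, and it agrees with the sign bit of the byte you keep) can be sketched in C as follows. The function name and the use of an output parameter are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Try to sign contract a 16-bit pattern to 8 bits.
   Returns true and stores the L.O. byte in *out when the contraction is
   possible; returns false (leaving *out untouched) on overflow. */
bool sign_contract_16_to_8(uint16_t value, uint8_t *out)
{
    uint8_t ho = (uint8_t)(value >> 8);      /* byte to be discarded */
    uint8_t lo = (uint8_t)(value & 0xFFu);   /* byte to be kept */
    bool lo_negative = (lo & 0x80u) != 0;    /* sign bit of the result */

    /* The discarded byte must be all zeros or all ones, and every one of
       its bits must match the sign bit of the resulting 8-bit value. */
    if ((ho == 0x00u && !lo_negative) || (ho == 0xFFu && lo_negative)) {
        *out = lo;
        return true;
    }
    return false;
}
```

The second condition matters: $0080 (+128) has a zero H.O. byte, yet it cannot be contracted because the L.O. byte $80 would read as -128.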

Another way to reduce the size of an integer is by saturation. Saturation is useful in situations where you must convert a larger object to a smaller object, and you're willing to live with possible loss of precision. To convert a value via saturation you simply copy the larger value to the smaller value if it is not outside the range of the smaller object. If the larger value is outside the range of the smaller value, then you clip the value by setting it to the largest (or smallest) value within the range of the smaller object.

For example, when converting a 16-bit signed integer to an 8-bit signed integer, if the 16-bit value is in the range −128..+127 you simply copy the L.O. byte of the 16-bit object to the 8-bit object. If the 16-bit signed value is greater than +127, then you clip the value to +127 and store +127 into the 8-bit object. Likewise, if the value is less than −128, you clip the final 8-bit object to −128. Saturation works the same way when clipping 32-bit values to smaller values. If the larger value is outside the range of the smaller value, then you simply set the smaller value to the value closest to the out-of-range value that you can represent with the smaller value.
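A minimal C sketch of this clipping follows. The function names are hypothetical; the limits come from the standard INT8_MIN, INT8_MAX, INT16_MIN, and INT16_MAX constants in stdint.h.

```c
#include <stdint.h>

/* Saturate a 16-bit signed value to the 8-bit signed range -128..+127. */
int8_t saturate_16_to_8(int16_t value)
{
    if (value > INT8_MAX) {
        return INT8_MAX;          /* clip to +127 */
    }
    if (value < INT8_MIN) {
        return INT8_MIN;          /* clip to -128 */
    }
    return (int8_t)value;         /* in range: keep the L.O. byte */
}

/* Saturate a 32-bit signed value to the 16-bit signed range. */
int16_t saturate_32_to_16(int32_t value)
{
    if (value > INT16_MAX) {
        return INT16_MAX;         /* clip to +32767 */
    }
    if (value < INT16_MIN) {
        return INT16_MIN;         /* clip to -32768 */
    }
    return (int16_t)value;        /* in range: keep the L.O. word */
}
```

For instance, saturating the 16-bit value -448 to 8 bits produces -128, whereas a sign contraction of the same value would simply fail with overflow.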

Obviously, if the larger value is outside the range of the smaller value, then there will be a loss of precision during the conversion. While clipping the value to the limits the smaller object imposes is never desirable, sometimes this is acceptable as the alternative is to raise an exception or otherwise reject the calculation. For many applications, such as audio or video processing, the clipped result is still recognizable, so this is a reasonable conversion to use.

[4]This doesn't turn out to be much of a limitation because sign extension almost always precedes an arithmetic operation that must take place in a register.

[5]Zero extending into DX:AX or EDX:EAX is just as necessary as the CWD and CDQ instructions, as you will eventually see.
