Representing Data in a Computer
When programming in a high-level language like Java or C++, you use variables of different types (such as integer, float, or character). Once you have declared variables, you don't have to worry about how the data are represented in the computer. When you deal with a computer at the machine level, however, you must be more concerned with how data are stored. Often you have the job of converting data from one representation to another. This chapter covers some common ways that data are represented in a microcomputer. Chapter 2 gives an overview of microcomputer hardware and software. Chapter 3 illustrates how to write an assembly language program that directly controls execution of the computer's native instructions.
Binary and Hexadecimal Numbers
A computer uses bits (binary digits, each an electronic state representing zero or one) to denote values. We represent such binary numbers using the digits 0 and 1 and a base 2 place-value system. This binary number system is like the decimal system except that the positions (right to left) are 1's, 2's, 4's, 8's, 16's (and higher powers of 2) instead of 1's, 10's, 100's, 1000's, 10000's (powers of 10). For example, the binary number 1101 can be interpreted as the decimal number 13,
1 |
1 |
0 |
1 |
|||||
one 8 |
+ |
one 4 |
+ |
no 2 |
+ |
one 1 |
= |
13 |
Binary numbers are so long that they are awkward to read and write. For instance, it takes the eight bits 11111010 to represent the decimal number 250, or the fifteen bits 111010100110000 to represent the decimal number 30000. The hexadecimal (base 16) number system represents numbers using about one-fourth as many digits as the binary system. Conversions between hexadecimal and binary are so easy that hex can be thought of as shorthand for binary. The hexadecimal system requires sixteen digits. The digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 are used just as in the decimal system; A, B, C, D, E, and F are used for the decimal numbers 10, 11, 12, 13, 14, and 15, respectively. Either uppercase or lowercase letters can be used for the new digits.
The positions in hexadecimal numbers correspond to powers of 16. From right to left, they are 1's, 16's, 256's, etc. The value of the hex number 9D7A is 40314 in decimal since
9 × 4096 36864 [ 4096 = 163 ] + 13 × 256 3328 [ D is 13, 256 = 162 ] + 7 × 16 112 + 10 × 1 10 [ A is 10 ] = 40314
Figure 1.1 shows small numbers expressed in decimal, hexadecimal, and binary systems. It is worthwhile to memorize this table or to be able to construct it very quickly.
Decimal |
Hexadecimal |
Binary |
---|---|---|
0 |
0 |
0 |
1 |
1 |
1 |
2 |
2 |
10 |
3 |
3 |
11 |
4 |
4 |
100 |
5 |
5 |
101 |
6 |
6 |
110 |
7 |
7 |
111 |
8 |
8 |
1000 |
9 |
9 |
1001 |
10 |
A |
1010 |
11 |
B |
1011 |
12 |
C |
1100 |
13 |
D |
1101 |
14 |
E |
1110 |
15 |
F |
1111 |
Figure 1.1: Decimal, hexadecimal, and binary numbers
You have seen above how to convert binary or hexadecimal numbers to decimal. How can you convert numbers from decimal to hex? From decimal to binary? From binary to hex? From hex to binary? We'll show how to do these conversions manually, but often the easiest way is to use a calculator that allows numbers to be entered in decimal, hexadecimal, or binary. Conversion between bases is normally a matter of pressing a key or two. These calculators can do arithmetic directly in binary or hex as well as decimal and often have a full range of other functions available. One warning: Many of these calculators use seven segment displays and display the lowercase letter b so that it looks almost like the numeral 6. Other characters may also be difficult to read.
A calculator isn't needed to convert a hexadecimal number to its equivalent binary form. In fact, many binary numbers are too long to be displayed on a typical calculator. Instead, simply substitute four bits for each hex digit. The bits are those found in the third column of Fig. 1.1, padded with leading zeros as needed. For example,
3B8E216 = 11 1011 1000 1110 00102
The subscripts 16 and 2 are used to indicate the base of the system in which a number is written; they are usually omitted when there is little chance of confusion. The extra spaces in the binary number are just to make it more readable. Note that the rightmost hex digit 2 was converted to 0010, including leading zeros. While it's not necessary to convert the leading 3 to 0011, the conversion would have been correct since leading zeros do not change the value of a binary number.
To convert binary numbers to hexadecimal format, reverse the above steps: Break the binary number into groups of four bits, starting from the right, and substitute the corresponding hex digit for each group of four bits. For example,
1011011101001101111 = 101 1011 1010 0110 1111 = 5BA6F
You have seen how to convert a binary number to an equivalent decimal number. However, instead of converting a long binary number directly to decimal, it is faster to convert it to hex, and then convert the hex number to decimal. Again, using the above 19-bit-long number,
10110111010011011112 = 101 1011 1010 0110 1111 = 5BA6F16 = 5 × 65536 + 11 × 4096 + 10 × 256 + 6 × 16 + 15 × 1 = 37540710
The following is an algorithm for converting a decimal number to its hex equivalent. It produces the hex digits of the answer right to left. The algorithm is expressed in pseudocode, which is the way that algorithms and program designs will be written in this book.
until DecimalNumber = 0 loop divide DecimalNumber by 16, getting Quotient and Remainder; Remainder (in hex) is the next digit (right to left); DecimalNumber := Quotient; end until;
Example
As an example, the decimal-to-hex algorithm is traced for the decimal number 5876:
- Since this is an until loop, the controlling condition is not checked until after the body has been executed the first time.
The octal (base 8) number system is used with some computer systems. Octal numbers are written using digits 0 through 7. Most calculators that do hex arithmetic also handle octal values. It is easy to convert a binary number to octal by writing the octal equivalent for each group of three bits, or to convert from octal to binary by replacing each octal digit by three bits. To convert from decimal to octal, one can use an algorithm that is the same as the decimal to hex scheme except that you divide by 8 instead of 16 at each step.
Exercises 1.1
Complete the table below by supplying the missing two forms for each number.
Binary |
Hexadecimal |
Decimal |
|
---|---|---|---|
1. |
100 |
________ |
________ |
2. |
10101101 |
________ |
________ |
3. |
1101110101 |
________ |
________ |
4. |
11111011110 |
________ |
________ |
5. |
10000000001 |
________ |
________ |
6. |
________ |
8EF |
________ |
7. |
________ |
10 |
________ |
8. |
________ |
A52E |
________ |
9. |
________ |
70C |
________ |
10. |
________ |
6BD3 |
________ |
11. |
________ |
________ |
100 |
12. |
________ |
________ |
527 |
13. |
________ |
________ |
4128 |
14. |
________ |
________ |
11947 |
15. |
________ |
________ |
59020 |
Character Codes
Letters, numerals, punctuation marks, and other characters are represented in a computer by assigning a numeric value to each character. Several schemes for assigning these numeric values have been used. The system commonly used with microcomputers is the American Standard Code for Information Interchange (abbreviated ASCII and pronounced ASK-ee).
The ASCII system uses seven bits to represent characters, so that values from 000 0000 to 111 1111 are assigned to characters. This means that 128 different characters can be represented using ASCII codes. The ASCII codes are usually given as hex numbers from 00 to 7F or as decimal numbers from 0 to 127.[1] Appendix A has a complete listing of ASCII codes. Using this table, you can check that the message
Computers are fun.
can be coded in ASCII, using hex numbers, as
43 |
6F |
6D |
70 |
75 |
74 |
65 |
72 |
73 |
20 |
61 |
72 |
65 |
20 |
66 |
75 |
6E |
2E |
C |
o |
m |
p |
u |
t |
e |
r |
s |
a |
r |
e |
f |
u |
n |
. |
Note that a space, even though it is invisible, has a character code (hex 20).
Numbers can be represented using character codes. For example, the ASCII codes for the date October 21, 1976 are
4F |
63 |
74 |
6F |
62 |
65 |
72 |
20 |
32 |
31 |
2C |
20 |
31 |
39 |
37 |
36 |
O |
c |
t |
o |
b |
e |
r |
2 |
1 |
, |
1 |
9 |
7 |
6 |
with the number 21 represented using ASCII codes 32 31, and 1976 represented using 31 39 37 36. This is very different from the binary representation in the last section, where 2110 = 101012 and 197010 = 111101110002. Computers use both of these representations for numbers: ASCII for input and output, and binary for internal computations.
The ASCII code assignments may seem rather arbitrary, but there are certain patterns. The codes for uppercase letters are contiguous, as are the codes for lowercase letters. The codes for an uppercase letter and the corresponding lowercase letter differ by exactly one bit. Bit 5 is 0 for an uppercase letter and 1 for the corresponding lowercase letter while other bits are the same. (Bits in most computer architectures are numbered right to left, starting with 0 for the rightmost bit.) For example,
- uppercase M codes as 4D16 = 10011012
- lowercase m codes as 6D16 = 11011012
The printable characters are grouped together from 2016 to 7E16. (A space is considered a printable character.) Numerals 0, 1, …, 9 have ASCII codes 3016, 3116, …, 3916, respectively.
The characters from 0016 to 1F16, along with 7F16, are known as control characters. For example, the ESC key on an ASCII keyboard generates a hex 1B code. The abbreviation ESC stands for extra services control but most people say "escape." The ESC character is often sent in combination with other characters to a peripheral device like a printer to turn on a special feature. Since such character sequences are not standardized, they will not be covered in this book.
The two ASCII control characters that will be used the most frequently in this book are 0D16 and 0A16, for carriage return (CR) and line feed (LF). The 0D16 code is generated by an ASCII keyboard when the Return or Enter key is pressed. When it is sent to an ASCII display, it causes the cursor to move to the beginning of the current line without going down to a new line. When carriage return is sent to an ASCII printer (at least one of older design), it causes the print head to move to the beginning of the line. The line feed code 0A16 causes an ASCII display to move the cursor straight down, or a printer to roll the paper up one line, in both cases without going to the beginning of the new line. To display a message and move to the beginning of a new line, it is necessary to send the message characters plus CR and LF characters to the screen or printer. This may be annoying sometimes as you program in assembly language, but you will also have the option to not use CR and/or LF when you want to leave the cursor on a line after prompting for input, or to piece together a line using several output instructions.
Lesser-used control characters include form feed (0C16), which causes many printers to eject a page; horizontal tab (0916), which is generated by the tab key on the keyboard; backspace (0816) generated by the Backspace key; and delete (7F16) generated by the Delete key. Notice that the Backspace and Delete keys do not generate the same codes. The bell character (0716) causes an audible signal when output to the display. Good programming practice is to sound the bell only when really necessary.
Many large computers represent characters using Extended Binary Coded Decimal Information Code (abbreviated EBCDIC and pronounced ib-SEE-dick or eb-SEE-dick). The EBCDIC system will only be used in this book as an example of another coding scheme when translation from one coding system to another is discussed.
Exercises 1.2
- Each of the following hexadecimal numbers can be interpreted as representing a decimal number or a pair of ASCII codes. Give both interpretations.
(a)
2A45
(b)
7352
(c)
2036
(d)
106E
- Find the ASCII codes for the characters in each of the following strings. Don't forget spaces and punctuation. Carriage return and line feed are shown by CR and LF, respectively (written together as CRLF so that it will be clear that there is no space character between them).
- January 1 is New Year's Day.CRLF
- George said, "Ouch!"
- R2D2 was C3P0's friend.CRLF ["0" is the numeral zero]
- Your name? [put two spaces after the question mark]
- Enter value: [put two spaces after the colon]
- What would be displayed if you output each of the following sequences of ASCII codes to a computer's screen?
- 62 6C 6F 6F 64 2C 20 73 77 65 61 74 20 61 6E 64 20 74 65 61 72 73
- 6E 61 6D 65 0D 0A 61 64 64 72 65 73 73 0D 0A 63 69 74 79 0D 0A
- 4A 75 6E 65 20 31 31 2C 20 31 39 34 37 0D 0A
- 24 33 38 39 2E 34 35
- 49 44 23 3A 20 20 31 32 33 2D 34 35 2D 36 37 38 39
[1]Some computers, including the IBM PC and compatible systems, use an extended character set, additionally assigning characters to hex numbers 80 to FF (decimal 128 to 255). Extended character sets will not be used in this book.
s Complement Representation for Signed Integers
It is now time to look more carefully at how numbers are actually represented in a computer. We have looked at two schemes to represent numbers-by using binary integers (often expressed in hex) or by using ASCII codes. However, these methods have two problems: (1) the number of bits available for representing a number is limited, and (2) it is not clear how to represent a negative number.
Chapter 2 will discuss computer hardware, but for now you need to know that memory is divided into bytes, each byte containing eight bits.[2] Suppose you want to use ASCII codes to represent a number in memory. A single ASCII code is normally stored in a byte. Recall that ASCII codes are seven bits long; the extra (left-hand, or high order) bit is set to 0. To solve the first representation problem mentioned above, you can simply include the code for a minus sign. For example, the ASCII codes for the four characters -817 are 2D, 38, 31, and 37. To solve the first problem, you could always agree to use a fixed number of bytes, perhaps padding on the left with ASCII codes for zeros or spaces. Alternatively, you could use a variable number of bytes, but agree that the number ends with the last ASCII code for a digit, that is, terminating the string with a nondigit.
Suppose you want to use internal representations for numbers corresponding to their binary values. Then you must choose a fixed number of bits for the representation. Most central processing units can do arithmetic on binary numbers having a few chosen lengths. For the Intel 80×86 family, these lengths are 8 bits (a byte), 16 bits (a word),[3] 32 bits (a doubleword), and 64 bits (a quadword).
As an example, look at the word-length binary representation of 697.
69710 = 10101110012 = 00000010101110012
Leading zeros have been added to make 16 bits. Writing this in hex in a word, you have
02 |
B9 |
This illustrated convention will be followed throughout this book. Strips of boxes will represent sequences of bytes. The contents of a single byte will be represented in hex, with two hex digits in each byte since a single hex digit corresponds to four bits. The double-word representation of 697 simply has more leading zeros.
00 |
00 |
02 |
B9 |
What we now have is a good system of representing nonnegative, or unsigned, numbers. This system cannot represent negative numbers. Also, for any given length, there is a largest unsigned number that can represented, for example FF16 or 25510 for byte length.
The 2's complement system is similar to the above scheme for unsigned numbers, but it allows representation of negative numbers. Numbers will be a fixed length, so that you might find the "word-length 2's complement representation" of a number. The 2's complement representation for a nonnegative number is almost identical to the unsigned representation; that is, you represent the number in binary with enough leading zeros to fill up the desired length. Only one additional restriction exists-for a positive number, the left-most bit must be zero. This means, for example, that the most positive number that can be represented in word-size 2's complement form is 01111111111111112 or 7FFF16 or 3276710.
As you have probably already guessed, the leftmost bit is always one in the 2's-complement representation of a negative number. You might also guess that the rest of the representation is just the same as for the corresponding positive number, but unfortunately the situation is more complicated than that. That is, you cannot simply change the leading bit from 0 to 1 to get the negative version of a number.
A hex calculator makes it easy to convert a negative decimal number to 2's complement form. For instance, if the decimal display shows 565 and the convert-to-hex key is pressed, a typical calculator will display FFFFFFFDCB (perhaps with a different number of leading F's). For a word-size representation, ignore all but the last four hex digits; the answer is
FD |
CB |
or 1111 1101 1100 1011 in binary. (Note the leading 1 bit for a negative number.) the doubleword representation is
FF |
FF |
FD |
CB |
which is almost too long to write in binary.
The 2's complement representation of a negative number can also be found without a calculator. One method is to first express the unsigned number in hex, and then subtract this hex number from 1000016 to get the word length representation. The number you subtract from is, in hex, a 1 followed by the number of 0's in the length of the representation; for example, 10000000016 to get the doubleword length representation. (What would you use for a byte-length 2's complement representation? For a quadword-length 2's complement representation?) In binary, the number of zeros is the length in binary digits. This binary number is a power of two, and subtraction is sometimes called "taking the complement," so this operation is the source of the term "2's complement."
Example
The word-length 2's complement representation of the decimal number 76 is found by first converting the unsigned number 76 to its hex equivalent 4C, then by subtracting 4C from 10000.
1 0 0 0 0 - 4 C
Since you cannot subtract C from 0, you have to borrow 1 from 1000, leaving FFF.
F F F 10 - 4 C -------- F F B 4
After borrowing, the subtraction is easy. The units digit is
1016 − C16 = 1610 − 1210 = 4 (in decimal or hex),
and the 16's position is
F16 − 4 = 1510 − 410 = 1110 = B16
It is not necessary to convert the hex digits to decimal to subtract them if you learn the addition and subtraction tables for single hex digits.
The operation of subtracting a number from 1 followed by an appropriate number of 0's is called taking the 2's complement, or complementing the number. Thus "2's complement" is used both as the name of a representation system and as the name of an operation. The operation of taking the 2's complement corresponds to pressing the change sign key on a hex calculator.
Since a given 2's complement representation is a fixed length, obviously there is a maximum size number that can be stored in it. For a word, the largest positive number stored is 7FFF, since this is the largest 16 bit long number that has a high order bit of 0 when written in binary. The hex number 7FFF is 32767 in decimal. Positive numbers written in hex can be identified by a leading hex digit of 0 through 7. Negative numbers are distinguished by a leading bit of 1, corresponding to hex digits of 8 through F.
How do you convert a 2's complement representation to the corresponding decimal number? First, determine the sign of a 2's complement number. To convert a positive 2's complement number to decimal, just treat it like any unsigned binary number and convert it by hand or with a hex calculator. For example, the word-length 2's complement number 0D43 represents the decimal number 3395.
Dealing with a negative 2's complement number-one starting with a 1 bit or 8 through F in hex-is a little more complicated. Note that any time you take the 2's complement of a number and then take the 2's complement of the result, you get back to the original number. For a word size number N, ordinary algebra gives you
N = 10000 − (10000 N)
For example, using the word length 2's complement value F39E
10000 − (10000 − F39E) = 10000 − C62 = F39E
This says again that the 2's complement operation corresponds to negation. Because of this, if you start with a bit pattern representing a negative number, the 2's complement operation can be used to find the positive (unsigned) number corresponding it.
Example
The word-length 2's complement number E973 represents a negative value since the sign bit (leading bit) is 1 (E = 1110). Taking the complement finds the corresponding positive number.
10000 − E973 = 168D = 577310
This means that the decimal number represented by E973 is −5773.
The word-length 2's complement representations with a leading 1 bit range from 8000 to FFFF. These convert to decimal as follows:
10000 − 8000 = 8000 = 3276810,
so 8000 is the representation of 32768. Similarly,
10000 − FFFF = 1,
so FFFF is the representation of −1. Recall that the largest positive decimal integer that can be represented as a word-length 2's complement number is 32767; the range of decimal numbers that can be represented in word-length 2's complement form is −32768 to 32767.
Using a calculator to convert a negative 2's complement representation to a decimal number is a little tricky. For example, if you start with the word length representation FF30 and your calculator displays 10 hex digits, you must enter the 10 hex digit long version of the number FFFFFFFF30, with six extra leading F's. Then push the convert to decimal button(s) and your calculator should display 208.
Exercises 1.3
- Find the word-length 2's complement representation of each of the following decimal numbers:
- 845
- 15000
- 100
- -10
- -923
- Find the doubleword-length 2's complement representation of each of the following decimal numbers:
- 3874
- 1000000
- -100
- -55555
- Find the byte-length 2's complement representation of each of the following decimal numbers:
- 23
- 111
- -100
- -55
- Find the decimal integer that is represented by each of these wordlength 2's complement numbers:
- 00 A3
- FF FE
- 6F 20
- B6 4A
- Find the decimal integer that is represented by each of these doubleword-length 2's complement numbers:
- 00 00 F3 E1
- FF FF FE 03
- 98 C2 41 7D
- Find the decimal integer that is represented by each of these byte-length 2's complement numbers:
- E1
- 7C
- FF
- Find the range of decimal integers that can be stored in 2's complement form in a byte.
- Find the range of decimal integers that can be stored in 2's complement form in a doubleword.
- This section showed how to take the 2's complement of a number by subtracting it from an appropriate power of 2. An alternative method is to write the number in binary (using the correct number of bits for the length of the representation), change each 0 bit to 1 and each 1 bit to zero (this is called "taking the 1's complement"), and then adding 1 to the result (discarding any carry into an extra bit). Show that these two methods are equivalent.
[2]Some early computer systems used byte sizes different than eight bits.
[3]Other computer architectures use a word size different than 16 bits.
Addition and Subtraction of 2 s Complement Numbers
One of the reasons that the 2's complement representation scheme is commonly used to store signed integers in computers is that addition and subtraction operations can be easily and efficiently implemented in computer hardware. This section discusses addition and subtraction of 2's complement numbers and introduces the concepts of carry and overflow that will be needed later.
To add two 2's complement numbers, simply add them as if they were unsigned binary numbers. The 80×86 architecture uses the same addition instructions for unsigned and signed numbers. The following examples use word-size representations.
First, 0A07 and 01D3 are added. These numbers are positive whether they are interpreted as unsigned numbers or as 2's complement numbers. The decimal version of the addition problem is given on the right.
0A07 2567 + 01D3 + 467 0BDA 3034
The answer is correct in this case since BDA16 = 303410.
Next, 0206 and FFB0 are added. These are, of course, positive as unsigned numbers, but interpreted as 2's complement signed numbers, 0206 is a positive number and FFB0 is negative. This means that there are two decimal versions of the addition problem. The signed one is given first, then the unsigned version.
0206 518 518 + FFB0 + (-80) + 65456 101B6 438 65974
There certainly appears to be a problem since it will not even fit in a word. In fact, since 101B6 is the hex version of 65974, there is no way to represent the correct sum of unsigned numbers in a word. However, if the numbers are interpreted as signed and you ignore the extra 1 on the left, then the word 01B6 is the 2's complement representation of the decimal number 438.
Now FFE7 and FFF6 are added, both negative numbers in a signed interpretation. Again, both signed and unsigned decimal interpretations are shown.
FFE7 (-25) 65511 + FFF6 + (-10) + 65526 1FFDD -35 131037
Again, the sum in hex is too large to fit in two bytes, but if you throw away the extra 1, then FFDD is the correct word-length 2's complement representation of 35.
Each of the last two additions have a carry out of the usual high order position into an extra digit. The remaining digits give the correct 2's complement representation. The remaining digits are not always the correct 2's complement sum, however. Consider the addition of the following two positive numbers:
483F 18495 + 645A + 25690 AC99 44185
There was no carry out of the high order digit, but the signed interpretation is plainly incorrect since AC99 represents the negative number 21351. Intuitively, what went wrong is that the decimal sum 44185 is bigger than the maximal value 32767 that can be stored in the two bytes of a word. However, when these numbers are interpreted as unsigned, the sum is correct.
The following is another example showing a "wrong" answer, this time resulting from adding two numbers that are negative in their signed interpretation.
E9FF (-5633) 59903 + 8CF0 + (-29456) + 36080 176EF - 35089 95983
This time there is a carry, but the remaining four digits 76 EF cannot be the right-signed answer since they represent the positive number 30447. Again, intuition tells you that something had to go wrong since -32768 is the most negative number that can be stored in a word.
In the above "incorrect" examples, overflow occurred. As a human being, you detect overflow by the incorrect signed answer. Computer hardware can detect overflow as it performs addition, and the signed sum will be correct if there is no overflow. The computer actually performs addition in binary, of course, and the process is logically a right-to-left pairwise addition of bits, very similar to the procedure that humans use for decimal addition. As the computer adds a pair of bits, sometimes a carry (of 1) into the next column to the left is generated. This carry is added to the sum of these two bits, etc. The column of particular interest is the leftmost one: the sign position. There may be a carry into this position and/or a carry out of this position into the "extra" bit. This "carry out" (into the extra bit) is what was called just "carry" above and was seen as the extra hex 1. Figure 1.2 identifies when overflow does or does not occur. The table can be summarized by saying that overflow occurs when the number of carries into the sign position is different from the number of carries out of the sign position.
Carry into sign bit? |
Carry out sign bit? |
Overflow? |
---|---|---|
no |
no |
no |
no |
yes |
yes |
yes |
no |
yes |
yes |
yes |
no |
Figure 1.2: Overflow in addition
Each of the above addition examples is now shown again, this time in binary. Carries are written above the two numbers.
111 0000 1010 0000 0111 0A07 + 0000 0001 1101 0011 + 01D3 0000 1011 1101 1010 0BDA
This example has no carry into the sign position and no carry out, so there is no overflow.
1 1111 11 0000 0010 0000 0110 0206 + 1111 1111 1011 0000 + FFB0 1 0000 0001 1011 0110 101B6
This example has a carry into the sign position and a carry out, so there is no overflow.
1 1111 1111 11 11 1111 1111 1110 0111 FFE7 + 1111 1111 1111 0110 + FFF6 1 1111 1111 1101 1101 1FFDD
Again, there is both a carry into the sign position and a carry out, so there is no overflow.
1 1111 11 0100 1000 0011 1111 483F + 0110 0100 0101 1010 + 645A 1010 1100 1001 1001 AC99
Overflow does occur in this addition since there is a carry into the sign position, but no carry out.
1 1 11 111 1110 1001 1111 1111 E9FF + 1000 1100 1111 0000 + 8CF0 1 0111 0110 1110 1111 176EF
There is also overflow in this addition since there is a carry out of the sign bit, but no carry in.
In a computer, subtraction a − b of numbers a and b is usually performed by taking the 2's complement of b and adding the result to a. This corresponds to adding the negation of b. For example, for the decimal subtraction 195 - 618 = -423,
00C3 - 026A
is changed to addition of FD96, the 2's complement of 026A.
00C3 + FD96 FE59
The hex digits FE59 do represent −423. Looking at the above addition in binary, you have
11 11 0000 0000 1100 0011 + 1111 1101 1001 0110 1111 1110 0101 1001
Notice that there was no carry in the addition. However, this subtraction did involve a borrow. A borrow occurs in the subtraction a − b when b is larger than a as unsigned numbers. Computer hardware can detect a borrow in subtraction by looking at whether on not a carry occurred in the corresponding addition. If there is no carry in the addition, then there is a borrow in the subtraction. If there is a carry in the addition, then there is no borrow in the subtraction. (Remember that "carry" by itself means "carry out.")
Here is one more example. Doing the decimal subtraction 985 411 = 574 using word-length 2's complement representations,
03D9 - 019B
is changed to addition of FE65, the 2's complement of 019B.
1 1111 1111 1 1 03D9 0000 0011 1101 1001 + FE65 + 1111 1110 0110 0101 1023E 1 0000 0010 0011 1110
Discarding the extra 1, the hex digits 023E do represent 574. This addition has a carry, so there is no borrow in the corresponding subtraction.
Overflow is also defined for subtraction. When you are thinking like a person, you can detect it by the wrong answer that you will expect when you know that the difference is going to be outside of the range that can be represented in the chosen length for the representation. A computer detects overflow in subtraction by determining whether or not overflow occurs in the corresponding addition problem. If overflow occurs in the addition problem, then it occurs in the original subtraction problem; if it does not occur in the addition, then it does not occur in the original subtraction. There was no overflow in either of the above subtraction examples. Overflow does occur if you use word-length 2's complement representations to attempt the subtraction −29123 −15447. As a human, you know that the correct answer −44570 is outside the range −32,768 to +32,767. In the computer hardware
8E3D - 3C57
is changed to addition of C3A9, the 2's complement of 3C57.
1 1 11 111 1 8E3D 1000 1110 0011 1101 + C3A9 + 1100 0011 1010 1001 151E6 1 0101 0001 1110 0110
There is a carry out of the sign position, but no carry in, so overflow occurs.
Although examples in this section have use word-size 2's complement representations, the same techniques apply when performing addition or subtraction with byte-size, doubleword-size, or other size 2's complement numbers.
Exercises 1.4
Perform each of the following operations on word-size 2's complement numbers. For each, find the specified sum or difference. Determine whether overflow occurs. For a sum, determine whether there is a carry. For a difference, determine whether there is a borrow. Check your answers by converting the problem to decimal.
1. |
003F + 02A4 |
2. |
1B48 + 39E1 |
3. |
6C34 + 5028 |
4. |
7FFE + 0002 |
5. |
FF07 + 06BD |
6. |
2A44 + D9CC |
7. |
FFE3 + FC70 |
8. |
FE00 + FD2D |
9. |
FFF1 + 8005 |
10. |
8AD0 + EC78 |
11. |
9E58 − EBBC |
12. |
EBBC − 9E58 |
13. |
EBBC − 791C |
14. |
791C −EBBC |
Other Systems for Representing Numbers
Sections 1.2 and 1.3 presented two commonly-used systems for representing numbers in computers, strings of character codes (often ASCII), and 2's complement form. This section introduces three additional schemes, 1's complement, binary coded decimal (BCD), and floating point. The 1's complement system is an alternative scheme for representing signed integers; it is used in a few computer systems, but not the Intel 80×86 family. Binary coded decimal and floating point forms are used in 80×86 computers, as well as many other systems. They will be discussed more fully when appropriate instructions for manipulating data in these forms are covered. The primary reason for introducing them here is to illustrate that there are many alternative representations for numeric data, each valid when used in the correct context.
The 1's complement system is similar to 2's complement. A fixed length is chosen for the representation and a positive integer is simply the binary form of the number, padded with one or more leading zeros on the left to get the desired length. To take the negative of the number, each bit is "complemented"; that is, each zero is changed to one and each one is changed to zero. This operation is sometimes referred to as taking the 1's complement of a number. Although it is easier to negate an integer using 1's complement than 2's complement, the 1's complement system has several disadvantages, the most significant being that it is harder to design circuitry to add or subtract numbers in this form. There are two representations for zero (why?), an awkward situation. Also, a slightly smaller range of values can be represented; for example, −127 to 127 for an 8 bit length, instead of −128 to 127 in a 2's complement system.
The byte length 1's complement representation of the decimal number 97 is just the value 0110 0001 in binary (61 in hex). Changing each 0 to 1 and each 1 to 0 gives 1001 1110 (9E in hex), the byte length 1's complement representation of −97.
There is a useful connection between taking the 1's complement and taking the 2's complement of a binary number. If you take the 1's complement of a number and then add 1, you get the 2's complement. This is sometimes easier to do by hand than the subtraction method presented in Section 1.3. You were asked to verify the equivalence of these methods in Exercise 1.3.9.
In binary coded decimal (BCD) schemes, each decimal digit is coded with a string of bits with fixed length, and these strings are pieced together to form the representation. Most frequently four bits are used for each decimal digit; the choices for bit patterns are shown in Fig. 1.3. Only these ten bit patterns are used.
Decimal |
BCD bit pattern |
---|---|
0 |
0000 |
1 |
0001 |
2 |
0010 |
3 |
0011 |
4 |
0100 |
5 |
0101 |
6 |
0110 |
7 |
0111 |
8 |
1000 |
9 |
1001 |
Figure 1.3: Binary coded decimal representation
One BCD representation of the decimal number 926708 is 1001 0010 0110 0111 0000 1000. Using one hex digit as shorthand for four bits, and grouping two hex digits per byte, this BCD representation can be expressed in three bytes as
92 |
67 |
08 |
Notice that the BCD representation in hex looks just like the decimal number.
Often BCD numbers are encoded using some fixed number of bytes. For purposes of illustration, assume a four-byte representation. For now, the question of how to represent a sign will be ignored; without leaving room for a sign, eight binary-coded decimal digits can be stored in four bytes. Given these choices, the decimal number 3691 has the BCD representation
00 |
00 |
36 |
91 |
Notice that the doubleword 2's complement representation for the same number would be 00 00 0E 6B, and that the ASCII codes for the four numerals are 33 36 39 31.
It is not as efficient for a computer to do arithmetic with numbers in a BCD format as with 2's complement numbers. It is usually very inefficient to do arithmetic on numbers represented using ASCII codes. However, ASCII codes are the only method so far for representing a number that is not an integer. For example, 78.375 can be stored as 37 38 2E 33 37 35. Floating point representation systems allow for nonintegers to be represented, or at least closely approximated.
Floating point schemes store numbers in a form that corresponds closely to scientific notation. The following example shows how to convert the decimal number 78.375 into IEEE single format that is 32 bits long. (IEEE is the abbreviation for the Institute of Electrical and Electronics Engineers.) This format was one of several sponsored by the Standards Committee of the IEEE Computer Society and approved by the IEEE Standards Board and the American National Standards Institute (ANSI). It is one of the floating point formats used in Intel 80×86 processors.
First, 78.375 must be converted to binary. In binary, the positions to the right of the binary point (it is not appropriate to say decimal point for the "." in a binary number) correspond to negative powers of two (1/2, 1/4, 1/8, etc.), just as they correspond to negative powers of 10 (1/10, 1/100, etc.) in a decimal number. Since 0.375 = 3/8 = 1/4 + 1/8 = .012 + .0012, 0.37510 = 0.0112. The whole part 78 is 1001110 in binary, so
78.37510 = 1001110.0112.
Next this is expressed in binary scientific notation with the mantissa written with 1 before the radix point.
1001110.0112 = 1.001110011 × 26
The exponent is found exactly as it is in decimal scientific notation, by counting the number of positions the point must be moved to the right or left to produce the mantissa. The notation here is really mixed; it would be more proper to write 26 as 10110, but it is more convenient to use the decimal form. Now the floating point number can be pieced together:
- left bit 0 for a positive number (1 means negative)
- 1000 0101 for the exponent. This is the actual exponent of 6, plus a bias of 127, with the sum, 133, in 8 bits.
- 00111001100000000000000, the fraction expressed with the leading 1 removed and padded with zeros on the right to make 23 bits
The entire number is then 0 10000101 00111001100000000000000. Regrouping gives 0100 0010 1001 1100 1100 0000 0000 0000, or, in hex
42 |
9C |
C0 |
00 |
This example worked out easily because 0.375, the noninteger part of the decimal number 78.375, is a sum of negative powers of 2. Most numbers are not as nice, and usually a binary fraction is chosen to closely approximate the decimal fraction. Techniques for choosing such an approximation are not covered in this book.
To summarize, the following steps are used to convert a decimal number to IEEE single format:
- The leading bit of the floating point format is 0 for a positive number and 1 for a negative number.
- Write the unsigned number in binary.
- Write the binary number in binary scientific notation f23.f22 … f0 2e, where f23 = 1. There are 24 fraction bits, but it is not necessary to write trailing 0's.
- Add a bias of 12710 to the exponent e. This sum, in binary form, is the next 8 bits of the answer, following the sign bit. (Adding a bias is an alternative to storing the exponent as a signed number.)
- The fraction bits f22f21 … f0 form the last 23 bits of the floating point number. The leading bit f23 (which is always 1) is dropped.
Computer arithmetic on floating point numbers is usually much slower than with 2's complement integers. However the advantages of being able to represent nonintegral values or very large or small values often outweigh the relative inefficiency of computing with them.
Exercises 1.5
Express each of the following decimal numbers as a word-length 1's complement number.
1. |
175 |
2. |
-175 |
3. |
-43 |
4. |
43 |
Use BCD to encode each of the following decimal numbers in four bytes. Express each answer in hex digits, grouped two per byte.
5. |
230 |
6. |
1 |
7. |
12348765 |
8. |
17195 |
Use IEEE single format to encode each of the following decimal numbers in floating point.
9. |
175.5 |
10. |
-1.25 |
11. |
-11.75 |
12. |
45.5 |
Summary
All data are represented in a computer using electronic signals. These can be interpreted as patterns of binary digits (bits). These bit patterns can be thought of as binary numbers. Numbers can be written in decimal, hexadecimal, or binary forms.
For representing characters, most microcomputers use ASCII codes. One code is assigned for each character, including nonprintable control characters.
Integer values are represented in a predetermined number of bits in 2's complement form; a positive number is stored as a binary number (with at least one leading zero to make the required length), and the pattern for a negative number can be obtained by subtracting the positive form from a 1 followed by as many 0's as are used in the length. A 2's complement negative number always has a leading 1 bit. A hex calculator, used with care, can simplify working with 2's complement numbers.
Addition and subtraction are easy with 2's complement numbers. Since the length of a 2's complement number is limited, there is the possibility of a carry, a borrow, or overflow.
Other formats in which numbers are stored are 1's complement, binary coded decimal (BCD), and floating point.