The Art of
ASSEMBLY LANGUAGE PROGRAMMING

Chapter Nine (Part 5)

Table of Content

Chapter Nine (Part 7)

CHAPTER NINE:
ARITHMETIC AND LOGICAL OPERATIONS (Part 6)
9.5 - Machine and Arithmetic Idioms
9.5.1 - Multiplying Without MUL and IMUL
9.5.2 - Division Without DIV and IDIV
9.5.3 - Using AND to Compute Remainders
9.5.4 - Implementing Modulo-n Counters with AND
9.5.5 - Testing an Extended Precision Value for 0FFFF..FFh
9.5.6 - TEST Operations
9.5.7 - Testing Signs with the XOR Instruction
9.5 Machine and Arithmetic Idioms

An idiom is an idiosyncrasy. Several arithmetic operations and 80x86 instructions have idiosyncracies that you can take advantage of when writing assembly language code. Some people refer to the use of machine and arithmetic idioms as "tricky programming" that you should always avoid in well written programs. While it is wise to avoid tricks just for the sake of tricks many machine and arithmetic idioms are well-known and commonly found in assembly language programs. Some of them can be really tricky but a good number of them are simply "tricks of the trade." This text cannot even begin to present all of the idioms in common use today; they are too numerous and the list is constantly changing. Nevertheless there are some very important idioms that you will see all the time so it makes sense to discuss those.

9.5.1 Multiplying Without MUL and IMUL

If you take a quick look at the timing for the multiply instruction you'll notice that the execution time for this instruction is rather long. Only the div and idiv instructions take longer on the 8086. When multiplying by a constant you can avoid the performance penalty of the mul and imul instructions by using shifts additions and subtractions to perform the multiplication.

Remember a shl operation performs the same operation as multiplying the specified operand by two. Shifting to the left two bit positions multiplies the operand by four. Shifting to the left three bit positions multiplies the operand by eight. In general shifting an operand to the left n bits multiplies it by 2n. Any value can be multiplied by some constant using a series of shifts and adds or shifts and subtractions. For example to multiply the ax register by ten you need only multiply it by eight and then add in two times the original value. That is 10*ax = 8*ax + 2*ax. The code to accomplish this is

                shl     ax
1           ;Multiply AX by two
mov     bx
ax          ;Save 2*AX for later
shl     ax
1           ;Multiply AX by four
shl     ax
1           ;Multiply AX by eight
add     ax
bx          ;Add in 2*AX to get 10*AX

The ax register (or just about any register for that matter) can be multiplied by most constant values much faster using shl than by using the mul instruction. This may seem hard to believe since it only takes two instructions to compute this product:

                mov     bx
10
mul     bx

However if you look at the timings the shift and add example above requires fewer clock cycles on most processors in the 80x86 family than the mul instruction. Of course the code is somewhat larger (by a few bytes) but the performance improvement is usually worth it. Of course on the later 80x86 processors the mul instruction is quite a bit faster than the earlier processors but the shift and add scheme is generally faster on these processors as well.

You can also use subtraction with shifts to perform a multiplication operation. Consider the following multiplication by seven:

                mov     bx
ax          ;Save AX*1
shl     ax
1           ;AX := AX*2
shl     ax
1           ;AX := AX*4
shl     ax
1           ;AX := AX*8
sub     ax
bx          ;AX := AX*7

This follows directly from the fact that ax*7 = (ax*8)-ax.

A common error made by beginning assembly language students is subtracting or adding one or two rather than ax*1 or ax*2. The following does not compute ax*7:

                shl     ax
1
shl     ax
1
shl     ax
1
sub     ax
1

It computes (8*ax)-1 something entirely different (unless of course ax = 1). Beware of this pitfall when using shifts additions and subtractions to perform multiplication operations.

You can also use the lea instruction to compute certain products on 80386 and later processors. The trick is to use the 80386 scaled index mode. The following examples demonstrate some simple cases:

                lea     eax
[ecx][ecx]         ;EAX := ECX * 2
lea     eax
[eax]eax*2]        ;EAX := EAX * 3
lea     eax
[eax*4]            ;EAX := EAX * 4
lea     eax
[ebx][ebx*4]       ;EAX := EBX * 5
lea     eax
[eax*8]            ;EAX := EAX * 8
lea     eax
[edx][edx*8]       ;EAX := EDX * 9

9.5.2 Division Without DIV and IDIV

Much as the shl instruction can be used for simulating a multiplication by some power of two the shr and sar instructions can be used to simulate a division by a power of two. Unfortunately you cannot use shifts additions and subtractions to perform a division by an arbitrary constant as easily as you can use these instructions to perform a multiplication operation.

Another way to perform division is to use the multiply instructions. You can divide by some value by multiplying by its reciprocal. The multiply instruction is marginally faster than the divide instruction; multiplying by a reciprocal is almost always faster than division.

Now you're probably wondering "how does one multiply by a reciprocal when the values we're dealing with are all integers?" The answer of course is that we must cheat to do this. If you want to multiply by one tenth there is no way you can load the value 1/10th into an 80x86 register prior to performing the division. However we could multiply 1/10th by 10 perform the multiplication and then divide the result by ten to get the final result. Of course this wouldn't buy you anything at all in fact it would make things worse since you're now doing a multiplication by ten as well as a division by ten. However suppose you multiply 1/10th by 65 536 (6553) perform the multiplication and then divide by 65 536. This would still perform the correct operation and as it turns out if you set up the problem correctly you can get the division operation for free. Consider the following code that divides ax by ten:

                mov     dx
6554        ;Round (65
536/10)
mul     dx

This code leaves ax/10 in the dx register.

To understand how this works consider what happens when you multiply ax by 65 536 (10000h). This simply moves ax into dx and sets ax to zero. Multiplying by 6 554 (65 536 divided by ten) puts ax divided by ten into the dx register. Since mul is marginally faster than div this technique runs a little faster than using a straight division.

Multiplying by the reciprocal works well when you need to divide by a constant. You could even use it to divide by a variable but the overhead to compute the reciprocal only pays off if you perform the division many many times (by the same value).

9.5.3 Using AND to Compute Remainders

The and instruction can be used to quickly compute remainders of the form:

		dest := dest MOD 2n

To compute a remainder using the and instruction simply and the operand with the value 2n-1. For example to compute ax = ax mod 8 simply use the instruction:

                and     ax
7

Additional examples:

                and     ax
3           ;AX := AX mod 4
and     ax
0Fh         ;AX := AX mod 16
and     ax
1Fh         ;AX := AX mod 32
and     ax
3Fh         ;AX := AX mod 64
and     ax
7Fh         ;AX := AX mod 128
mov     ah
0           ;AX := AX mod 256
; (Same as ax and 0FFh)

9.5.4 Implementing Modulo-n Counters with AND

If you want to implement a counter variable that counts up to 2n-1 and then resets to zero simply using the following code:

                inc     CounterVar
and     CounterVar
nBits

where nBits is a binary value containing n one bits right justified in the number. For example to create a counter that cycles between zero and fifteen you could use the following:

                inc     CounterVar
and     CounterVar
00001111b

9.5.5 Testing an Extended Precision Value for 0FFFF..FFh

The and instruction can be used to quickly check a multi-word value to see if it contains ones in all its bit positions. Simply load the first word into the ax register and then logically and the ax register with all the remaining words in the data structure. When the and operation is complete the ax register will contain 0FFFFh if and only if all the words in that structure contained 0FFFFh. E.g.

                mov     ax
word ptr var
and     ax
word ptr var+2
and     ax
word ptr var+4
.
.
.
and     ax
word ptr var+n
cmp     ax
0FFFFh
je      Is0FFFFh

9.5.6 TEST Operations

Remember the test instruction is an and instruction that doesn't retain the results of the and operation (other than the flag settings). Therefore many of the comments concerning the and operation (particularly with respect to the way it affects the flags) also hold for the test instruction. However since the test instruction doesn't affect the destination operand multiple bit tests may be performed on the same value. Consider the following code:

                test    ax
1
jnz     Bit0
test    ax
2
jnz     Bit1
test    ax
4
jnz     Bit3
etc.

This code can be used to successively test each bit in the ax register (or any other operand for that matter). Note that you cannot use the test/cmp instruction pair to test for a specific value within a string of bits (as you can with the and/cmp instructions). Since test doesn't strip out any unwanted bits the cmp instruction would actually be comparing the original value rather than the stripped value. For this reason you'll normally use the test instruction to see if a single bit is set or if one or more bits out of a group of bits are set. Of course if you have an 80386 or later processor you can also use the bt instruction to test individual bits in an operand.

Another important use of the test instruction is to efficiently compare a register against zero. The following test instruction sets the zero flag if and only if ax contains zero (anything anded with itself produces its original value; this sets the zero flag only if that value is zero):

                test ax
ax 

The test instruction is shorter than

                cmp     ax
0
or 
                cmp     eax
0 

though it is no better than

                cmp     al
0

Note that you can use the and and or instructions to test for zero in a fashion identical to test. However on pipelined processors like the 80486 and Pentium chips the test instruction is less likely to create a hazard since it does not store a result back into its destination register.

9.5.7 Testing Signs with the XOR Instruction

Remember the pain associated with a multi-precision signed multiplication operation? You need to determine the sign of the result take the absolute value of the operands multiply them and then adjust the sign of the result as determined before the multiplication operation. The sign of the product of two numbers is simply the exclusive-or of their signs before performing the multiplication. Therefore you can use the xor instruction to determine the sign of the product of two extended precision numbers. E.g.

32x32 Multiply:
mov     al
byte ptr Oprnd1+3
xor     al
byte ptr Oprnd2+3
mov     cl
al                  ;Save sign

; Do the multiplication here (don't forget to take the absolute
; value of the two operands before performing the multiply).

.
.
.

; Now fix the sign.

cmp     cl
0                   ;Check sign bit
jns     ResultIsPos

; Negate the product here.

.
.
.

ResultIsPos:

Chapter Nine (Part 5)

Table of Content

Chapter Nine (Part 7)

Chapter Nine: Arithmetic And Logical Operations (Part 6)
27 SEP 1996