The Art of ASSEMBLY LANGUAGE PROGRAMMING Chapter Ten (Part 1) Table of Content Chapter Ten (Part 3)
 CHAPTER TEN: CONTROL STRUCTURES (Part 2) 10.3 - CASE Statements 10.4 - State Machines and Indirect Jumps 10.5 - Spaghetti Code 10.3 CASE Statements

The Pascal case statement takes the following form :

CASE variable OF
const1:stmt1;
const2:stmt2;
.
.
.
constn:stmtn
END;

When this statement executes it checks the value of variable against the constants const1 ... constn. If a match is found then the corresponding statement executes. Standard Pascal places a few restrictions on the case statement. First if the value of variable isn't in the list of constants the result of the case statement is undefined. Second all the constants appearing as case labels must be unique. The reason for these restrictions will become clear in a moment.

Most introductory programming texts introduce the case statement by explaining it as a sequence of if..then..else statements. They might claim that the following two pieces of Pascal code are equivalent:

CASE I OF
0: WriteLn('I=0');
1: WriteLn('I=1');
2: WriteLn('I=2');
END;

IF I = 0 THEN WriteLn('I=0')
ELSE IF I = 1 THEN WriteLn('I=1')
ELSE IF I = 2 THEN WriteLn('I=2');

While semantically these two code segments may be the same their implementation is usually different[1]. Whereas the if..then..else if chain does a comparison for each conditional statement in the sequence the case statement normally uses an indirect jump to transfer control to any one of several statements with a single computation. Consider the two examples presented above they could be written in assembly language with the following code:

mov     bx
I
shl     bx
1           ;Multiply BX by two
jmp     cs:JmpTbl[bx]

JmpTbl          word    stmt0
stmt1
stmt2

Stmt0:          print
byte    "I=0"
cr
lf
0
jmp     EndCase

Stmt1:          print
byte    "I=1"
cr
lf
0
jmp     EndCase

Stmt2:          print
byte    "I=2"
cr
lf
0

EndCase:

; IF..THEN..ELSE form:

mov     ax
I
cmp     ax
0
jne     Not0
print
byte    "I=0"
cr
lf
0
jmp     EndOfIF

Not0:           cmp     ax
1
jne     Not1
print
byte    "I=1"
cr
lf
0
jmp     EndOfIF

Not1:           cmp     ax
2
jne     EndOfIF
Print
byte    "I=2"
cr
lf
0
EndOfIF:

Two things should become readily apparent: the more (consecutive) cases you have the more efficient the jump table implementation becomes (both in terms of space and speed). Except for trivial cases the case statement is almost always faster and usually by a large margin. As long as the case labels are consecutive values the case statement version is usually smaller as well.

What happens if you need to include non-consecutive case labels or you cannot be sure that the case variable doesn't go out of range? Many Pascals have extended the definition of the case statement to include an otherwise clause. Such a case statement takes the following form:

CASE variable OF
const:stmt;
const:stmt;
. .
. .
. .
const:stmt;
OTHERWISE stmt
END;

If the value of variable matches one of the constants making up the case labels then the associated statement executes. If the variable's value doesn't match any of the case labels then the statement following the otherwise clause executes. The otherwise clause is implemented in two phases. First you must choose the minimum and maximum values that appear in a case statement. In the following case statement the smallest case label is five the largest is 15:

CASE I OF
5:stmt1;
8:stmt2;
10:stmt3;
12:stmt4;
15:stmt5;
OTHERWISE stmt6
END;

Before executing the jump through the jump table the 80x86 implementation of this case statement should check the case variable to make sure it's in the range 5..15. If not control should be immediately transferred to stmt6:

mov     bx
I
cmp     bx
5
jl      Otherwise
cmp     bx
15
jg      Otherwise
shl     bx
1
jmp     cs:JmpTbl-10[bx]

The only problem with this form of the case statement as it now stands is that it doesn't properly handle the situation where I is equal to 6 7 9 11 13 or 14. Rather than sticking extra code in front of the conditional jump you can stick extra entries in the jump table as follows:

mov     bx
I
cmp     bx
5
jl      Otherwise
cmp     bx
15
jg      Otherwise
shl     bx
1
jmp     cs:JmpTbl-10[bx]

Otherwise:      {put stmt6 here}
jmp     CaseDone

JmpTbl          word    stmt1
Otherwise
Otherwise
stmt2
Otherwise
word    stmt3
Otherwise
stmt4
Otherwise
Otherwise
word    stmt5
etc.

Note that the value 10 is subtracted from the address of the jump table. The first entry in the table is always at offset zero while the smallest value used to index into the table is five (which is multiplied by two to produce 10). The entries for 6 7 9 11 13 and 14 all point at the code for the Otherwise clause so if I contains one of these values the Otherwise clause will be executed.

There is a problem with this implementation of the case statement. If the case labels contain non-consecutive entries that are widely spaced the following case statement would generate an extremely large code file:

CASE I OF
0: stmt1;
100: stmt2;
1000: stmt3;
10000: stmt4;
Otherwise stmt5
END;

In this situation your program will be much smaller if you implement the case statement with a sequence of if statements rather than using a jump statement. However keep one thing in mind- the size of the jump table does not normally affect the execution speed of the program. If the jump table contains two entries or two thousand the case statement will execute the multi-way branch in a constant amount of time. The if statement implementation requires a linearly increasing amount of time for each case label appearing in the case statement.

Probably the biggest advantage to using assembly language over a HLL like Pascal is that you get to choose the actual implementation. In some instances you can implement a case statement as a sequence ofif..then..else statements or you can implement it as a jump table or you can use a hybrid of the two:

CASE I OF
0:stmt1;
1:stmt2;
2:stmt3;
100:stmt4;
Otherwise stmt5
END;

could become:

mov     bx
I
cmp     bx
100
je      Is100
cmp     bx
2
ja      Otherwise
shl     bx
1
jmp     cs:JmpTbl[bx]
etc.

Of course you could do this in Pascal with the following code:

IF I = 100 then stmt4
ELSE CASE I OF
0:stmt1;
1:stmt2;
2:stmt3;
Otherwise stmt5
END;

But this tends to destroy the readability of the Pascal program. On the other hand the extra code to test for 100 in the assembly language code doesn't adversely affect the readability of the program (perhaps because it's so hard to read already). Therefore most people will add the extra code to make their program more efficient.

The C/C++ switch statement is very similar to the Pascal case statement. There is only one major semantic difference: the programmer must explicitly place a break statement in each case clause to transfer control to the first statement beyond the switch. This break corresponds to the jmp instruction at the end of each case sequence in the assembly code above. If the corresponding break is not present C/C++ transfers control into the code of the following case. This is equivalent to leaving off the jmp at the end of the case's sequence:

switch (i)
{
case 0: stmt1;
case 1: stmt2;
case 2: stmt3;
break;
case 3: stmt4;
break;
default:stmt5;
}

This translates into the following 80x86 code:

mov     bx
i
cmp     bx
3
ja      DefaultCase

shl     bx
1
jmp     cs:JmpTbl[bx]
JmpTbl          word    case0
case1
case2
case3

case0:          <stmt1's code>

case1:          <stmt2's code>

case2:          <stmt3's code>

jmp     EndCase         ;Emitted for the break stmt.

case3:          <stmt4's code>
jmp     EndCase         ;Emitted for the break stmt.

DefaultCase:    <stmt5's code>
EndCase:
 10.4 State Machines and Indirect Jumps

Another control structure commonly found in assembly language programs is the state machine. A state machine uses a state variable to control program flow. The FORTRAN programming language provides this capability with the assigned goto statement. Certain variants of C (e.g. GNU's GCC from the Free Software Foundation) provide similar features. In assembly language the indirect jump provides a mechanism to easily implement state machines.

So what is a state machine? In very basic terms it is a piece of code[2] which keeps track of its execution history by entering and leaving certain "states". For the purposes of this chapter we'll not use a very formal definition of a state machine. We'll just assume that a state machine is a piece of code which (somehow) remembers the history of its execution (its state) and executes sections of code based upon that history.

In a very real sense all programs are state machines. The CPU registers and values in memory constitute the "state" of that machine. However we'll use a much more constrained view. Indeed for most purposes only a single variable (or the value in the IP register) will denote the current state.

Now let's consider a concrete example. Suppose you have a procedure which you want to perform one operation the first time you call it a different operation the second time you call it yet something else the third time you call it and then something new again on the fourth call. After the fourth call it repeats these four different operations in order. For example suppose you want the procedure to add ax and bx the first time subtract them on the second call multiply them on the third and divide them on the fourth. You could implement this procedure as follows:

State           byte    0
StateMach       proc
cmp     state
0
jne     TryState1

; If this is state 0
add BX to AX and switch to state 1:

bx
inc     State           ;Set it to state 1
ret

; If this is state 1
subtract BX from AX and switch to state 2

TryState1:      cmp     State
1
jne     TryState2
sub     ax
bx
inc     State
ret

; If this is state 2
multiply AX and BX and switch to state 3:

TryState2:      cmp     State
2
jne     MustBeState3
push    dx
mul     bx
pop     dx
inc     State
ret

; If none of the above
assume we're in State 4. So divide
; AX by BX.

MustBeState3:
push    dx
xor     dx
dx          ;Zero extend AX into DX.
div     bx
pop     dx
mov     State
0        ;Switch back to State 0
ret
StateMach       endp

Technically this procedure is not the state machine. Instead it is the variable State and the cmp/jne instructions which constitute the state machine.

There is nothing particularly special about this code. It's little more than a case statement implemented via theif..then..else construct. The only thing special about this procedure is that it remembers how many times it has been called[3] and behaves differently depending upon the number of calls. While this is a correct implementation of the desired state machine it is not particularly efficient. The more common implementation of a state machine in assembly language is to use an indirect jump. Rather than having a state variable which contains a value like zero one two or three we could load the state variable with the address of the code to execute upon entry into the procedure. By simply jumping to that address the state machine could save the tests above needed to execute the proper code fragment. Consider the following implementation using the indirect jump:

State           word    State0
StateMach       proc
jmp     State

; If this is state 0
add BX to AX and switch to state 1:

bx
mov     State
offset State1    ;Set it to state 1
ret

; If this is state 1
subtract BX from AX and switch to state 2

State1:         sub     ax
bx
mov     State
offset State2    ;Switch to State 2
ret

; If this is state 2
multiply AX and BX and switch to state 3:

State2:         push    dx
mul     bx
pop     dx
mov     State
offset State3    ;Switch to State 3
ret

; If in State 3
do the division and switch back to State 0:

State3:         push    dx
xor     dx
dx                  ;Zero extend AX into DX.
div     bx
pop     dx
mov     State
offset State0    ;Switch to State 0
ret
StateMach       endp

The jmp instruction at the beginning of the StateMach procedure transfers control to the location pointed at by the State variable. The first time you call StateMach it points at the State0 label. Thereafter each subsection of code sets the State variable to point at the appropriate successor code.

 10.5 Spaghetti Code

One major problem with assembly language is that it takes several statements to realize a simple idea encapsulated by a single HLL statement. All too often an assembly language programmer will notice that s/he can save a few bytes or cycles by jumping into the middle of some programming structure. After a few such observations (and corresponding modifications) the code contains a whole sequence of jumps in and out of portions of the code. If you were to draw a line from each jump to its destination the resulting listing would end up looking like someone dumped a bowl of spaghetti on your code hence the term "spaghetti code".

Spaghetti code suffers from one major drawback- it's difficult (at best) to read such a program and figure out what it does. Most programs start out in a "structured" form only to become spaghetti code at the altar of efficiency. Alas spaghetti code is rarely efficient. Since it's difficult to figure out exactly what's going on it's very difficult to determine if you can use a better algorithm to improve the system. Hence spaghetti code may wind up less efficient.

While it's true that producing some spaghetti code in your programs may improve its efficiency doing so should always be a last resort (when you've tried everything else and you still haven't achieved what you need) never a matter of course. Always start out writing your programs with straight-forward ifs and case statements. Start combining sections of code (via jmp instructions) once everything is working and well understood. Of course you should never obliterate the structure of your code unless the gains are worth it.

A famous saying in structured programming circles is "After gotos pointers are the next most dangerous element in a programming language." A similar saying is "Pointers are to data structures what gotos are to control structures." In other words avoid excessive use of pointers. If pointers and gotos are bad then the indirect jump must be the worst construct of all since it involves both gotos and pointers! Seriously though the indirect jump instructions should be avoided for casual use. They tend to make a program harder to read. After all an indirect jump can (theoretically) transfer control to any label within a program. Imagine how hard it would be to follow the flow through a program if you have no idea what a pointer contains and you come across an indirect jump using that pointer. Therefore you should always exercise care when using jump indirect instructions.

[1] Versions of Turbo Pascal sadly treat the case statement as a form of the if..then..else statement.

[2] Note that state machines need not be software based. Many state machines' implementation are hardware based.

[3] Actually it remembers how many times MOD 4 that it has been called.

 Chapter Ten (Part 1) Table of Content Chapter Ten (Part 3)

Chapter Ten: Control Structures (Part 2)
27 SEP 1996