The Art of
ASSEMBLY LANGUAGE PROGRAMMING

Chapter Fifteen (Part 3)

Table of Content

Chapter Fifteen (Part 5) 

CHAPTER FIFTEEN:
STRINGS AND CHARACTER SETS (Part 4)
15.3 - Character String Functions
15.3.1 - Substr
15.3.2 - Index
15.3.3 - Repeat
15.3.4 - Insert
15.3.5 - Delete
15.3.6 - Concatenation
15.3 Character String Functions

Most high level languages like Pascal BASIC "C" and PL/I provide several string functions and procedures (either built into the language or as part of a standard library). Other than the five string operations provided above the 80x86 doesn't support any string functions. Therefore if you need a particular string function you'll have to write it yourself. The following sections describe many of the more popular string functions and how to implement them in assembly language.

15.3.1 Substr

The Substr (substring) function copies a portion of one string to another. In a high level language this function usually takes the form:

	DestStr := Substr(SrcStr
Index
Length);

where:

The following examples show how Substr works.

		SrcStr := 'This is an example of a string';
DestStr := Substr(SrcStr
11
7);
write(DestStr);

This prints 'example'. The index value is eleven so the Substr function will begin copying data starting at the eleventh character in the string. The eleventh character is the 'e' in 'example'. The length of the string is seven.

This invocation copies the seven characters 'example' to DestStr.

		SrcStr := 'This is an example of a string';
DestStr := Substr(SrcStr
1
10);
write(DestStr);

This prints 'This is an'. Since the index is one this occurrence of the Substr function starts copying 10 characters starting with the first character in the string.

		SrcStr := 'This is an example of a string';
DestStr := Substr(SrcStr
20
11);
write(DestStr);

This prints 'of a string'. This call to Substr extracts the last eleven characters in the string.

What happens if the index and length values are out of bounds? For example what happens if Index is zero or is greater than the length of the string? What happens if Index is fine but the sum of Index and Length is greater than the length of the source string? You can handle these abnormal situations in one of three ways: (1)ignore the possibility of error; (2)abort the program with a run-time error; (3)process some reasonable number of characters in response to the request.

The first solution operates under the assumption that the caller never makes a mistake computing the values for the parameters to the Substr function. It blindly assumes that the values passed to the Substr function are correct and processes the string based on that assumption. This can produce some bizarre effects. Consider the following examples which use length-prefixed strings:

		SourceStr :='1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ';
DestStr := Substr(SourceStr
0
5);
Write('DestStr');

prints '$1234'. The reason of course is that SourceStr is a length-prefixed string. Therefore the length 36 appears at offset zero within the string. If Substr uses the illegal index of zero then the length of the string will be returned as the first character. In this particular case the length of the string 36 just happened to correspond to the ASCII code for the '$' character.

The situation is considerably worse if the value specified for Index is negative or is greater than the length of the string. In such a case the Substr function would be returning a substring containing characters appearing before or after the source string. This is not a reasonable result.

Despite the problems with ignoring the possibility of error in the Substr function there is one big advantage to processing substrings in this manner: the resulting Substr code is more efficient if it doesn't have to perform any run-time checking on the data. If you know that the index and length values are always within an acceptable range then there is no need to do this checking within Substr function. If you can guarantee that an error will not occur your programs will run (somewhat) faster by eliminating the run-time check.

Since most programs are rarely error-free you're taking a big gamble if you assume that all calls to the Substr routine are passing reasonable values. Therefore some sort of run-time check is often necessary to catch errors in your program. An error occurs under the following conditions:

An alternative to ignoring any of these errors is to abort with an error message. This is probably fine during the program development phase but once your program is in the hands of users it could be a real disaster. Your customers wouldn't be very happy if they'd spent all day entering data into a program and it aborted causing them to lose the data they've entered. An alternative to aborting when an error occurs is to have the Substr function return an error condition. Then leave it up to the calling code to determine if an error has occurred. This technique works well with the third alternative to handling errors: processing the substring as best you can.

The third alternative handling the error as best you can is probably the best alternative. Handle the error conditions in the following manner:

The following code for the Substr function expects four parameters: the addresses of the source and destination strings the starting index and the length of the desired substring. Substr expects the parameters in the following registers:

ds:si- The address of the source string.

es:di- The address of the destination string.

ch- The starting index.

cl- The length of the substring.

Substr returns the following values:

If an error occurs then the calling code must examine the values in si di and cx to determine the exact cause of the error (if this is necessary). In the event of an error the Substr function returns the following substrings:

; Substring function.
;
; HLL form:
;
;procedure substring(var Src:string;
;                       Index
Length:integer;
;                       var Dest:string);
;
; Src- Address of a source string.
; Index- Index into the source string.
; Length- Length of the substring to extract.
; Dest- Address of a destination string.
;
; Copies the source string from address [Src+index] of length
; Length to the destination string.
;
; If an error occurs
the carry flag is returned set
otherwise
; clear.
;
; Parameters are passed as follows:
;
; DS:SI- Source string address.
; ES:DI- Destination string address.
; CH- Index into source string.
; CL- Length of source string.
;
; Note: the strings pointed at by the SI and DI registers are
; length-prefixed strings. That is
the first byte of each
; string contains the length of that string.

Substring       proc    near
push    ax
push    cx
push    di
push    si
clc                     ;Assume no error.
pushf                   ;Save direction flag status.

; Check the validity of the parameters.

cmp     ch
[si]        ;Is index beyond the length of
ja      ReturnEmpty     ; the source string?
mov     al
ch          ;See if the sum of index and
dec     al              ; length is beyond the end of the
add     al
cl          ; string.
jc      TooLong         ;Error if > 255.
cmp     al
[si]        ;Beyond the length of the source?
jbe     OkaySoFar

; If the substring isn't completely contained within the source
; string
truncate it:

TooLong:        popf
stc                     ;Return an error flag.
pushf
mov     al
[si]                ;Get maximum length.
sub     al
ch          ;Subtract index value.
inc     al              ;Adjust as appropriate.
mov     cl
al          ;Save as new length.

OkaySoFar:      mov     es:[di]
cl             ;Save destination string length.
inc     di
mov     al
ch          ;Get index into source.
mov     ch
0           ;Zero extend length value into CX.
mov     ah
0           ;Zero extend index into AX.
add     si
ax          ;Compute address of substring.
cld
rep     movsb                   ;Copy the substring.

popf
SubStrDone:     pop     si
pop     di
pop     cx
pop     ax
ret

; Return an empty string here:

ReturnEmpty:    mov     byte ptr es:[di]
0
popf
stc
jmp     SubStrDone
SubString       endp

15.3.2 Index

The Index string function searches for the first occurrence of one string within another and returns the offset to that occurrence. Consider the following HLL form:

	SourceStr := 'Hello world';
TestStr := 'world';
I := INDEX(SourceStr
TestStr);

The Index function scans through the source string looking for the first occurrence of the test string. If found it returns the index into the source string where the test string begins. In the example above the Index function would return seven since the substring 'world' starts at the seventh character position in the source string.

The only possible error occurs if Index cannot find the test string in the source string. In such a situation most implementations return zero. Our version will do likewise. The Index function which follows operates in the following fashion:

1) It compares the length of the test string to the length of the source string. If the test string is longer Index immediately returns zero since there is no way the test string will be found in the source string in this situation.

2) The index function operates as follows:

	i := 1;
while (i < (length(source)-length(test)) and
test <> substr(source
i
length(test)) do
i := i+1;

When this loop terminates if (i < length(source)-length(test)) then it contains the index into source where test begins. Otherwise test is not a substring of source. Using the previous example this loop compares test to source in the following manner:

i=1
test:           world           No match
source:         Hello world

i=2
test:            world          No match
source:         Hello world

i=3
test:             world         No match
source:         Hello world

i=4
test:              world        No match
source:         Hello world

i=5
test:               world       No match
source:         Hello world

i=6
test:                world      No match
source:         Hello world

i=7
test:                 world     Match
source:         Hello world

There are (algorithmically) better ways to do this comparison[7] however the algorithm above lends itself to the use of 80x86 string instructions and is very easy to understand. Index's code follows:

; INDEX- computes the offset of one string within another.
;
; On entry:
;
; ES:DI-                        Points at the test string that INDEX will search for
;                       in the source string.
; DS:SI-                        Points at the source string which (presumably)
;                        contains the string INDEX is searching for.
;
; On exit:
;
; AX-                   Contains the offset into the source string where the
;                       test string was found.

INDEX           proc    near
push    si
push    di
push    bx
push    cx
pushf                   ;Save direction flag value.
cld

mov     al
es:[di]     ;Get the length of the test string.
cmp     al
[si]        ;See if it is longer than the length
ja      NotThere        ; of the source string.

; Compute the index of the last character we need to compare the
; test string against in the source string.

mov     al
es:[di]     ;Length of test string.
mov     cl
al          ;Save for later.
mov     ch
0
sub     al
[si]        ;Length of source string.
mov     bl
al          ;# of times to repeat loop.
inc     di              ;Skip over length byte.
xor     ax
ax          ;Init index to zero.
CmpLoop:        inc     ax              ;Bump index by one.
inc     si              ;Move on to the next char in source.
push    si              ;Save string pointers and the
push    di              ; length of the test string.
push    cx
rep     cmpsb                   ;Compare the strings.
pop     cx              ;Restore string pointers
pop     di              ; and length.
pop     si
je      Foundindex      ;If we found the substring.
dec     bl
jnz     CmpLoop         ;Try next entry in source string.

; If we fall down here
the test string doesn't appear inside the
; source string.

NotThere:       xor     ax
ax          ;Return INDEX = 0

; If the substring was found in the loop above
remove the
; garbage left on the stack

FoundIndex:             popf
pop     cx
pop     bx
pop     di
pop     si
ret
INDEX           endp

15.3.3 Repeat

The Repeat string function expects three parameters- the address of a string a length and a character. It constructs a string of the specified length containing "length" copies of the specified character. For example Repeat(STR 5 '*') stores the string '*****' into the STR string variable. This is a very easy string function to write thanks to the stosb instruction:

; REPEAT-       Constructs a string of length CX where each element
;               is initialized to the character passed in AL.
;
; On entry:
;
; ES:DI-        Points at the string to be constructed.
; CX-           Contains the length of the string.
; AL-           Contains the character with which each element of
;               the string is to be initialized.

REPEAT          proc    near
push    di
push    ax
push    cx
pushf                   ;Save direction flag value.
cld
mov     es:[di]
cl     ;Save string length.
mov     ch
0           ;Just in case.
inc     di              ;Start string at next location.
rep     stosb
popf
pop     cx
pop     ax
pop     di
ret
REPEAT          endp

15.3.4 Insert

The Insert string function inserts one string into another. It expects three parameters a source string a destination string and an index. Insert inserts the source string into the destination string starting at the offset specified by the index parameter. HLLs usually call the Insert procedure as follows:

		source := ' there';
dest := 'Hello world';
INSERT(source
dest
6);

The call to Insert above would change source to contain the string 'Hello there world'. It does this by inserting the string ' there' before the sixth character in 'Hello world'.

The insert procedure using the following algorithm:

Insert(Src dest index);

1) Move the characters from location dest+index through the end of the destination string length (Src) bytes up in memory.

2) Copy the characters from the Src string to location dest+index.

3) Adjust the length of the destination string so that it is the sum of the destination and source lengths. The following code implements this algorithm:

; INSERT- Inserts one string into another.
;
; On entry:
;
; DS:SI Points at the source string to be inserted
;
; ES:DI Points at the destination string into which the source
; string will be inserted.
;
; DX Contains the offset into the destination string where the
; source string is to be inserted.
;
;
; All registers are preserved.
;
; Error condition-
;
; If the length of the newly created string is greater than 255

; the insert operation will not be performed and the carry flag
; will be returned set.
;
; If the index is greater than the length of the destination
; string

; then the source string will be appended to the end of the destin- ; ation string.

INSERT          proc    near
push    si
push    di
push    dx
push    cx
push    bx
push    ax
clc                     ;Assume no error.
pushf
mov     dh
0           ;Just to be safe.

; First
see if the new string will be too long.

mov     ch
0
mov     ah
ch
mov     bh
ch
mov     al
es:[di]     ;AX = length of dest string.
mov     cl
[si]        ;CX = length of source string.
mov     bl
al          ;BX = length of new string.
add     bl
cl
jc      TooLong         ;Abort if too long.
mov     es:[di]
bl     ;Update length.

; See if the index value is too large:

cmp     dl
al
jbe     IndexIsOK
mov     dl
al
IndexIsOK:

; Now
make room for the string that's about to be inserted.

push    si              ;Save for later.
push    cx

mov     si
di          ;Point SI at the end of current
add     si
ax          ; destination string.
add     di
bx          ;Point DI at the end of new str.
std
rep     movsb                   ;Open up space for new string.

; Now
copy the source string into the space opened up.

pop     cx
pop     si
add     si
cx          ;Point at end of source string.
rep     movsb
jmp     INSERTDone

TooLong:                popf
stc
pushf

INSERTDone:             popf
pop     ax
pop     bx
pop     cx
pop     dx
pop     di
pop     si
ret
INSERT          endp

15.3.5 Delete

The Delete string removes characters from a string. It expects three parameters - the address of a string an index into that string and the number of characters to remove from that string. A HLL call to Delete usually takes the form:

	 	Delete(Str
index
length);

For example

		Str := 'Hello there world';
Delete(str
7
6);

This call to Delete will leave str containing 'Hello world'. The algorithm for the delete operation is the following:

1) Subtract the length parameter value from the length of the destination string and update the length of the destination string with this new value.

2) Copy any characters following the deleted substring over the top of the deleted substring.

There are a couple of errors that may occur when using the delete procedure. The index value could be zero or larger than the size of the specified string. In this case the Delete procedure shouldn't do anything to the string. If the sum of the index and length parameters is greater than the length of the string then the Delete procedure should delete all the characters to the end of the string. The following code implements the Delete procedure:

; DELETE - removes some substring from a string.
;
; On entry:
;
; DS:SI                 Points at the source string.
; DX                    Index into the string of the start of the substring
;                       to delete.
; CX                    Length of the substring to be deleted.
;
; Error conditions-
;
; If DX is greater than the length of the string
then the
; operation is aborted.
;
; If DX+CX is greater than the length of the string
DELETE only
; deletes those characters from DX through the end of the string.

DELETE          proc    near
push    es
push    si
push    di
push    ax
push    cx
push    dx
pushf                   ;Save direction flag.
mov     ax
ds          ;Source and destination strings
mov     es
ax          ; are the same.
mov     ah
0
mov     dh
ah          ;Just to be safe.
mov     ch
ah

; See if any error conditions exist.

mov     al
[si]        ;Get the string length
cmp     dl
al          ;Is the index too big?
ja      TooBig
mov     al
dl          ;Now see if INDEX+LENGTH
add     al
cl          ;is too large
jc      Truncate
cmp     al
[si]
jbe     LengthIsOK

; If the substring is too big
truncate it to fit.

Truncate:       mov     cl
[si]                ;Compute maximum length
sub     cl
dl
inc     cl

; Compute the length of the new string.

LengthIsOK:     mov     al
[si]
sub     al
cl
mov     [si]
al

; Okay
now delete the specified substring.

add     si
dx          ;Compute address of the substring
mov     di
si          ; to be deleted
and the address of
add     di
cx          ; the first character following it.
cld
rep     movsb                   ;Delete the string.

TooBig:         popf
pop     dx
pop     cx
pop     ax
pop     di
pop     si
pop     es
ret
DELETE          endp

15.3.6 Concatenation

The concatenation operation takes two strings and appends one to the end of the other. For example Concat('Hello ' 'world') produces the string 'Hello world'. Some high level languages treat concatenation as a function call others as a procedure call. Since in assembly language everything is a procedure call anyway we'll adopt the procedural syntax. Our Concat procedure will take the following form:

 	Concat(source1
source2
dest);

This procedure will copy source1 to dest then it will concatenate source2 to the end of dest. Concat follows:

; Concat-       Copies the string pointed at by SI to the string
;               rointed at by DI and then concatenates the string;
;               pointed at by BX to the destination string.
;
; On entry-
;
; DS:SI-        Points at the first source string
; DS:BX-        Points at the second source string
; ES:DI-        Points at the destination string.
;
; Error condition-
;
; The sum of the lengths of the two strings is greater than 255.
; In this event
the second string will be truncated so that the
; entire string is less than 256 characters in length.

CONCAT          proc    near
push    si
push    di
push    cx
push    ax
pushf

; Copy the first string to the destination string:

mov     al
[si]
mov     cl
al
mov     ch
0
mov     ah
ch
add     al
[bx]        ;Compute the sum of the string's
adc     ah
0           ; lengths.
cmp     ax
256
jb      SetNewLength
mov     ah
[si]        ;Save original string length.
mov     al
255         ;Fix string length at 255.
SetNewLength:   mov     es:[di]
al     ;Save new string length.
inc     di              ;Skip over length bytes.
inc     si
rep     movsb                   ;Copy source1 to dest string.

; If the sum of the two strings is too long
the second string
; must be truncated.

mov     cl
[bx]        ;Get length of second string.
cmp     ax
256
jb      LengthsAreOK
mov     cl
ah          ;Compute truncated length.
neg     cl              ;CL := 256-Length(Str1).

LengthsAreOK:   lea     si
1[bx]       ;Point at second string and
;                                       ; skip the string length.
cld
rep     movsb           ;Perform the concatenation.

popf
pop     ax
pop     cx
pop     di
pop     si
ret
CONCAT          endp

[7] The interested reader should look up the Knuth-Morris-Pratt algorithm in "Data Structure Techniques" by Thomas A. Standish. The Boyer-Moore algorithm is another fast string search routine although somewhat more complex.

Chapter Fifteen (Part 3)

Table of Content

Chapter Fifteen (Part 5) 

Chapter Fifteen: Strings And Character Sets (Part 4)
28 SEP 1996