The Art of
ASSEMBLY LANGUAGE PROGRAMMING

CHAPTER FIFTEEN:
STRINGS AND CHARACTER SETS (Part 5)
15.4 - String Functions in the UCR Standard Library
15.4.1 - StrBDel, StrBDelm
15.4.2 - Strcat, Strcatl, Strcatm, Strcatml
15.4.3 - Strchr
15.4.4 - Strcmp, Strcmpl, Stricmp, Stricmpl
15.4.5 - Strcpy, Strcpyl, Strdup, Strdupl
15.4.6 - Strdel, Strdelm
15.4.7 - Strins, Strinsl, Strinsm, Strinsml
15.4.8 - Strlen
15.4.9 - Strlwr, Strlwrm, Strupr, Struprm
15.4.10 - Strrev, Strrevm
15.4.11 - Strset, Strsetm
15.4.12 - Strspan, Strspanl, Strcspan, Strcspanl
15.4.13 - Strstr, Strstrl
15.4.14 - Strtrim, Strtrimm
15.4.15 - Other String Routines in the UCR Standard Library
15.4 String Functions in the UCR Standard Library

The UCR Standard Library for 80x86 Assembly Language Programmers provides a rich set of string functions. For the most part, these routines are quite similar to the string functions provided in the C Standard Library. As such, they support zero-terminated strings rather than the length-prefixed strings supported by the functions in the previous sections.

Because there are so many different UCR StdLib string routines, and the sources for all these routines are in the public domain (and are present on the companion CD-ROM for this text), the following sections will not discuss the implementation of each routine. Instead, they concentrate on how to use these library routines.

The UCR library often provides several variants of the same routine. Generally, a suffix of "l", "m", or "ml" appears at the end of the name of these variant routines. The "l" suffix stands for "literal constant". Routines with the "l" (or "ml") suffix require two string operands. The first is generally pointed at by es:di, and the second immediately follows the call in the code stream.

Most StdLib string routines operate directly on the specified string (or on one of the strings if the function has two operands). The "m" (or "ml") suffix instructs the string function to allocate storage on the heap (using malloc, hence the "m" suffix) for the new string and store the modified result there rather than changing the source string(s). These routines always return a pointer to the newly created string in the es:di registers. In the event of a memory allocation error (insufficient memory), the routines with the "m" or "ml" suffix return with the carry flag set. They return with the carry flag clear if the operation was successful.

15.4.1 StrBDel, StrBDelm

These two routines delete leading spaces from a string. StrBDel removes any leading spaces from the string pointed at by es:di; it actually modifies the source string. StrBDelm makes a copy of the string on the heap with any leading spaces removed. If there are no leading spaces, the StrBDel routines return the original string without modification. Note that these routines only affect leading spaces (those appearing at the beginning of the string). They do not remove trailing spaces or spaces in the middle of the string. See Strtrim if you want to remove trailing spaces. Examples:

MyString        byte    "    Hello there, this is my string",0
MyStrPtr        dword   MyString
                        .
                        .
                        .
                les     di, MyStrPtr
                strbdelm                ;Creates a new string w/o leading spaces,
                jc      error           ; pointer to string is in ES:DI on return.
                puts                    ;Print the string pointed at by ES:DI.
                free                    ;Deallocate storage allocated by strbdelm.
                        .
                        .
                        .
; Note that "MyString" still contains the leading spaces.
; The following printf call will print the string along with
; those leading spaces. "strbdelm" above did not change MyString.

                printf
                byte    "MyString = '%s'\n",0
                dword   MyString
                        .
                        .
                        .
                les     di, MyStrPtr
                strbdel

; Now, we really have removed the leading spaces from "MyString".

                printf
                byte    "MyString = '%s'\n",0
                dword   MyString
                        .
                        .
                        .

Output from this code fragment:

Hello there, this is my string
MyString = '    Hello there, this is my string'
MyString = 'Hello there, this is my string'
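
For readers who think more easily in C, the difference between the in-place routine and its "m" variant can be sketched as follows. This is only a model of the behavior described above, not the library's source; the names model_strbdel and model_strbdelm are hypothetical, a NULL return stands in for the carry flag being set, and treating only the space character (20h) as blank is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of StrBDel: remove leading spaces in place.
   Assumption: only the space character (0x20) counts as a blank. */
char *model_strbdel(char *s)
{
    size_t lead = 0;

    while (s[lead] == ' ')
        lead++;
    if (lead > 0)
        memmove(s, s + lead, strlen(s + lead) + 1); /* +1 moves the zero byte too */
    return s;
}

/* Hypothetical model of StrBDelm: the source string is left untouched and
   the result is a new heap string. NULL plays the role of "carry set".
   The caller releases the result with free(). */
char *model_strbdelm(const char *s)
{
    char *copy;

    while (*s == ' ')
        s++;
    copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}
```

Calling model_strbdelm on "    Hello there" yields a new string "Hello there" while the original keeps its four leading spaces; calling model_strbdel on the same buffer changes the buffer itself.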

15.4.2 Strcat, Strcatl, Strcatm, Strcatml

The strcat(xx) routines perform string concatenation. On entry, es:di points at the first string and, for strcat/strcatm, dx:si points at the second string. For strcatl and strcatml, the second string follows the call in the code stream. These routines create a new string by appending the second string to the end of the first. In the case of strcat and strcatl, the second string is directly appended to the end of the first string (es:di) in memory. You must make sure there is sufficient memory at the end of the first string to hold the appended characters. Strcatm and strcatml create a new string on the heap (using malloc) holding the concatenated result. Examples:

String1         byte    "Hello ",0
                byte    16 dup (0)      ;Room for concatenation.

String2         byte    "world",0

; The following macro loads ES:DI with the address of the
; specified operand.

lesi            macro   operand
                mov     di, seg operand
                mov     es, di
                mov     di, offset operand
                endm

; The following macro loads DX:SI with the address of the
; specified operand.

ldxi            macro   operand
                mov     dx, seg operand
                mov     si, offset operand
                endm
                        .
                        .
                        .
                lesi    String1
                ldxi    String2
                strcatm                 ;Create "Hello world"
                jc      error           ;If insufficient memory.
                print
                byte    "strcatm: ",0
                puts                    ;Print "Hello world"
                putcr
                free                    ;Deallocate string storage.
                        .
                        .
                        .
                lesi    String1         ;Create the string
                strcatml                ; "Hello there"
                byte    "there",0       ;Literal must immediately follow the call.
                jc      error           ;If insufficient memory.
                print
                byte    "strcatml: ",0
                puts                    ;Print "Hello there"
                putcr
                free
                        .
                        .
                        .
                lesi    String1
                ldxi    String2
                strcat                  ;Create "Hello world"
                printf
                byte    "strcat: %s\n",0
                dword   String1
                        .
                        .
                        .
; Note: since strcat above has actually modified String1,
; the following call to strcatl appends "there" to the end
; of the string "Hello world".

                lesi    String1
                strcatl
                byte    "there",0
                printf
                byte    "strcatl: %s\n",0
                dword   String1
                        .
                        .
                        .

The code above produces the following output:

strcatm: Hello world
strcatml: Hello there
strcat: Hello world
strcatl: Hello worldthere
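
The buffer-size requirement for strcat versus the heap allocation done by strcatm may be easier to see in a C sketch. The functions below are a hypothetical model (the names model_strcat and model_strcatm are not library names); a NULL return again stands in for the carry flag being set.

```c
#include <stdlib.h>
#include <string.h>

/* In-place form: dst must already have room for the appended characters,
   just as String1 in the listing above is followed by 16 spare bytes. */
char *model_strcat(char *dst, const char *src)
{
    return strcat(dst, src);        /* standard C library routine */
}

/* "m" form: neither operand is modified; the concatenation is built in a
   freshly allocated block. NULL models "carry set" (out of memory). */
char *model_strcatm(const char *first, const char *second)
{
    size_t n1 = strlen(first);
    size_t n2 = strlen(second);
    char *result = malloc(n1 + n2 + 1);

    if (result == NULL)
        return NULL;
    memcpy(result, first, n1);
    memcpy(result + n1, second, n2 + 1);    /* n2 + 1 copies the zero byte */
    return result;
}
```

Note that the in-place form performs no bounds checking of its own; as with the assembly routine, reserving enough space after the first string is the caller's responsibility.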

15.4.3 Strchr

Strchr searches for the first occurrence of a single character within a string. In operation, it is quite similar to the scasb instruction. However, you do not have to specify an explicit length when using this function as you would for scasb.

On entry, es:di points at the string you want to search through, and al contains the character to search for. On return, the carry flag denotes success (C=1 means the character was not present in the string, C=0 means the character was present). If the character was found in the string, cx contains the index into the string where strchr located the character. Note that the first character of the string is at index zero, so strchr will return zero if al matches the first character of the string. If the carry flag is set, the value in cx has no meaning. Example:

; Note that the following string has a period at location
; "HasPeriod+24".

HasPeriod       byte    "This string has a period.",0
                        .
                        .
                        .
                lesi    HasPeriod       ;See strcat for lesi definition.
                mov     al, "."         ;Search for a period.
                strchr
                jnc     GotPeriod
                print
                byte    "No period in string",cr,lf,0
                jmp     Done

; If we found the period, output the offset into the string:

GotPeriod:      print
                byte    "Found period at offset ",0
                mov     ax, cx
                puti
                putcr
Done:

This code fragment produces the output:

Found period at offset 24
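
The C library's strchr returns a pointer rather than an index, so the relationship to the routine described here takes one extra step. The following sketch (the name model_strchr_index is hypothetical) converts the pointer to a zero-based index and uses -1 where the UCR routine would return with the carry flag set.

```c
#include <string.h>

/* Return the zero-based index of the first occurrence of ch in s,
   or -1 if ch does not occur (this models "carry set").
   Caveat: C's strchr also matches the terminating zero byte when ch is 0;
   the text above does not say how the UCR routine treats that case. */
long model_strchr_index(const char *s, char ch)
{
    const char *hit = strchr(s, ch);

    return (hit != NULL) ? (long)(hit - s) : -1L;
}
```

For the string "This string has a period." and the character '.', this returns 24, matching the output shown above.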

15.4.4 Strcmp, Strcmpl, Stricmp, Stricmpl

These routines compare strings using a lexicographical ordering. On entry to strcmp or stricmp, es:di points at the first string and dx:si points at the second string. Strcmp compares the first string to the second and returns the result of the comparison in the flags register. Strcmpl operates in a similar fashion, except the second string follows the call in the code stream. The stricmp and stricmpl routines differ from their counterparts in that they ignore case during the comparison. Whereas strcmp would return "not equal" when comparing "Strcmp" with "strcmp", the stricmp (and stricmpl) routines would return "equal" since the only differences are upper vs. lower case. The "i" in stricmp and stricmpl stands for "ignore case." Examples:

String1         byte    "Hello world",0
String2         byte    "hello world",0
String3         byte    "Hello there",0
                        .
                        .
                        .
                lesi    String1         ;See strcat for lesi definition.
                ldxi    String2         ;See strcat for ldxi definition.
                strcmp
                jae     IsGtrEql
                printf
                byte    "%s is less than %s\n",0
                dword   String1, String2
                jmp     Tryl

IsGtrEql:       printf
                byte    "%s is greater or equal to %s\n",0
                dword   String1, String2

Tryl:           lesi    String2
                strcmpl
                byte    "hi world!",0
                jne     NotEql
                printf
                byte    "Hmmm..., %s is equal to 'hi world!'\n",0
                dword   String2
                jmp     Tryi

NotEql:         printf
                byte    "%s is not equal to 'hi world!'\n",0
                dword   String2

Tryi:           lesi    String1
                ldxi    String2
                stricmp
                jne     BadCmp
                printf
                byte    "Ignoring case, %s equals %s\n",0
                dword   String1, String2
                jmp     Tryil

BadCmp:         printf
                byte    "Wow, stricmp doesn't work! %s <> %s\n",0
                dword   String1, String2

; String3 ("Hello there") differs from the literal below only in case.

Tryil:          lesi    String3
                stricmpl
                byte    "hELLO THERE",0
                jne     BadCmp2
                print
                byte    "Stricmpl worked",cr,lf,0
                jmp     Done

BadCmp2:        print
                byte    "Stricmpl did not work",cr,lf,0

Done:
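
Standard C has no case-insensitive comparison, so the idea behind stricmp is worth sketching. The function below is a hypothetical model (model_stricmp is not a library name); folding both characters to lower case before comparing is an assumption, since the text does not say which case the UCR routine folds to.

```c
#include <ctype.h>

/* Compare two zero-terminated strings, ignoring case.
   Returns a negative value, zero, or a positive value, in the style of
   C's strcmp, where the UCR routine would instead set the flags.
   Assumption: characters are folded to lower case before comparing. */
int model_stricmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        if (ca == 0)            /* both strings ended together */
            return 0;
    }
}
```

With this model, "Hello world" and "hello world" compare equal, whereas C's case-sensitive strcmp reports the first as less than the second because 'H' has a smaller character code than 'h'.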

15.4.5 Strcpy, Strcpyl, Strdup, Strdupl

The strcpy and strdup routines copy one string to another. There are no strcpym or strcpyml routines; strdup and strdupl correspond to those operations. The UCR Standard Library uses the names strdup and strdupl rather than strcpym and strcpyml so that it uses the same names as the C standard library.

Strcpy copies the string pointed at by es:di to the memory locations beginning at the address in dx:si. There is no error checking; you must ensure that there is sufficient free space at location dx:si before calling strcpy. Strcpy returns with es:di pointing at the destination string (that is, the original dx:si value). Strcpyl works in a similar fashion, except the source string follows the call.

Strdup duplicates the string which es:di points at and returns a pointer to the new string on the heap. Strdupl works in a similar fashion, except the string follows the call. As usual, the carry flag is set if there is a memory allocation error when using strdup or strdupl. Examples:

String1         byte    "Copy this string",0
String2         byte    32 dup (0)
String3         byte    32 dup (0)
StrVar1         dword   0
StrVar2         dword   0
                        .
                        .
                        .
                lesi    String1         ;See strcat for lesi definition.
                ldxi    String2         ;See strcat for ldxi definition.
                strcpy

                ldxi    String3
                strcpyl
                byte    "This string, too!",0

                lesi    String1
                strdup
                jc      error                   ;If insufficient mem.
                mov     word ptr StrVar1, di    ;Save away ptr to
                mov     word ptr StrVar1+2, es  ; string.

                strdupl
                byte    "Also, this string",0   ;Literal must immediately follow the call.
                jc      error
                mov     word ptr StrVar2, di
                mov     word ptr StrVar2+2, es

                printf
                byte    "strcpy: %s\n"
                byte    "strcpyl: %s\n"
                byte    "strdup: %^s\n"
                byte    "strdupl: %^s\n",0
                dword   String2, String3, StrVar1, StrVar2

15.4.6 Strdel, Strdelm

Strdel and strdelm delete characters from a string. Strdel deletes the specified characters within the string; strdelm creates a new copy of the source string without the specified characters. On entry, es:di points at the string to manipulate, cx contains the index into the string where the deletion is to start, and ax contains the number of characters to delete from the string. On return, es:di points at the new string (which is on the heap if you call strdelm). For strdelm only, if the carry flag is set on return, there was a memory allocation error. As with all UCR StdLib string routines, the index values for the string are zero-based. That is, zero is the index of the first character in the source string. Example:

String1         byte    "Hello there, how are you?",0
                        .
                        .
                        .
                lesi    String1         ;See strcat for lesi definition.
                mov     cx, 5           ;Start at position five (" there")
                mov     ax, 6           ;Delete six characters.
                strdelm                 ;Create a new string.
                jc      error           ;If insufficient memory.
                print
                byte    "New string: ",0
                puts
                putcr

                lesi    String1
                mov     cx, 11          ;Start at the comma.
                mov     ax, 14          ;Delete the remaining 14 characters.
                strdel
                printf
                byte    "Modified string: %s\n",0
                dword   String1

This code prints the following:

New string: Hello, how are you?
Modified string: Hello there
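
The index and count arithmetic is easy to get wrong by one, so here is a small C model of the in-place deletion. The name model_strdel is hypothetical, and clamping an oversized count to the end of the string is an assumption of this sketch; the text does not say what the library does with out-of-range arguments.

```c
#include <string.h>

/* Delete `count` characters starting at zero-based `index`, in place.
   Assumptions: an index at or past the end deletes nothing, and a count
   that runs past the end is clamped to the end of the string. */
char *model_strdel(char *s, size_t index, size_t count)
{
    size_t len = strlen(s);

    if (index >= len)
        return s;
    if (count > len - index)
        count = len - index;
    /* Slide the tail (including the zero byte) down over the deleted span. */
    memmove(s + index, s + index + count, len - index - count + 1);
    return s;
}
```

For "Hello there, how are you?", deleting six characters at index five removes " there" and leaves "Hello, how are you?". Deleting 14 characters at index 11 removes everything from the comma onward and leaves "Hello there".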

15.4.7 Strins, Strinsl, Strinsm, Strinsml

The strins(xx) functions insert one string within another. For all four routines, es:di points at the source string into which you want to insert another string. Cx contains the insertion point (0..length of source string). For strins and strinsm, dx:si points at the string you wish to insert. For strinsl and strinsml, the string to insert appears as a literal constant in the code stream. Strins and strinsl insert the second string directly into the string pointed at by es:di. Strinsm and strinsml make a copy of the source string and insert the second string into that copy. They return a pointer to the new string in es:di. If there is a memory allocation error, strinsm/strinsml set the carry flag on return. For strins and strinsl, the first string must have sufficient storage allocated to hold the new string. Examples:

InsertInMe      byte    "Insert >< Here",0
                byte    16 dup (0)
InsertStr       byte    "insert this",0
StrPtr1         dword   0
StrPtr2         dword   0
                        .
                        .
                        .
                lesi    InsertInMe      ;See strcat for lesi definition.
                ldxi    InsertStr       ;See strcat for ldxi definition.
                mov     cx, 8           ;Insert before "<"
                strinsm
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    InsertInMe
                mov     cx, 8
                strinsml
                byte    "insert that",0
                mov     word ptr StrPtr2, di
                mov     word ptr StrPtr2+2, es

                lesi    InsertInMe
                mov     cx, 8
                strinsl
                byte    "  ",0          ;Two spaces

                lesi    InsertInMe
                ldxi    InsertStr
                mov     cx, 9           ;In front of first space from above.
                strins

                printf
                byte    "First string: %^s\n"
                byte    "Second string: %^s\n"
                byte    "Third string: %s\n",0
                dword   StrPtr1, StrPtr2, InsertInMe

Note that the strins and strinsl operations above both insert strings into the same destination string. The output from the above code is:

First string: Insert >insert this< Here
Second string: Insert >insert that< Here
Third string: Insert > insert this < Here
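
The in-place insertion amounts to opening a gap at the insertion point and copying the new characters into it. A C model, under stated assumptions, might look like this; model_strins is a hypothetical name, and returning NULL for an insertion point past the end of the string is this sketch's choice, not documented library behavior.

```c
#include <string.h>

/* Insert `ins` into `dst`, in place, before zero-based position `index`
   (valid range: 0..strlen(dst)). dst must have room for the longer result.
   Assumption: an out-of-range index leaves dst alone and returns NULL. */
char *model_strins(char *dst, size_t index, const char *ins)
{
    size_t dlen = strlen(dst);
    size_t ilen = strlen(ins);

    if (index > dlen)
        return NULL;
    /* Open a gap: move the tail, including the zero byte, up by ilen. */
    memmove(dst + index + ilen, dst + index, dlen - index + 1);
    memcpy(dst + index, ins, ilen);     /* fill the gap */
    return dst;
}
```

Inserting two spaces at index 8 of "Insert >< Here" and then "insert this" at index 9 reproduces the third string of the example: "Insert > insert this < Here".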

15.4.8 Strlen

Strlen computes the length of the string pointed at by es:di. It returns the number of characters up to, but not including, the zero terminating byte. It returns this length in the cx register. Example:

GetLen          byte    "This string is 33 characters long",0
                        .
                        .
                        .
                lesi    GetLen          ;See strcat for lesi definition.
                strlen
                print
                byte    "The string is ",0
                mov     ax, cx          ;Puti needs the length in AX!
                puti
                print
                byte    " characters long",cr,lf,0

15.4.9 Strlwr, Strlwrm, Strupr, Struprm

Strlwr and Strlwrm convert any upper case characters in a string to lower case. Strupr and Struprm convert any lower case characters in a string to upper case. These routines do not affect any other characters present in the string. For all four routines, es:di points at the source string to convert. Strlwr and strupr modify the characters directly in that string. Strlwrm and struprm make a copy of the string on the heap and then convert the characters in the new string. They also return a pointer to this new string in es:di. As usual for UCR StdLib routines, strlwrm and struprm return with the carry flag set if there is a memory allocation error. Examples:

String1         byte    "This string has lower case.",0
String2         byte    "THIS STRING has Upper Case.",0
StrPtr1         dword   0
StrPtr2         dword   0
                        .
                        .
                        .
                lesi    String1         ;See strcat for lesi definition.
                struprm                 ;Convert lower case to upper case.
                jc      error
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    String2
                strlwrm                 ;Convert upper case to lower case.
                jc      error
                mov     word ptr StrPtr2, di
                mov     word ptr StrPtr2+2, es

                lesi    String1
                strlwr                  ;Convert to lower case, in place.

                lesi    String2
                strupr                  ;Convert to upper case, in place.

                printf
                byte    "struprm: %^s\n"
                byte    "strlwrm: %^s\n"
                byte    "strlwr: %s\n"
                byte    "strupr: %s\n",0
                dword   StrPtr1, StrPtr2, String1, String2

The above code fragment prints the following:

struprm: THIS STRING HAS LOWER CASE.
strlwrm: this string has upper case.
strlwr: this string has lower case.
strupr: THIS STRING HAS UPPER CASE.

15.4.10 Strrev, Strrevm

These two routines reverse the characters in a string. For example, if you pass strrev the string "ABCDEF", it will convert that string to "FEDCBA". As you'd expect by now, the strrev routine reverses the string whose address you pass in es:di; strrevm first makes a copy of the string on the heap and reverses those characters, leaving the original string unchanged. Of course, strrevm will return with the carry flag set if there was a memory allocation error. Example:

Palindrome      byte    "radar",0
NotPaldrm       byte    "x + y - z",0
StrPtr1         dword   0
                        .
                        .
                        .
                lesi    Palindrome      ;See strcat for lesi definition.
                strrevm
                jc      error
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    NotPaldrm
                strrev

                printf
                byte    "First string: %^s\n"
                byte    "Second string: %s\n",0
                dword   StrPtr1, NotPaldrm

The above code produces the following output:

First string: radar
Second string: z - y + x
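
Standard C has no string reversal function, so a short model of the in-place operation may be useful. This is one common way to do it (swapping characters from both ends toward the middle), offered as a sketch rather than as the library's method; model_strrev is a hypothetical name.

```c
#include <string.h>

/* Reverse the characters of s in place; the zero byte stays at the end. */
char *model_strrev(char *s)
{
    size_t len = strlen(s);
    size_t i, j;

    if (len < 2)
        return s;               /* empty and one-character strings are unchanged */
    for (i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = s[i];        /* swap the outermost unswapped pair */
        s[i] = s[j];
        s[j] = tmp;
    }
    return s;
}
```

Applied to "x + y - z" this yields "z - y + x", and a palindrome such as "radar" comes back unchanged.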

15.4.11 Strset, Strsetm

Strset and strsetm replicate a single character through a string. Their behavior, however, is not quite the same. In particular, while strsetm is quite similar to the repeat function (see "Repeat" on page 840), strset is not. Both routines expect a single character value in the al register. They will replicate this character throughout some string. Strsetm also requires a count in the cx register. It creates a string on the heap consisting of cx characters and returns a pointer to this string in es:di (assuming no memory allocation error). Strset, on the other hand, expects you to pass it the address of an existing string in es:di. It will replace each character in that string with the character in al. Note that you do not specify a length when using the strset function; strset uses the length of the existing string. Example:

String1         byte    "Hello there",0
                        .
                        .
                        .
                lesi    String1         ;See strcat for lesi definition.
                mov     al, '*'
                strset

                mov     cx, 8
                mov     al, '#'
                strsetm

                print
                byte    "String2: ",0
                puts
                printf
                byte    "\nString1: %s\n",0
                dword   String1

The above code produces the output:

String2: ########
String1: ***********
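
Because the two routines take different inputs, a C model makes the asymmetry explicit: one takes an existing string and no length, the other takes a length and no string. The names model_strset and model_strsetm are hypothetical, and NULL stands in for the carry flag being set.

```c
#include <stdlib.h>
#include <string.h>

/* Overwrite every character of an existing string with ch.
   The length comes from the string itself; none is passed in. */
char *model_strset(char *s, char ch)
{
    memset(s, ch, strlen(s));
    return s;
}

/* Build a new heap string of `count` copies of ch.
   NULL models "carry set" (allocation failure); the caller frees it. */
char *model_strsetm(char ch, size_t count)
{
    char *s = malloc(count + 1);

    if (s == NULL)
        return NULL;
    memset(s, ch, count);
    s[count] = '\0';            /* zero-terminate the new string */
    return s;
}
```

With these, model_strset on "Hello there" produces eleven asterisks, and model_strsetm('#', 8) produces "########", matching the output above.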

15.4.12 Strspan, Strspanl, Strcspan, Strcspanl

These four routines search through a string for a character which is either in some specified character set (strspan, strspanl) or not a member of some character set (strcspan, strcspanl). These routines appear in the UCR Standard Library only because of their appearance in the C standard library. You should rarely use these routines; the UCR Standard Library includes some other routines for manipulating character sets and performing character matching operations. Nonetheless, these routines are somewhat useful on occasion and are worth a mention here.

These routines expect you to pass them the addresses of two strings: a source string and a character set string. They expect the address of the source string in es:di. Strspan and strcspan want the address of the character set string in dx:si; the character set string follows the call with strspanl and strcspanl. On return, cx contains an index into the string, defined as follows:

strspan, strspanl: Index of first character in source found in the character set.

strcspan, strcspanl: Index of first character in source not found in the character set.

If all the characters are in the set (or are not in the set), then cx contains the index into the string of the zero terminating byte.

Example:

Source          byte    "ABCDEFG 0123456",0
Set1            byte    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",0
Set2            byte    "0123456789",0
Index1          word    ?
Index2          word    ?
Index3          word    ?
Index4          word    ?
                        .
                        .
                        .
                lesi    Source          ;See strcat for lesi definition.
                ldxi    Set1            ;See strcat for ldxi definition.
                strspan                 ;Search for first ALPHA char.
                mov     Index1, cx      ;Index of first alphabetic char.

                lesi    Source
                ldxi    Set2            ;Character set goes in DX:SI.
                strspan                 ;Search for first numeric char.
                mov     Index2, cx

                lesi    Source
                strcspanl
                byte    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",0
                mov     Index3, cx

                lesi    Set2
                strcspanl
                byte    "0123456789",0
                mov     Index4, cx

                printf
                byte    "First alpha char in Source is at offset %d\n"
                byte    "First numeric char is at offset %d\n"
                byte    "First non-alpha in Source is at offset %d\n"
                byte    "First non-numeric in Set2 is at offset %d\n",0
                dword   Index1, Index2, Index3, Index4

This code outputs the following:

First alpha char in Source is at offset 0
First numeric char is at offset 8
First non-alpha in Source is at offset 7
First non-numeric in Set2 is at offset 10

15.4.13 Strstr, Strstrl

Strstr searches for the first occurrence of one string within another. es:di contains the address of the string in which you want to search for a second string. dx:si contains the address of the second string for the strstr routine; for strstrl, the search string immediately follows the call in the code stream.

On return from strstr or strstrl, the carry flag will be set if the second string is not present in the source string. If the carry flag is clear, the second string is present in the source string and cx will contain the (zero-based) index where the second string was found. Example:

SourceStr       byte    "Search for 'this' in this string",0
SearchStr       byte    "this",0
                        .
                        .
                        .
                lesi    SourceStr       ;See strcat for lesi definition.
                ldxi    SearchStr       ;See strcat for ldxi definition.
                strstr
                jc      NotPresent
                print
                byte    "Found string at offset ",0
                mov     ax, cx          ;Need offset in AX for puti
                puti
                putcr

                lesi    SourceStr
                strstrl
                byte    "for",0
                jc      NotPresent
                print
                byte    "Found 'for' at offset ",0
                mov     ax, cx
                puti
                putcr
NotPresent:

The above code prints the following:

Found string at offset 12
Found 'for' at offset 7

15.4.14 Strtrim, Strtrimm

These two routines are quite similar to strbdel and strbdelm. Rather than removing leading spaces, however, they trim off any trailing spaces from a string. Strtrim trims off any trailing spaces directly on the specified string in memory. Strtrimm first copies the source string and then trims any spaces off the copy. Both routines expect you to pass the address of the source string in es:di. Strtrimm returns a pointer to the new string (if it could allocate it) in es:di. It also returns the carry flag set or clear to denote error/no error. Example:

String1         byte    "Spaces at the end      ",0
String2         byte    "    Spaces on both sides     ",0
StrPtr1         dword   0
StrPtr2         dword   0
                        .
                        .
                        .

; TrimSpcs trims the spaces off both ends of a string.
; Note that it is a little more efficient to perform the
; strbdel first, then the strtrim. This routine creates
; the new string on the heap and returns a pointer to this
; string in ES:DI.

TrimSpcs        proc
                strbdelm
                jc      BadAlloc        ;Just return if error.
                strtrim
                clc
BadAlloc:       ret
TrimSpcs        endp
                        .
                        .
                        .
                lesi    String1         ;See strcat for lesi definition.
                strtrimm
                jc      error
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    String2
                call    TrimSpcs
                jc      error
                mov     word ptr StrPtr2, di
                mov     word ptr StrPtr2+2, es

; StrPtr1 and StrPtr2 hold pointers to the strings, so use %^s.

                printf
                byte    "First string: '%^s'\n"
                byte    "Second string: '%^s'\n",0
                dword   StrPtr1, StrPtr2

This code fragment outputs the following:

First string: 'Spaces at the end'
Second string: 'Spaces on both sides'
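
The same "strip the front, then the back, into a new heap string" idea can be modeled in C. The function below is a hypothetical sketch (model_trim_spaces is not a library name); it treats only the space character as blank, which is an assumption, and returns NULL where the assembly routine would return with the carry flag set.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of s with leading and trailing spaces removed.
   The source string is not modified. NULL models "carry set".
   The caller releases the result with free(). */
char *model_trim_spaces(const char *s)
{
    size_t len;
    char *copy;

    while (*s == ' ')                       /* skip leading spaces first... */
        s++;
    len = strlen(s);
    while (len > 0 && s[len - 1] == ' ')    /* ...then back up over trailing ones */
        len--;

    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

Skipping the leading spaces before measuring the string means only the characters that survive are ever copied, which is the same reason for doing the leading-space removal first in the assembly version.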

15.4.15 Other String Routines in the UCR Standard Library

In addition to the "strxxx" routines listed in this section, there are many additional string routines available in the UCR Standard Library. These include routines to convert from numeric types (integer, hex, real, etc.) to a string or vice versa, pattern matching and character set routines, and many other conversion and string utilities. The routines described in this chapter are those whose definitions appear in the "strings.a" header file and are specifically targeted towards generic string manipulation. For more details on the other string routines, consult the UCR Standard Library reference section in the appendices.

Chapter Fifteen: Strings And Character Sets (Part 5)
28 SEP 1996