CHAPTER FIFTEEN: STRINGS AND CHARACTER SETS (Part 1)
15.0 - Chapter Overview
15.1 - The 80x86 String Instructions
  15.1.1 - How the String Instructions Operate
  15.1.2 - The REP/REPE/REPZ and REPNZ/REPNE Prefixes
  15.1.3 - The Direction Flag
  15.1.4 - The MOVS Instruction
  15.1.5 - The CMPS Instruction
  15.1.6 - The SCAS Instruction
  15.1.7 - The STOS Instruction
  15.1.8 - The LODS Instruction
  15.1.9 - Building Complex String Functions from LODS and STOS
  15.1.10 - Prefixes and the String Instructions
15.2 - Character Strings
  15.2.1 - Types of Strings
  15.2.2 - String Assignment
  15.2.3 - String Comparison
15.3 - Character String Functions
  15.3.1 - Substr
  15.3.2 - Index
  15.3.3 - Repeat
  15.3.4 - Insert
  15.3.5 - Delete
  15.3.6 - Concatenation
15.4 - String Functions in the UCR Standard Library
  15.4.1 - StrBDel, StrBDelm
  15.4.2 - Strcat, Strcatl, Strcatm, Strcatml
  15.4.3 - Strchr
  15.4.4 - Strcmp, Strcmpl, Stricmp, Stricmpl
  15.4.5 - Strcpy, Strcpyl, Strdup, Strdupl
  15.4.6 - Strdel, Strdelm
  15.4.7 - Strins, Strinsl, Strinsm, Strinsml
  15.4.8 - Strlen
  15.4.9 - Strlwr, Strlwrm, Strupr, Struprm
  15.4.10 - Strrev, Strrevm
  15.4.11 - Strset, Strsetm
  15.4.12 - Strspan, Strspanl, Strcspan, Strcspanl
  15.4.13 - Strstr, Strstrl
  15.4.14 - Strtrim, Strtrimm
  15.4.15 - Other String Routines in the UCR Standard Library
15.5 - The Character Set Routines in the UCR Standard Library
15.6 - Using the String Instructions on Other Data Types
  15.6.1 - Multi-precision Integer Strings
  15.6.2 - Dealing with Whole Arrays and Records
15.7 - Sample Programs
  15.7.1 - Find.asm
  15.7.2 - StrDemo.asm
  15.7.3 - Fcmp.asm
Copyright 1996 by Randall Hyde
All rights reserved. Duplication other than for immediate display through a browser is prohibited by U.S. Copyright Law. This material is provided on-line as a beta-test of this text. It is for the personal use of the reader only. If you are interested in using this material as part of a course, please contact rhyde@cs.ucr.edu.
Supporting software and other materials are available via anonymous ftp from ftp.cs.ucr.edu. See the "/pub/pc/ibmpcdir" directory for details. You may also download the material from "Randall Hyde's Assembly Language Page" at URL: http://webster.ucr.edu
Notes: This document does not contain the laboratory exercises, programming assignments, exercises, or chapter summary. These portions were omitted for several reasons: either they wouldn't format properly, they contained hyperlinks that were too much work to resolve, they were under constant revision, or they were not included for security reasons. Such omission should have very little impact on the reader interested in learning this material or evaluating this document.
This document was prepared using Harlequin's Web Maker 2.2 and Quadralay's Webworks Publisher. Since HTML does not support the rich formatting options available in Framemaker, this document is only an approximation of the actual chapter from the textbook.
If you are absolutely dying to get your hands on a version other than HTML, you might consider having the UCR Printing and Reprographics Department run you off a copy on their Xerox machines. For details, please read the following EMAIL message I received from the Printing and Reprographics Department:
We are currently working on ways to publish this text in a form other than HTML (e.g., Postscript, PDF, Frameviewer, hard copy, etc.). This, however, is a low-priority project. Please do not contact Randall Hyde concerning this effort. When something happens, an announcement will appear on "Randall Hyde's Assembly Language Page." Please visit this Web site at http://webster.ucr.edu for the latest scoop.
15.0 Chapter Overview
A string is a collection of objects stored in contiguous memory locations. Strings are usually arrays of bytes, words, or (on 80386 and later processors) double words. The 80x86 microprocessor family supports several instructions specifically designed to cope with strings. This chapter explores some of the uses of these string instructions.
The 8088, 8086, 80186, and 80286 can process two types of strings: byte strings and word strings. The 80386 and later processors also handle double word strings. They can move strings, compare strings, search for a specific value within a string, initialize a string to a fixed value, and do other primitive operations on strings. The 80x86's string instructions are also useful for manipulating arrays, tables, and records. You can easily assign or compare such data structures using the string instructions. Using string instructions may speed up your array manipulation code considerably.
This chapter presents a review of the operation of the 80x86 string instructions. Then it discusses how to process character strings using these instructions. Finally, it concludes by discussing the string routines available in the UCR Standard Library. The sections below that have a "*" prefix are essential. Those sections with an "o" discuss advanced topics that you may want to put off for a while.
* The 80x86 string instructions.
* Character strings.
* Character string functions.
* String functions in the UCR Standard Library.
o Using the string instructions on other data types.
15.1 The 80x86 String Instructions
All members of the 80x86 family support five different string instructions: movs, cmps, scas, lods, and stos[1]. They are the string primitives since you can build most other string operations from these five instructions. How you use these five instructions is the topic of the next several sections.
15.1.1 How the String Instructions Operate
The string instructions operate on blocks (contiguous linear arrays) of memory. For example, the movs instruction moves a sequence of bytes from one memory location to another. The cmps instruction compares two blocks of memory. The scas instruction scans a block of memory for a particular value. These string instructions often require three operands: a destination block address, a source block address, and (optionally) an element count. For example, when using the movs instruction to copy a string, you need a source address, a destination address, and a count (the number of string elements to move).
Unlike other instructions which operate on memory, the string instructions are single-byte instructions which don't have any explicit operands. The operands for the string instructions include the si (source index) register, the di (destination index) register, the cx (count) register, the ax register, and the direction flag in the flags register. For example, one variant of the movs (move string) instruction copies a string from the source address specified by ds:si to the destination address specified by es:di, of length cx.
Likewise, the cmps instruction compares the string pointed at by ds:si, of length cx, to the string pointed at by es:di.
Not all instructions have source and destination operands (only movs and cmps support them). For example, the scas instruction (scan a string) compares the value in the accumulator to values in memory. Despite their differences, the 80x86's string instructions all have one thing in common - using them requires that you deal with two segments, the data segment and the extra segment.
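As a rough sketch of how these implicit operands are set up in practice, the fragment below copies a short byte string using the repeated form of movs (the rep prefix and the cld instruction are covered in the following sections). The names Source, Dest, and SrcLen are invented for this illustration, and the code assumes both strings live in a data segment named dseg, so ds and es can hold the same value.

```asm
; Hypothetical data (in dseg); Source, Dest, and SrcLen are illustrative names.
Source          byte    "Hello"
SrcLen          =       $-Source        ;Number of bytes in Source.
Dest            byte    SrcLen dup (?)

; Code fragment; assumes "assume ds:dseg, es:dseg" is in effect.
                mov     ax, dseg
                mov     ds, ax          ;ds:si will address the source.
                mov     es, ax          ;es:di will address the destination.
                lea     si, Source
                lea     di, Dest
                mov     cx, SrcLen      ;Element count (bytes here).
                cld                     ;Process from low to high addresses.
        rep     movsb                   ;Copy cx bytes from ds:si to es:di.
```

Note that nothing on the movsb line itself names the strings; everything the instruction needs is already sitting in ds:si, es:di, and cx.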
15.1.2 The REP/REPE/REPZ and REPNZ/REPNE Prefixes
The string instructions, by themselves, do not operate on strings of data. The movs instruction, for example, will move a single byte, word, or double word. When executed by itself, the movs instruction ignores the value in the cx register. The repeat prefixes tell the 80x86 to do a multi-byte string operation. The syntax for the repeat prefix is:
Field:
Label   repeat  mnemonic operand        ;comment
For MOVS:
        rep     movs    {operands}
For CMPS:
        repe    cmps    {operands}
        repz    cmps    {operands}
        repne   cmps    {operands}
        repnz   cmps    {operands}
For SCAS:
        repe    scas    {operands}
        repz    scas    {operands}
        repne   scas    {operands}
        repnz   scas    {operands}
For STOS:
        rep     stos    {operands}
You don't normally use the repeat prefixes with the lods instruction.
As you can see, the presence of the repeat prefixes introduces a new field in the source line - the repeat prefix field. This field appears only on source lines containing string instructions. In your source file:
When specifying the repeat prefix before a string instruction, the string instruction repeats cx times[2]. Without the repeat prefix, the instruction operates only on a single byte, word, or double word.
You can use repeat prefixes to process entire strings with a single instruction. You can use the string instructions without the repeat prefix as string primitive operations to synthesize more powerful string operations.
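To make the conditional prefixes a little more concrete, here is a minimal sketch that uses repne scasb to look for the first space in a block of text. Text and TextLen are hypothetical names, and the fragment assumes es already holds the segment address of Text. The instruction repeats at most cx times but stops early as soon as a byte equal to al turns up.

```asm
; Illustrative data; Text and TextLen are hypothetical names.
Text            byte    "string instructions"
TextLen         =       $-Text

; Assumes es already holds the segment address of Text.
                lea     di, Text        ;scas scans the memory at es:di.
                mov     cx, TextLen     ;Maximum number of bytes to examine.
                mov     al, ' '         ;Value to search for.
                cld                     ;Scan toward higher addresses.
        repne   scasb                   ;Repeat while cx<>0 and byte<>al.
                jne     NoSpace         ;Zero flag clear: no match found.
                dec     di              ;di stopped one byte past the match.
                ;es:di now points at the first space in Text.
                jmp     Done
NoSpace:        ;No space appears anywhere in Text.
Done:
```

The zero flag, not cx, is what tells you whether the scan found something: cx can reach zero on the very iteration that finally matches.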
The operand field is optional. If present, MASM simply uses it to determine the size of the string to operate on. If the operand field is the name of a byte variable, the string instruction operates on bytes. If the operand is a word address, the instruction operates on words. Likewise for double words. If the operand field is not present, you must append a "B", "W", or "D" to the end of the string instruction to denote the size, e.g., movsb, movsw, or movsd.
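One consequence of the size suffix is worth a short sketch: cx counts string elements, not bytes. The fragment below copies a four-word table with movsw. The names SrcTbl, DstTbl, and TblCnt are made up for this example, and the code assumes ds and es both hold the segment address of the tables.

```asm
; Illustrative data; SrcTbl, DstTbl, and TblCnt are hypothetical names.
SrcTbl          word    1, 2, 3, 4
TblCnt          =       ($-SrcTbl)/2    ;Number of words, not bytes.
DstTbl          word    TblCnt dup (?)

; Assumes ds and es both hold the segment address of the tables.
                lea     si, SrcTbl
                lea     di, DstTbl
                mov     cx, TblCnt      ;cx counts elements (words here).
                cld                     ;Process from low to high addresses.
        rep     movsw                   ;Copies 4 words (8 bytes).
```

Loading cx with the byte length (8) here would copy twice as much memory as intended, since each repetition of movsw moves two bytes.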
15.1.3 The Direction Flag
Besides the si, di, cx, and ax registers, one other register controls the 80x86's string instructions - the flags register. Specifically, the direction flag in the flags register controls how the CPU processes strings.
If the direction flag is clear, the CPU increments si and di after operating upon each string element. For example, if the direction flag is clear, then executing movs will move the byte, word, or double word at ds:si to es:di and will increment si and di by one, two, or four. When specifying the rep prefix before this instruction, the CPU increments si and di for each element in the string. At completion, the si and di registers will be pointing at the first item beyond the string.
If the direction flag is set, then the 80x86 decrements si and di after processing each string element. After a repeated string operation with the direction flag set, the si and di registers will be pointing at the first byte, word, or double word before the strings.
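A common reason to set the direction flag is to move a block to a higher address that overlaps the source; copying forward would overwrite source bytes before they were read. The following sketch, which uses a hypothetical eight-byte array named Buffer and assumes ds and es both hold its segment address, shifts six characters up by two bytes by working from the end of the block toward the beginning.

```asm
; Illustrative data; Buffer is a hypothetical 8-byte array.
Buffer          byte    "ABCDEF", 2 dup (?)

; Assumes ds and es both hold the segment address of Buffer.
; Goal: move the six characters up two bytes, leaving "ABABCDEF".
                lea     si, Buffer+5    ;Last source byte ("F").
                lea     di, Buffer+7    ;Last destination byte.
                mov     cx, 6           ;Number of bytes to move.
                std                     ;Work from high to low addresses.
        rep     movsb
                cld                     ;Put D=0 back for later string code.
; si now points at Buffer-1 and di at Buffer+1, that is,
; one byte before the start of each block.
```

With the direction flag set, si and di must start at the last element of each block rather than the first.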
The direction flag may be set or cleared using the cld (clear direction flag) and std (set direction flag) instructions. When using these instructions inside a procedure, keep in mind that they modify the machine state. Therefore, you may need to save the direction flag during the execution of that procedure. The following example exhibits the kinds of problems you might encounter:
StringStuff:
                cld
        <do some operations>
                call    Str2
        <do some string operations requiring D=0>
                 .
                 .
                 .
Str2            proc    near
                std
        <Do some string operations>
                ret
Str2            endp
This code will not work properly. The calling code assumes that the direction flag is clear after Str2 returns. However, this isn't true. Therefore, the string operations executed after the call to Str2 will not function properly.
There are a couple of ways to handle this problem. The first, and probably the most obvious, is always to insert the cld or std instructions immediately before executing a string instruction. The other alternative is to save and restore the direction flag using the pushf and popf instructions. Using these two techniques, the code above would look like this:
Always issuing cld or std before a string instruction:
StringStuff:
                cld
        <do some operations>
                call    Str2
                cld
        <do some string operations requiring D=0>
                 .
                 .
                 .
Str2            proc    near
                std
        <Do some string operations>
                ret
Str2            endp
Saving and restoring the flags register:
StringStuff:
                cld
        <do some operations>
                call    Str2
        <do some string operations requiring D=0>
                 .
                 .
                 .
Str2            proc    near
                pushf
                std
        <Do some string operations>
                popf
                ret
Str2            endp
If you use the pushf and popf instructions to save and restore the flags register, keep in mind that you're saving and restoring all the flags. Therefore, such subroutines cannot return any information in the flags. For example, you will not be able to return an error condition in the carry flag if you use pushf and popf.
[1] The 80186 and later processors support two additional string instructions, INS and OUTS, which input strings of data from an input port or output strings of data to an output port. We will not consider these instructions in this chapter.
[2] Except for the cmps and scas instructions, which repeat at most the number of times specified in the cx register.
Chapter Fifteen: Strings And Character Sets (Part 1)
28 SEP 1996