VARIABLES AND DATA STRUCTURES (Part 1)
5.1 - Some Additional Instructions: LEA, LES, ADD, and MUL
5.2 - Declaring Variables in an Assembly Language Program
5.3 - Declaring and Accessing Scalar Variables
5.3.1 - Declaring and using BYTE Variables
5.3.2 - Declaring and using WORD Variables
5.3.3 - Declaring and using DWORD Variables
5.3.4 - Declaring and using FWORD, QWORD, and TBYTE Variables
5.3.5 - Declaring Floating Point Variables with REAL4, REAL8, and REAL10
5.4 - Creating Your Own Type Names with TYPEDEF
5.5 - Pointer Data Types
5.6 - Composite Data Types
5.6.1 - Arrays
5.6.1.1 - Declaring Arrays in Your Data Segment
5.6.1.2 - Accessing Elements of a Single Dimension Array
5.6.2 - Multidimensional Arrays
5.6.2.1 - Row Major Ordering
5.6.2.2 - Column Major Ordering
5.6.2.3 - Allocating Storage for Multidimensional Arrays
5.6.2.4 - Accessing Multidimensional Array Elements in Assembly Language
5.6.3 - Structures
5.6.4 - Arrays of Structures and Arrays/Structures as Structure Fields
5.6.5 - Pointers to Structures
5.7 - Sample Programs
5.7.1 - Simple Variable Declarations
5.7.2 - Using Pointer Variables
5.7.3 - Single Dimension Array Access
5.7.4 - Multidimensional Array Access
5.7.5 - Simple Structure Access
5.7.6 - Arrays of Structures
5.7.7 - Structures and Arrays as Fields of Another Structure
5.7.8 - Pointers to Structures and Arrays of Structures
Copyright 1996 by Randall Hyde
All rights reserved.
Duplication other than for immediate display through a browser is prohibited by U.S. Copyright Law.
This material is provided on-line as a beta-test of this text. It is for the personal use of the reader only. If you are interested in using this material as part of a course, please contact firstname.lastname@example.org
Supporting software and other materials are available via anonymous ftp from ftp.cs.ucr.edu. See the "/pub/pc/ibmpcdir" directory for details. You may also download the material from "Randall Hyde's Assembly Language Page" at URL: http://webster.ucr.edu
This document does not contain the laboratory exercises, programming assignments, exercises, or chapter summary. These portions were omitted for several reasons: either they wouldn't format properly, they contained hyperlinks that were too much work to resolve, they were under constant revision, or they were not included for security reasons. Such omission should have very little impact on the reader interested in learning this material or evaluating this document.
This document was prepared using Harlequin's Web Maker 2.2 and Quadralay's Webworks Publisher. Since HTML does not support the rich formatting options available in Framemaker, this document is only an approximation of the actual chapter from the textbook.
Chapter One discussed the basic format for data in memory. Chapter Three covered how a computer system physically organizes that data. This chapter finishes this discussion by connecting the concept of data representation to its actual physical representation. As the title implies, this chapter concerns itself with two main topics: variables and data structures. This chapter does not assume that you've had a formal course in data structures, though such experience would be useful.
This chapter discusses how to declare and access scalar variables, integers, reals, data types, pointers, arrays, and structures. You must master these subjects before going on to the next chapter. Declaring and accessing arrays, in particular, seems to present a multitude of problems to beginning assembly language programmers. However, the rest of this text depends on your understanding of these data structures and their memory representation. Do not try to skim over this material with the expectation that you will pick it up as you need it later. You will need it right away, and trying to learn this material along with later material will only confuse you more.
5.1 Some Additional Instructions: LEA, LES, ADD, and MUL
The purpose of this chapter is not to present the 80x86 instruction set. However, there are four additional instructions (above and beyond mov) that will prove handy in the discussion throughout the rest of this chapter. These are the load effective address (lea), load es and general purpose register (les), addition (add), and multiply (mul) instructions. These instructions, along with the mov instruction, provide all the necessary power to access the different data types this chapter discusses.
The lea instruction takes the form:
        lea     reg16, memory
reg16 is a 16 bit general purpose register. Memory is a memory location represented by a mod/reg/rm byte (it must be a memory location; it cannot be a register).
This instruction loads the 16 bit register with the offset of the location specified by the memory operand. The instruction lea ax, 1000h[bx][si], for example, would load ax with the address of the memory location pointed at by 1000h[bx][si]. This, of course, is the value 1000h+bx+si. Lea is also quite useful for obtaining the address of a variable. If you have a variable I somewhere in memory, lea bx, I will load the bx register with the address (offset) of I.
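As a minimal sketch of the two uses of lea just described, consider the following stand-alone program. It assumes a bare-bones dseg/sseg/cseg skeleton standing in for SHELL.ASM; the variable I and the values loaded into bx and si are chosen purely for illustration.

```asm
dseg    segment para public 'data'
I       word    ?                   ;First variable in dseg, so its offset is zero
dseg    ends

sseg    segment para stack 'stack'
        word    128 dup (?)
sseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg, ss:sseg
Main    proc
        mov     ax, dseg            ;Point DS at the data segment
        mov     ds, ax

        mov     bx, 2
        mov     si, 4
        lea     ax, 1000h[bx][si]   ;AX := 1000h + 2 + 4 = 1006h
        lea     bx, I               ;BX := offset of I within dseg

        mov     ax, 4C00h           ;Return to DOS
        int     21h
Main    endp
cseg    ends
        end     Main
```

Because lea only computes an address, neither instruction reads the memory at the computed location.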
The les instruction takes the form
        les     reg16, memory32
This instruction loads the es register and one of the 16 bit general purpose registers from the specified memory address. Note that any memory address you can specify with a mod/reg/rm byte is legal but, like the lea instruction, it must be a memory location, not a register.
The les instruction loads the specified general purpose register from the word at the given address; it loads the es register from the following word in memory. This instruction, and its companion lds (which loads ds), are the only instructions on pre-80386 machines that manipulate 32 bits at a time.
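The following sketch shows one way les might be used. The names Target and FarPtr are hypothetical, and the program assumes a minimal dseg/sseg/cseg skeleton rather than SHELL.ASM. A dword initialized with a label holds that label's segment:offset address, so it makes a convenient memory32 operand.

```asm
dseg    segment para public 'data'
Target  word    1234h
FarPtr  dword   Target              ;Offset of Target in the low word, segment in the high word
dseg    ends

sseg    segment para stack 'stack'
        word    128 dup (?)
sseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg, ss:sseg
Main    proc
        mov     ax, dseg
        mov     ds, ax

        les     bx, FarPtr          ;BX := word at FarPtr, ES := word at FarPtr+2
        mov     ax, es:[bx]         ;AX := 1234h, fetched through ES:BX

        mov     ax, 4C00h           ;Return to DOS
        int     21h
Main    endp
cseg    ends
        end     Main
```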
The add instruction, like its x86 counterpart, adds two values on the 80x86. This instruction takes several forms. There are five forms that concern us here. They are
        add     reg, reg
        add     reg, memory
        add     memory, reg
        add     reg, constant
        add     memory, constant
All these instructions add the second operand to the first, leaving the sum in the first operand. For example, add bx, 5 computes bx := bx + 5.
The last instruction to look at is the mul (multiply) instruction. This instruction has only a single operand and takes the form:
        mul     reg/memory
There are many important details concerning mul that this chapter ignores. For the sake of the discussion that follows, assume that the register or memory location is a 16 bit register or memory location. In such a case this instruction computes dx:ax := ax*reg/mem. Note that there is no immediate mode for this instruction.
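A short sketch of add and the 16 bit form of mul follows. The variables Total and Factor and all of the constants are made up for illustration, and the program again assumes a minimal dseg/sseg/cseg skeleton in place of SHELL.ASM.

```asm
dseg    segment para public 'data'
Total   word    100
Factor  word    300
dseg    ends

sseg    segment para stack 'stack'
        word    128 dup (?)
sseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg, ss:sseg
Main    proc
        mov     ax, dseg
        mov     ds, ax

        mov     bx, 10
        add     bx, 5               ;add reg, constant:  BX := 15
        add     Total, bx           ;add memory, reg:    Total := 115

        mov     ax, 200
        mul     Factor              ;DX:AX := 200*300 = 60000 (DX = 0, AX = 0EA60h)

        mov     ax, 1000
        mov     cx, 1000
        mul     cx                  ;DX:AX := 1,000,000 (DX = 000Fh, AX = 4240h)

        mov     ax, 4C00h           ;Return to DOS
        int     21h
Main    endp
cseg    ends
        end     Main
```

The second multiplication shows why the result lands in dx:ax: the product of two 16 bit values may need up to 32 bits.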
5.2 Declaring Variables in an Assembly Language Program
Although you've probably surmised that memory locations and variables are somewhat related, this chapter hasn't gone out of its way to draw strong parallels between the two. Well, it's time to rectify that situation. Consider the following short (and useless) Pascal program:
        program useless(input, output);
        var     i, j:integer;
        begin
                i := 10;
                write('Enter a value for j:');
                readln(j);
                i := i*j + j*j;
                writeln('The result is ', i);
        end.
When the computer executes the statement i := 10; it makes a copy of the value 10 and somehow remembers this value for use later on. To accomplish this, the compiler sets aside a memory location specifically for the exclusive use of the variable i. Assuming the compiler arbitrarily assigned location DS:10h for this purpose, it could use the instruction mov ds:[10h], 10 to accomplish this. If i is a 16 bit word, the compiler would probably assign the variable j to the word starting at location 12h or 0Eh. Assuming it's location 12h, the second assignment statement in the program might wind up looking like the following:
        mov     ax, ds:[10h]    ;Fetch value of I
        mul     ds:[12h]        ;Multiply by J
        mov     ds:[10h], ax    ;Save in I (ignore overflow)
        mov     ax, ds:[12h]    ;Fetch J
        mul     ds:[12h]        ;Compute J*J
        add     ds:[10h], ax    ;Add I*J + J*J, store into I
Although there are a few details missing from this code, it is fairly straightforward and you can easily see what is going on in this program.
Now imagine a 5,000 line program like this one using variables like ds:[10h], ds:[12h], ds:[14h], etc. Would you want to locate the statement where you accidentally stored the result of a computation into j rather than i? Indeed, why should you even care that the variable i is at location 10h and j is at location 12h? Why shouldn't you be able to use names like i and j rather than worrying about these numerical addresses? It seems reasonable to rewrite the code above as:
        mov     ax, i
        mul     j
        mov     i, ax
        mov     ax, j
        mul     j
        add     i, ax
Of course you can do this in assembly language! Indeed, one of the primary jobs of an assembler like MASM is to let you use symbolic names for memory locations. Furthermore, the assembler will even assign locations to the names automatically for you. You needn't concern yourself with the fact that variable i is really the word at memory location DS:10h unless you're curious.
It should come as no surprise that ds will point to the dseg segment in the SHELL.ASM file. Indeed, setting up ds so that it points at dseg is one of the first things that happens in the SHELL.ASM main program. Therefore, all you've got to do is tell the assembler to reserve some storage for your variables in dseg and associate the offsets of those variables with their names. This is a very simple process and is the subject of the next several sections.
5.3 Declaring and Accessing Scalar Variables
Scalar variables hold single values. The variables i and j in the preceding section are examples of scalar variables. Examples of data structures that are not scalars include arrays, records, sets, and lists. These latter data types are made up from scalar values. They are the composite types. You'll see the composite types a little later; first you need to learn to deal with the scalar types.
To declare a variable in dseg, you would use a statement something like the following:
        ByteVar         byte    ?
ByteVar is a label. It should begin at column one on the line somewhere in the dseg segment (that is, between the dseg segment and dseg ends statements). You'll find out all about labels in a few chapters; for now, you can assume that most legal Pascal/C/Ada identifiers are also valid assembly language labels.
If you need more than one variable in your program, just place additional lines in the dseg segment declaring those variables. MASM will automatically allocate a unique storage location for each variable (it wouldn't be too good to have i and j located at the same address now, would it?). After declaring a variable, MASM will allow you to refer to that variable by name rather than by location in your program. For example, after inserting the above statement into the data segment (dseg), you could use instructions like mov ByteVar, al in your program.
The first variable you place in the data segment gets allocated storage at location DS:0. The next variable in memory gets allocated storage just beyond the previous variable. For example, if the variable at location zero was a byte variable, the next variable gets allocated storage at DS:1. However, if the first variable was a word, the second variable gets allocated storage at location DS:2. MASM is always careful to allocate variables in such a manner that they do not overlap. Consider the following dseg definition:
        dseg            segment para public 'data'
        bytevar         byte    ?       ;byte allocates bytes
        wordvar         word    ?       ;word allocates words
        dwordvar        dword   ?       ;dword allocs dbl words
        byte2           byte    ?
        word2           word    ?
        dseg            ends
MASM allocates storage for bytevar at location DS:0. Because bytevar is one byte long, the next available memory location is going to be DS:1. MASM, therefore, allocates storage for wordvar at location DS:1. Since words require two bytes, the next available memory location after wordvar is DS:3, which is where MASM allocates storage for dwordvar. Dwordvar is four bytes long, so MASM allocates storage for byte2 starting at location DS:7. Likewise, MASM allocates storage for word2 at location DS:8. Were you to stick another variable after word2, MASM would allocate storage for it at location DS:0Ah.
Whenever you refer to one of the names above, MASM automatically substitutes the appropriate offset. For example, MASM would translate the mov ax, wordvar instruction into mov ax, ds:[1]. So now you can use symbolic names for your variables and completely ignore the fact that these variables are really memory locations with corresponding offsets into the data segment.
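If you are curious about the offsets MASM assigns, a sketch like the following lets you inspect them in a debugger. It reuses the dseg definition from this section inside an assumed minimal program skeleton (not SHELL.ASM itself) and uses lea to fetch each variable's offset.

```asm
dseg    segment para public 'data'
bytevar  byte    ?                  ;Offset 0
wordvar  word    ?                  ;Offset 1
dwordvar dword   ?                  ;Offset 3
byte2    byte    ?                  ;Offset 7
word2    word    ?                  ;Offset 8
dseg    ends

sseg    segment para stack 'stack'
        word    128 dup (?)
sseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg, ss:sseg
Main    proc
        mov     ax, dseg
        mov     ds, ax

        lea     ax, wordvar         ;AX := 1
        lea     bx, dwordvar        ;BX := 3
        lea     cx, byte2           ;CX := 7
        lea     dx, word2           ;DX := 8

        mov     ax, 4C00h           ;Return to DOS
        int     21h
Main    endp
cseg    ends
        end     Main
```

Note that MASM packs these variables one after another; it does not insert padding to align wordvar or dwordvar on an even address.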
5.3.1 Declaring and using BYTE Variables
So what are byte variables good for, anyway? Well, you can certainly represent any data type that has no more than 256 different values with a single byte. This includes some very important and often-used data types including the character data type, boolean data type, most enumerated data types, and small integer data types (signed and unsigned), just to name a few.
Characters on a typical IBM compatible system use the eight bit ASCII/IBM character set. The 80x86 provides a rich set of instructions for manipulating character data. It's not surprising to find that most byte variables in a typical program hold character data.
The boolean data type represents only two values: true or false. Therefore, it only takes a single bit to represent a boolean value. However, the 80x86 really wants to work with data at least eight bits wide. It actually takes extra code to manipulate a single bit rather than a whole byte. Therefore, you should use a whole byte to represent a boolean value. Most programmers use the value zero to represent false and anything else (typically one) to represent true. The 80x86's zero flag makes testing for zero/not zero very easy. Note that this choice of zero or non-zero is mainly for convenience. You could use any two different values (or two different sets of values) to represent true and false.
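The zero/non-zero convention might be tested as in the following sketch. The variables FoundIt and Answer, the labels, and the program skeleton are all assumptions made for illustration.

```asm
dseg    segment para public 'data'
FoundIt byte    0                   ;Zero = false, anything else = true
Answer  byte    ?                   ;Receives 'T' or 'F'
dseg    ends

sseg    segment para stack 'stack'
        word    128 dup (?)
sseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg, ss:sseg
Main    proc
        mov     ax, dseg
        mov     ds, ax

        mov     FoundIt, 1          ;FoundIt := true
        mov     al, FoundIt
        test    al, al              ;Zero flag is set only if AL is zero (false)
        jz      WasFalse
        mov     Answer, 'T'         ;Any non-zero value counts as true
        jmp     Done
WasFalse:
        mov     Answer, 'F'
Done:
        mov     ax, 4C00h           ;Return to DOS
        int     21h
Main    endp
cseg    ends
        end     Main
```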
Most high level languages that support enumerated data types convert them (internally) to unsigned integers. The first item in the list is generally item zero, the second item in the list is item one, the third is item two, etc. For example, consider the following Pascal enumerated data type:
        colors = (red, blue, green, purple, orange, yellow, white, black);
Most Pascal compilers will assign the value zero to red, one to blue, two to green, etc.
Later, you will see how to actually create your own enumerated data types in assembly language. All you need to learn now is how to allocate storage for a variable that holds an enumerated value. Since it's unlikely there will be more than 256 items enumerated by the data type, you can use a simple byte variable to hold the value. If you have a variable, say color of type colors, using the instruction mov color, 2 is the same thing as saying color:=green in Pascal. (Later, you'll even learn how to use more meaningful statements like mov color, green to assign the color green to the color variable.)
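As a preview, here is one way such names might be set up: a sketch that assumes plain numeric equates (the = directive) for the eight colors. This is only an illustration of the idea, not necessarily the technique the text presents later, and the program skeleton is likewise an assumption.

```asm
red     =       0                   ;Numeric equates standing in for the
blue    =       1                   ; Pascal enumeration; values follow
green   =       2                   ; the order of the list
purple  =       3
orange  =       4
yellow  =       5
white   =       6
black   =       7

dseg    segment para public 'data'
color   byte    ?                   ;One byte is plenty for eight values
dseg    ends

sseg    segment para stack 'stack'
        word    128 dup (?)
sseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg, ss:sseg
Main    proc
        mov     ax, dseg
        mov     ds, ax

        mov     color, 2            ;Same as color := green
        mov     color, green        ;Assembles to the identical instruction

        mov     ax, 4C00h           ;Return to DOS
        int     21h
Main    endp
cseg    ends
        end     Main
```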
Of course, if you have a small unsigned integer value (0...255) or small signed integer (-128...127), a single byte variable is the best way to go in most cases. Note that most programmers treat all data types except small signed integers as unsigned values. That is, characters, booleans, enumerated types, and unsigned integers are all usually unsigned values. In some very special cases you might want to treat a character as a signed value, but most of the time even characters are unsigned values.
There are three main statements for declaring byte variables in a program. They are
        identifier      db      ?
        identifier      byte    ?
and
        identifier      sbyte   ?
identifier represents the name of your byte variable. "db" is an older term that predates MASM 6.x. You will see this directive used quite a bit by other programmers (especially those who are not using MASM 6.x or later), but Microsoft considers it to be an obsolete term; you should always use the byte and sbyte declarations instead.
The byte declaration declares unsigned byte variables. You should use this declaration for all byte variables except small signed integers. For signed integer values, use the sbyte (signed byte) directive.
Once you declare some byte variables with these statements, you may reference those variables within your program by their names:
        i       db      ?
        j       byte    ?
        k       sbyte   ?
                 .
                 .
                 .
                mov     i, 0
                mov     j, 245
                mov     k, -5
                mov     al, i
                mov     j, al
                etc.
Although MASM 6.x performs a small amount of type checking, you should not get the idea that assembly language is a strongly typed language. In fact, MASM 6.x will only check the values you're moving around to verify that they will fit in the target location. All of the following are legal in MASM 6.x:
                mov     k, 255
                mov     j, -5
                mov     i, -127
Since all of these variables are byte-sized variables, and all the associated constants will fit into eight bits, MASM happily allows each of these statements. Yet if you look at them, they are logically incorrect. What does it mean to move -5 into an unsigned byte variable? Since signed byte values must be in the range -128...127, what happens when you store the value 255 into a signed byte variable? Well, MASM simply converts these values to their eight bit equivalents (-5 becomes 0FBh, 255 becomes 0FFh [-1], etc.).
Perhaps a later version of MASM will perform stronger type checking on the values you shove into these variables, perhaps not. However, you should always keep in mind that it will always be possible to circumvent this checking. It's up to you to write your programs correctly. The assembler won't help you as much as Pascal or Ada will. Of course, even if the assembler disallowed these statements, it would still be easy to get around the type checking. Consider the following sequence:
                mov     al, -5
                 .
        ; Any number of statements which do not affect AL
                 .
                mov     j, al
There is, unfortunately, no way the assembler is going to be able to tell you that you're storing an illegal value into j. The registers, by their very nature, are neither signed nor unsigned. Therefore, the assembler will let you store a register into a variable regardless of the value that may be in that register.
Although the assembler does not check to see if both operands to an instruction are signed or unsigned, it most certainly checks their size. If the sizes do not agree, the assembler will complain with an appropriate error message. The following examples are all illegal:
                mov     i, ax           ;Cannot move 16 bits into eight
                mov     i, 300          ;300 won't fit in eight bits.
                mov     k, -130         ;-130 won't fit into eight bits.
You might ask "if the assembler doesn't really differentiate signed and unsigned values, why bother with them? Why not simply use db all the time?" Well, there are two reasons. First, it makes your programs easier to read and understand if you explicitly state (by using byte and sbyte) which variables are signed and which are unsigned. Second, who said anything about the assembler ignoring whether the variables are signed or unsigned? The mov instruction ignores the difference, but there are other instructions that do not.
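One illustration of an instruction that does care about signedness is cbw (convert byte to word), which sign extends al into ax. The sketch below applies both a sign extension and a zero extension to the same bit pattern, 0FFh; the variable names and the program skeleton are assumptions made for this example.

```asm
dseg    segment para public 'data'
u       byte    255                 ;Bit pattern 0FFh, meant as unsigned 255
s       sbyte   -1                  ;Bit pattern 0FFh, meant as signed -1
UWide   word    ?
SWide   word    ?
dseg    ends

sseg    segment para stack 'stack'
        word    128 dup (?)
sseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg, ss:sseg
Main    proc
        mov     ax, dseg
        mov     ds, ax

        mov     al, u
        mov     ah, 0               ;Zero extend: AX := 00FFh (255)
        mov     UWide, ax

        mov     al, s
        cbw                         ;Sign extend: AX := 0FFFFh (-1)
        mov     SWide, ax

        mov     ax, 4C00h           ;Return to DOS
        int     21h
Main    endp
cseg    ends
        end     Main
```

The assembler does not choose between these two sequences for you; declaring s as sbyte documents which one you intend to use.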
One final point is worth mentioning concerning the declaration of byte variables. In all of the declarations you've seen thus far, the operand field of the instruction has always contained a question mark. This question mark tells the assembler that the variable should be left uninitialized when DOS loads the program into memory. You may specify an initial value for the variable, which will be loaded into memory before the program starts executing, by replacing the question mark with your initial value. Consider the following byte variable declarations:
        i       db      0
        j       byte    255
        k       sbyte   -1
In this example, the assembler will initialize i, j, and k to zero, 255, and -1, respectively, when the program loads into memory. This fact will prove quite useful later on, especially when discussing tables and arrays. Once again, the assembler only checks the sizes of the operands. It does not check to make sure that the operand for the byte directive is positive or that the value in the operand field of sbyte is in the range -128...127. MASM will allow any value in the range -128...255 in the operand field of any of these statements.
In case you get the impression that there isn't a real reason to use byte vs. sbyte in a program, you should note that while MASM sometimes ignores the differences in these definitions, Microsoft's CodeView debugger does not. If you've declared a variable as a signed value, CodeView will display it as such (including a minus sign, if necessary). On the other hand, CodeView will always display db and byte variables as positive values.
5.3.2 Declaring and using WORD Variables
Most 80x86 programs use word values for three things: 16 bit signed integers, 16 bit unsigned integers, and offsets (pointers). Oh sure, you can use word values for lots of other things as well, but these three represent most applications of the word data type. Since the word is the largest data type the 8086, 8088, 80186, 80188, and 80286 can handle, you'll find that for most programs, the word is the basis for most computations. Of course, the 80386 and later allow 32 bit computations, but many programs do not use these 32 bit instructions since that would limit them to running on 80386 or later CPUs.
You use the dw, word, and sword statements to declare word variables. The following examples demonstrate their use:
        NoSignedWord    dw      ?
        UnsignedWord    word    ?
        SignedWord      sword   ?
        Initialized0    word    0
        InitializedM1   sword   -1
        InitializedBig  word    65535
        InitializedOfs  dw      NoSignedWord
Most of these declarations are slight modifications of the byte declarations you saw in the last section. Of course, you may initialize any word variable to a value in the range -32768...65535 (the union of the range for signed and unsigned 16 bit constants). The last declaration above, however, is new. In this case, a label appears in the operand field (specifically, the name of the NoSignedWord variable). When a label appears in the operand field, the assembler will substitute the offset of that label (within the variable's segment). If these were the only declarations in dseg and they appeared in this order, the last declaration above would initialize InitializedOfs with the value zero since NoSignedWord's offset is zero within the data segment. This form of initialization is quite useful for initializing pointers. But more on that subject later.
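To make the offset initialization concrete, here is a small sketch in which a word variable is initialized with the offset of another variable and then used to reach it. The names Value and ValuePtr and the program skeleton are hypothetical.

```asm
dseg    segment para public 'data'
Value    word   1234h               ;First variable in dseg: offset zero
ValuePtr word   Value               ;Initialized with the offset of Value
dseg    ends

sseg    segment para stack 'stack'
        word    128 dup (?)
sseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg, ss:sseg
Main    proc
        mov     ax, dseg
        mov     ds, ax

        mov     bx, ValuePtr        ;BX := offset of Value (zero here)
        mov     ax, [bx]            ;AX := 1234h, the word stored at Value

        mov     ax, 4C00h           ;Return to DOS
        int     21h
Main    endp
cseg    ends
        end     Main
```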
The CodeView debugger differentiates dw/word variables and sword variables. It always displays the unsigned values as positive integers. On the other hand, it will display sword variables as signed values (complete with minus sign, if the value is negative). Debugging support is one of the main reasons you'll want to use word or sword as appropriate.
5.3.3 Declaring and using DWORD Variables
You may use the dd, dword, and sdword directives to declare four-byte integers, pointers, and other variable types. Such variables will allow values in the range -2,147,483,648...4,294,967,295 (the union of the range of signed and unsigned four-byte integers). You use these declarations like the dw, word, and sword declarations:
        NoSignedDWord   dd      ?
        UnsignedDWord   dword   ?
        SignedDWord     sdword  ?
        InitBig         dword   4000000000
        InitNegative    sdword  -1
        InitPtr         dd      InitBig
The last example initializes a double word pointer with the segment:offset address of the InitBig variable.
Once again, it's worth pointing out that the assembler doesn't check the types of these variables when looking at the initialization values. If the value fits into 32 bits, the assembler will accept it. Size checking, however, is strictly enforced. Since the only 32 bit move instructions on processors earlier than the 80386 are les and lds, you will get an error if you attempt to access dword variables on these earlier processors using a mov instruction. Of course, even on the 80386 you cannot move a 32 bit variable into a 16 bit register; you must use the 32 bit registers. Later, you'll learn how to manipulate 32 bit variables, even on a 16 bit processor. Until then, just pretend that you can't.
Keep in mind, once again, that CodeView differentiates between dd/dword and sdword. This will help you see the actual values your variables have when you're debugging your programs. CodeView only does this, though, if you use the proper declarations for your variables. Always use sdword for signed values and dd or dword (dword is best) for unsigned values.
5.3.4 Declaring and using FWORD, QWORD, and TBYTE Variables
MASM 6.x also lets you declare six-byte, eight-byte, and ten-byte variables using the df/fword, dq/qword, and dt/tbyte statements. Declarations using these statements were originally intended for floating point and BCD values. There are better directives for the floating point variables, and you don't need to concern yourself with the other data types you'd use these directives for. The following discussion is for completeness' sake.
The df/fword statement's main utility is declaring 48 bit pointers for use in 32 bit protected mode on the 80386 and later. Although you could use this directive to create an arbitrary six byte variable, there are better directives for doing that. You should only use this directive for 48 bit far pointers on the 80386.
dq/qword lets you declare quadword (eight byte) variables. The original purpose of this directive was to let you create 64 bit double precision floating point variables and 64 bit integer variables. There are better directives for creating floating point variables. As for 64 bit integers, you won't need them very often on the 80x86 CPU (at least, not until Intel releases a member of the 80x86 family with 64 bit general purpose registers).
The dt/tbyte directives allocate ten bytes of storage. There are two data types indigenous to the 80x87 (math coprocessor) family that use a ten byte data type: ten byte BCD values and extended precision (80 bit) floating point values. This text will pretty much ignore the BCD data type. As for the floating point type, once again there is a better way to do it.
5.3.5 Declaring Floating Point Variables with REAL4, REAL8, and REAL10
These are the directives you should use when declaring floating point variables. Like dd, dq, and dt, these statements reserve four, eight, and ten bytes. The operand field for these statements may contain a question mark (if you don't want to initialize the variable) or an initial value in floating point form. The following examples demonstrate their use:
        x       real4   1.5
        y       real8   1.0e-25
        z       real10  -1.2594e+10
Note that the operand field must contain a valid floating point constant using either decimal or scientific notation. In particular, pure integer constants are not allowed. The assembler will complain if you use an operand like the following:
        x       real4   1
To correct this, change the operand field to "1.0".
Please note that it takes special hardware to perform floating point operations (e.g., an 80x87 chip or an 80x86 with built-in math coprocessor). If such hardware is not available, you must write software to perform operations like floating point addition, subtraction, multiplication, etc. In particular, you cannot use the 80x86 add instruction to add two floating point values. This text will cover floating point arithmetic in a later chapter. Nonetheless, it's appropriate to discuss how to declare floating point variables in the chapter on data structures.
MASM also lets you use dd, dq, and dt to declare floating point variables (since these directives reserve the necessary four, eight, or ten bytes of space). You can even initialize such variables with floating point constants in the operand field. But there are two major drawbacks to declaring variables this way. First, as with bytes, words, and double words, the CodeView debugger will only display your floating point variables properly if you use the real4, real8, or real10 directives. If you use dd, dq, or dt, CodeView will display your values as four, eight, or ten byte unsigned integers. Another problem with using dd, dq, and dt is that they allow both integer and floating point constant initializers (remember, real4, real8, and real10 do not). Now this might seem like a good feature at first glance. However, the integer representation for the value one is not the same as the floating point representation for the value 1.0. So if you accidentally enter the value "1" in the operand field when you really meant "1.0", the assembler would happily digest this and then give you incorrect results. Hence, you should always use the real4, real8, and real10 statements to declare floating point variables.
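The difference between the two representations is easy to see by examining the stored bits. In the sketch below (the variable names and program skeleton are assumptions), the four bytes of a real4 holding 1.0 form the IEEE single precision pattern 3F800000h, while a dd holding the integer 1 contains 00000001h.

```asm
dseg    segment para public 'data'
Good    real4   1.0                 ;Floating point one: 3F800000h
Bad     dd      1                   ;Integer one:        00000001h
dseg    ends

sseg    segment para stack 'stack'
        word    128 dup (?)
sseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg, ss:sseg
Main    proc
        mov     ax, dseg
        mov     ds, ax

        mov     ax, word ptr Good[2]    ;AX := 3F80h (high word of 1.0)
        mov     bx, word ptr Bad[2]     ;BX := 0000h (high word of integer 1)

        mov     ax, 4C00h               ;Return to DOS
        int     21h
Main    endp
cseg    ends
        end     Main
```

A floating point unit handed the bits of Bad would see a tiny denormalized number rather than 1.0, which is exactly the kind of silent error the real4 directive prevents.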
5.4 Creating Your Own Type Names with TYPEDEF
Let's say that you simply do not like the names that Microsoft decided to use for declaring byte, word, dword, real, and other variables. Let's say that you prefer Pascal's naming convention or, perhaps, C's naming convention. You want to use terms like integer, float, double, char, boolean, or whatever. If this were Pascal, you could redefine the names in the type section of the program. With C you could use a "#define" or a typedef statement to accomplish the task. Well, MASM 6.x has its own typedef statement that also lets you create aliases of these names. The following example demonstrates how to set up some Pascal compatible names in your assembly language programs:
        integer         typedef sword
        char            typedef byte
        boolean         typedef byte
        float           typedef real4
        colors          typedef byte
Now you can declare your variables with more meaningful statements like:
        i               integer ?
        chr             char    ?       ;Named chr because ch is a register name
        FoundIt         boolean ?
        x               float   ?
        HouseColor      colors  ?
If you are an Ada, C, or FORTRAN programmer (or use any other language, for that matter), you can pick type names you're more comfortable with. Of course, this doesn't change how the 80x86 or MASM reacts to these variables one iota, but it does let you create programs that are easier to read and understand since the type names are more indicative of the actual underlying types.
Note that CodeView still respects the underlying data type. If you define integer to be an sword type, CodeView will display variables of type integer as signed values. Likewise, if you define float to mean real4, CodeView will still properly display float variables as four-byte floating point values.
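A complete, if tiny, program using such aliases might look like the following sketch. The variable names Count and Done and the program skeleton are assumptions; the typedef lines follow the pattern shown in this section.

```asm
integer typedef sword               ;Pascal-style aliases for MASM types
boolean typedef byte

dseg    segment para public 'data'
Count   integer -5                  ;Same storage as "Count sword -5"
Done    boolean 0                   ;Same storage as "Done byte 0"
dseg    ends

sseg    segment para stack 'stack'
        word    128 dup (?)
sseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg, ss:sseg
Main    proc
        mov     ax, dseg
        mov     ds, ax

        mov     ax, Count           ;AX := 0FFFBh (-5); size checking still applies
        mov     Done, 1             ;Done := true

        mov     ax, 4C00h           ;Return to DOS
        int     21h
Main    endp
cseg    ends
        end     Main
```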
5.5 Pointer Data Types
Some people refer to pointers as scalar data types; others refer to them as composite data types. This text will treat them as scalar data types even though they exhibit some tendencies of both scalar and composite data types (for a complete description of composite data types, see "Composite Data Types").
Of course, the place to start is with the question "What is a pointer?" Now you've probably experienced pointers first hand in the Pascal, C, or Ada programming languages and you're probably getting worried right now. Almost everyone has a real bad experience when they first encounter pointers in a high level language. Well, fear not! Pointers are actually easier to deal with in assembly language. Besides, most of the problems you had with pointers probably had nothing to do with pointers, but rather with the linked list and tree data structures you were trying to implement with them. Pointers, on the other hand, have lots of uses in assembly language that have nothing to do with linked lists, trees, and other scary data structures. Indeed, simple data structures like arrays and records often involve the use of pointers. So if you've got some deep-rooted fear about pointers, well, forget everything you know about them. You're going to learn how great pointers really are.
Probably the best place to start is with the definition of a pointer. Just exactly what is a pointer, anyway? Unfortunately, high level languages like Pascal tend to hide the simplicity of pointers behind a wall of abstraction. This added complexity (which exists for good reason, by the way) tends to frighten programmers because they don't understand what's going on.
Now if you're afraid of pointers, well, let's just ignore them for the time being and work with an array. Consider the following array declaration in Pascal:
        M: array [0..1023] of integer;
Even if you don't know Pascal, the concept here is pretty easy to understand. M is an array with 1024 integers in it, indexed from M[0] to M[1023]. Each one of these array elements can hold an integer value that is independent of all the others. In other words, this array gives you 1024 different integer variables, each of which you refer to by number (the array index) rather than by name.
If you encountered a program that had the statement M[0]:=100 you probably wouldn't have to think at all about what is happening with this statement. It is storing the value 100 into the first element of the array M. Now consider the following two statements:
        i := 0; (* Assume "i" is an integer variable *)
        M [i] := 100;
You should agree
without too much hesitation
two statements perform the same exact operation as
probably willing to agree that you can use any integer expression in the range 0...1023 as
an index into this array. The following statements still perform the same operation as our
single assignment to index zero:
i := 5;          (* Assume all variables are integers *)
j := 10;
k := 50;
M [i*j-k] := 100;
"Okay, so what's the point?" you're probably thinking. "Anything that produces an integer in the range 0...1023 is legal. So what?" Okay, how about the following:
M [1] := 0;
M [ M [1] ] := 100;
Whoa! Now that takes a few moments to digest. However, if you take it slowly, it makes sense, and you'll discover that these two instructions perform the exact same operation you've been doing all along. The first statement stores zero into array element M[1]. The second statement fetches the value of M[1], which is an integer so you can use it as an array index into M, and uses that value (zero) to control where it stores the value 100.
If you're willing to accept the above as reasonable, perhaps bizarre, but usable nonetheless, then you'll have no problems with pointers. Because M[1] is a pointer! Well, not really, but if you were to change "M" to "memory" and treat this array as all of memory, this is the exact definition of a pointer.
A pointer is simply a memory location whose value is the address (or index, if you prefer) of some other memory location. Pointers are very easy to declare and use in an assembly language program. You don't even have to worry about array indices or anything like that. In fact, the only complication you're going to run into is that the 80x86 supports two kinds of pointers: near pointers and far pointers.
A near pointer is a 16 bit value that provides an offset into a segment. It could be any segment, but you will generally use the data segment (dseg in SHELL.ASM). If you have a word variable p that contains 1000h, then p "points" at memory location 1000h in dseg. To access the word that p points at, you could use code like the following:
        mov     bx, p           ;Load BX with pointer.
        mov     ax, [bx]        ;Fetch data that p points at.
By loading the value of p into bx, this code loads the value 1000h into bx (assuming p contains 1000h and, therefore, points at memory location 1000h in dseg). The second instruction above loads the ax register with the word starting at the location whose offset appears in bx. Since bx now contains 1000h, this will load ax from locations DS:1000h and DS:1001h.
Why not just load ax directly from location 1000h using an instruction like mov ax, ds:[1000h]? Well, there are lots of reasons. But the primary reason is that this single instruction always loads ax from location 1000h. Unless you are willing to mess around with self-modifying code, you cannot change the location from which it loads ax. The previous two instructions, however, always load ax from the location that p points at. This is very easy to change under program control, without using self-modifying code. In fact, the simple instruction mov p, 2000h will cause those two instructions above to load ax from memory location DS:2000h the next time they execute. Consider the following instructions:
        lea     bx, i           ;This can actually be done with
        mov     p, bx           ; a single instruction, as you'll
          .                     ; see in Chapter Eight.
          .
  < Some code that skips over the next two instructions >

        lea     bx, j           ;Assume the above code skips these
        mov     p, bx           ; two instructions and that you get
          .                     ; here by jumping to this point from
          .                     ; somewhere else.

        mov     bx, p           ;Assume both code paths above wind
        mov     ax, [bx]        ; up down here.
This short example demonstrates two execution paths through the program. The first path loads the variable p with the address of the variable i (remember, lea loads bx with the offset of the second operand). The second path through the code loads p with the address of the variable j. Both execution paths converge on the last two mov instructions, which load ax with i or j depending upon which execution path was taken. In many respects, this is like a parameter to a procedure in a high level language like Pascal. Executing the same instructions accesses different variables depending on whose address (i or j) winds up in p.
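As a rough illustration of this idea, the following complete program is a minimal sketch, not code from SHELL.ASM. It assumes MASM 6.x and a DOS environment; the variables i, j, and saved, their initial values, and the procedure name Fetch are made up for this example. The same two mov instructions run twice and load ax from a different variable each time, because p changes in between.

```asm
; Sketch only: segment names follow SHELL.ASM. The names i, j, saved,
; and Fetch are hypothetical and exist only for this example.

dseg    segment para public 'data'
i       word    1234h
j       word    5678h
p       word    ?               ;Near pointer: holds a 16 bit offset.
saved   word    2 dup (?)       ;Receives the two values fetched.
dseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg

; Fetch loads AX with whatever word p currently points at.

Fetch   proc    near
        mov     bx, p           ;Load BX with pointer.
        mov     ax, [bx]        ;Fetch data that p points at.
        ret
Fetch   endp

Main    proc
        mov     ax, dseg
        mov     ds, ax          ;DS must address dseg before using p.

        lea     bx, i
        mov     p, bx           ;p now points at i.
        call    Fetch
        mov     saved, ax       ;Stores the value of i (1234h).

        lea     bx, j
        mov     p, bx           ;p now points at j.
        call    Fetch
        mov     saved+2, ax     ;Stores the value of j (5678h).

        mov     ax, 4C00h       ;DOS terminate-process call.
        int     21h
Main    endp
cseg    ends

sseg    segment para stack 'stack'
stk     word    256 dup (?)
sseg    ends
        end     Main
```

Wrapping the two mov instructions in a procedure is just one way to execute them from two different places; a jump to a common label would serve equally well.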
Sixteen bit near pointers are small and fast, and the 80x86 provides efficient access using them. Unfortunately, they have one very serious drawback: you can only access 64K of data (one segment) when using near pointers. Far pointers overcome this limitation at the expense of being 32 bits long. However, far pointers let you access any piece of data anywhere in the memory space. For this reason, and the fact that the UCR Standard Library uses far pointers exclusively, this text will use far pointers most of the time. But keep in mind that this is a decision based on trying to keep things simple. Code that uses near pointers rather than far pointers will be shorter and faster.
To access data referenced by a 32 bit pointer, you will need to load the offset portion (L.O. word) of the pointer into bx, bp, si, or di and the segment portion into a segment register (typically es). Then you could access the object using the register indirect addressing mode. Since the les instruction is so convenient for this operation, it is the perfect choice for loading es and one of the above four registers with a pointer value. The following sample code stores the value in al into the byte pointed at by the far pointer p:
        les     bx, p           ;Load p into ES:BX.
        mov     es:[bx], al     ;Store away AL.
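To see the far pointer's layout in context, here is a minimal sketch of a complete program, again assuming MASM 6.x under DOS. The segment eseg and the variable target are hypothetical names introduced only for illustration. The program fills in the two halves of p at run time (offset in the L.O. word, segment in the H.O. word) and then stores through it with les.

```asm
; Sketch only: eseg and target are made-up names for this example.

dseg    segment para public 'data'
p       dword   ?               ;Far pointer: offset in the L.O. word,
                                ; segment in the H.O. word.
dseg    ends

eseg    segment para public 'extra'
target  byte    ?               ;The byte that p will point at.
eseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg

Main    proc
        mov     ax, dseg
        mov     ds, ax          ;DS must address dseg before using p.

        mov     word ptr p, offset target   ;L.O. word gets the offset.
        mov     word ptr p+2, seg target    ;H.O. word gets the segment.

        mov     al, 'A'
        les     bx, p           ;Load p into ES:BX.
        mov     es:[bx], al     ;Store AL into target.

        mov     ax, 4C00h       ;DOS terminate-process call.
        int     21h
Main    endp
cseg    ends

sseg    segment para stack 'stack'
stk     word    256 dup (?)
sseg    ends
        end     Main
```

Because target lives outside dseg, a near pointer relative to ds could not reach it; the segment half of p is what makes the access possible.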
Since near pointers are 16 bits long and far pointers are 32 bits long, you could simply use the dw/word and dd/dword directives to allocate storage for your pointers (pointers are inherently unsigned, so you wouldn't normally use sword or sdword to declare a pointer). However, there is a much better way to do this by using the typedef statement. Consider the following general forms:
typename        typedef near ptr basetype
typename        typedef far ptr basetype
In these two examples, typename represents the name of the new type you're creating, while basetype is the name of the type you want to create a pointer for. Let's look at some specific examples:
nbytptr         typedef near ptr byte
fbytptr         typedef far ptr byte
colorsptr       typedef far ptr colors
wptr            typedef near ptr word
intptr          typedef near ptr integer
intHandle       typedef near ptr intptr
(These declarations assume that you've previously defined the types colors and integer with the typedef statement.) The typedef statements with the near ptr operands produce 16 bit near pointers. Those with the far ptr operands produce 32 bit far pointers. MASM 6.x ignores the base type supplied after the near ptr or far ptr. However, CodeView uses the base type to display the object a pointer refers to in its correct format.
Note that you can use any type as the base type for a pointer. As the last example above demonstrates, you can even define a pointer to another pointer (a handle). CodeView would properly display the object a variable of type intHandle points at as an address.
With the above types, you can now generate pointer variables as follows:
bytestr         nbytptr         ?
bytestr2        fbytptr         ?
CurrentColor    colorsptr       ?
CurrentItem     wptr            ?
LastInt         intptr          ?
Of course, you can initialize these pointers at assembly time if you know where they are going to point when the program first starts running. For example, you could initialize the bytestr variable above with the offset of MyString using the following declaration:
bytestr nbytptr MyString
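The typedef names and assembly-time initialization can be combined to follow a handle down to the object it ultimately refers to. The program below is a minimal sketch under several assumptions: MASM 6.x under DOS, integer defined as sword, and the variable names TheInt, IntPtr1, IntHdl, and IntCopy, all of which are hypothetical. Each mov through bx strips away one level of indirection.

```asm
; Sketch only: TheInt, IntPtr1, IntHdl, and IntCopy are made-up names.

integer   typedef sword                 ;Assumed definition of "integer".
intptr    typedef near ptr integer
intHandle typedef near ptr intptr

dseg    segment para public 'data'
TheInt  integer   25
IntPtr1 intptr    TheInt        ;Initialized with the offset of TheInt.
IntHdl  intHandle IntPtr1       ;Initialized with the offset of IntPtr1.
IntCopy integer   ?
dseg    ends

cseg    segment para public 'code'
        assume  cs:cseg, ds:dseg

Main    proc
        mov     ax, dseg
        mov     ds, ax          ;DS must address dseg.

        mov     bx, IntHdl      ;BX = offset of IntPtr1.
        mov     bx, [bx]        ;BX = offset of TheInt.
        mov     ax, [bx]        ;AX = value of TheInt (25).
        mov     IntCopy, ax

        mov     ax, 4C00h       ;DOS terminate-process call.
        int     21h
Main    endp
cseg    ends

sseg    segment para stack 'stack'
stk     word    256 dup (?)
sseg    ends
        end     Main
```

Since all three typedefs use near ptr, every pointer here is a 16 bit offset into dseg, so plain [bx] addressing relative to ds is enough at each step.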
Chapter Five: Variables and Data Structures (Part 1)
26 SEP 1996