The Art of
ASSEMBLY LANGUAGE PROGRAMMING


CHAPTER THIRTEEN:
MS-DOS PC-BIOS AND FILE I/O (Part 8)
13.3.10 - Blocked File I/O
13.3.11 - The Program Segment Prefix (PSP)

13.3.10 Blocked File I/O

The examples in the previous section suffer from a major drawback: they are extremely slow. The performance problems with that code are entirely due to DOS. Making a DOS call is not, shall we say, the fastest operation in the world. Calling DOS every time we want to read or write a single character from/to a file will bring the system to its knees. As it turns out, it doesn't take (practically) any more time to have DOS read or write two characters than it does to read or write one character. Since the amount of time we (usually) spend processing the data is negligible compared to the amount of time DOS takes to read or write the data, reading two characters at a time will essentially double the speed of the program. If reading two characters doubles the processing speed, how about reading four characters? Sure enough, it almost quadruples the processing speed. Likewise, processing ten characters at a time increases the processing speed by almost an order of magnitude. Alas, this progression doesn't continue forever. There comes a point of diminishing returns, where it takes far too much memory to justify a (very) small improvement in performance (keep in mind that reading 64K in a single operation requires a 64K memory buffer to hold the data). A good compromise is 256 or 512 bytes. Reading more data than that doesn't improve performance much, and a 256 or 512 byte buffer is easier to deal with than larger buffers.
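
To put rough numbers on those diminishing returns, the C sketch below counts the read calls a loop makes when it stops at the first read that returns fewer bytes than it asked for. It is a portable illustration, not DOS code. The function name read_calls is hypothetical, and the sketch assumes the file size is known and no errors occur.

```c
/* Hypothetical helper: number of read calls a loop makes on a file of
   file_size bytes when it requests block_size bytes per call and stops at
   the first short read. Every call but the last returns a full block; the
   last returns fewer bytes (zero when file_size is an exact multiple of
   block_size), which signals EOF. block_size must be nonzero. */
unsigned long read_calls(unsigned long file_size, unsigned long block_size)
{
    return file_size / block_size + 1;
}
```

For a 64K (65,536-byte) file this gives 65,537 calls at one byte per read, 257 calls at 256 bytes, and 129 calls at 512 bytes. Going from 1 to 256 bytes eliminates almost every call, while doubling the buffer again to 512 bytes saves only another 128.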

Reading data in groups or blocks is called blocked I/O. Blocked I/O is often one to two orders of magnitude faster than single-character I/O, so you should use blocked I/O whenever possible.

There is one minor drawback to blocked I/O: it's a little more complex to program than single-character I/O. Consider the example presented in the section on the DOS read command:

Example: This example opens a file and reads it to the EOF

                mov     ah, 3dh         ;Open the file
                mov     al, 0           ;Open for reading
                lea     dx, Filename    ;Presume DS points at filename
                int     21h             ; segment
                jc      BadOpen
                mov     FHndl, ax       ;Save file handle

LP:             mov     ah, 3fh         ;Read data from the file
                lea     dx, Buffer      ;Address of data buffer
                mov     cx, 1           ;Read one byte
                mov     bx, FHndl       ;Get file handle value
                int     21h
                jc      ReadError
                cmp     ax, cx          ;EOF reached?
                jne     EOF
                mov     al, Buffer      ;Get character read
                putc                    ;Print it (IOSHELL call)
                jmp     LP              ;Read next byte

EOF:            mov     bx, FHndl
                mov     ah, 3eh         ;Close file
                int     21h
                jc      CloseError

There isn't much to this program at all. Now consider the same example rewritten to use blocked I/O:

Example: This example opens a file and reads it to the EOF using blocked I/O

                mov     ah, 3dh         ;Open the file
                mov     al, 0           ;Open for reading
                lea     dx, Filename    ;Presume DS points at filename
                int     21h             ; segment
                jc      BadOpen
                mov     FHndl, ax       ;Save file handle

LP:             mov     ah, 3fh         ;Read data from the file
                lea     dx, Buffer      ;Address of data buffer
                mov     cx, 256         ;Read 256 bytes
                mov     bx, FHndl       ;Get file handle value
                int     21h
                jc      ReadError
                cmp     ax, cx          ;EOF reached?
                jne     EOF
                mov     si, 0           ;Note: CX=256 at this point.
PrtLp:          mov     al, Buffer[si]  ;Get character read
                putc                    ;Print it
                inc     si
                loop    PrtLp
                jmp     LP              ;Read next block

; Note: just because the number of bytes read doesn't equal 256,
; don't get the idea we're through; there could be up to 255 bytes
; in the buffer still waiting to be processed.

EOF:            mov     cx, ax
                jcxz    EOF2            ;If CX is zero, we're really done.
                mov     si, 0           ;Process the last block of data read
Finis:          mov     al, Buffer[si]  ; from the file which contains
                putc                    ; 1..255 bytes of valid data.
                inc     si
                loop    Finis

EOF2:           mov     bx, FHndl
                mov     ah, 3eh         ;Close file
                int     21h
                jc      CloseError

This example demonstrates one major hassle with blocked I/O: when you reach the end of file, you haven't necessarily processed all of the data in the file. If the block size is 256 and there are 255 bytes left in the file, DOS will return an EOF condition (the number of bytes read doesn't match the request). In this case, we still have to process the characters that were read. The code above does this in a rather straightforward manner, using a second loop to finish up when the EOF is reached. You've probably noticed that the two print loops are virtually identical. This program can be reduced in size somewhat using the following code, which is only a little more complex:

Example: This example opens a file and reads it to the EOF using blocked I/O

                mov     ah, 3dh         ;Open the file
                mov     al, 0           ;Open for reading
                lea     dx, Filename    ;Presume DS points at filename
                int     21h             ; segment.
                jc      BadOpen
                mov     FHndl, ax       ;Save file handle

LP:             mov     ah, 3fh         ;Read data from the file
                lea     dx, Buffer      ;Address of data buffer
                mov     cx, 256         ;Read 256 bytes
                mov     bx, FHndl       ;Get file handle value
                int     21h
                jc      ReadError
                mov     bx, ax          ;Save byte count for later
                mov     cx, ax
                jcxz    EOF             ;Nothing read, so we're done.
                mov     si, 0           ;Note: CX = bytes read (1..256) here.
PrtLp:          mov     al, Buffer[si]  ;Get character read
                putc                    ;Print it
                inc     si
                loop    PrtLp
                cmp     bx, 256         ;Reach EOF yet?
                je      LP

EOF:            mov     bx, FHndl
                mov     ah, 3eh         ;Close file
                int     21h
                jc      CloseError
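
For comparison, here is the same single-loop structure as a portable C sketch. It is not DOS code: the standard fread call stands in for DOS function 3Fh, fputc stands in for the IOSHELL putc, and the name copy_blocked is hypothetical. Like the assembly version, it processes however many bytes each read returns and stops after the first short read.

```c
#include <stdio.h>

#define BLOCK_SIZE 256   /* same buffer size as the assembly example */

/* Copy every byte of `in` to `out`, reading BLOCK_SIZE bytes at a time.
   Returns the number of bytes copied; *reads receives the number of read
   calls made. A short read (fewer than BLOCK_SIZE bytes) means EOF or an
   error; a real program would check ferror(in) afterwards. */
size_t copy_blocked(FILE *in, FILE *out, size_t *reads)
{
    unsigned char buffer[BLOCK_SIZE];
    size_t total = 0;
    size_t got;

    *reads = 0;
    do {
        got = fread(buffer, 1, BLOCK_SIZE, in);  /* like DOS function 3Fh */
        (*reads)++;
        for (size_t i = 0; i < got; i++)         /* 0..256 valid bytes */
            fputc(buffer[i], out);               /* like the putc call */
        total += got;
    } while (got == BLOCK_SIZE);                 /* full block: keep going */
    return total;
}
```

A 600-byte file takes three calls (256, 256, and 88 bytes). A 512-byte file also takes three, because the third call returns zero bytes.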

Blocked I/O works best on sequential files, that is, files opened only for reading or only for writing (no seeking). When dealing with random access files, you should read or write whole records at one time, using a single DOS read or write call to process each record. This is still considerably faster than manipulating the data one byte at a time.
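
A minimal C sketch of reading one fixed-length record with a single seek and a single read follows. The standard fseek and fread calls stand in for DOS functions 42h (move file pointer) and 3Fh (read). The name read_record and the fixed-length record layout are assumptions for illustration.

```c
#include <stdio.h>

/* Read record number rec_no (0-based) of rec_size bytes from a file of
   fixed-length records: one seek plus one whole-record read, instead of
   rec_size single-byte reads. Returns 1 on success, 0 on failure
   (including a record that lies past the end of the file). */
int read_record(FILE *fp, long rec_no, void *record, size_t rec_size)
{
    if (fseek(fp, rec_no * (long)rec_size, SEEK_SET) != 0)  /* like DOS 42h */
        return 0;
    return fread(record, rec_size, 1, fp) == 1;             /* like DOS 3Fh */
}
```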

13.3.11 The Program Segment Prefix (PSP)

When a program is loaded into memory for execution, DOS first builds a program segment prefix in memory immediately before the program. The PSP contains lots of information, some of it useful, some of it obsolete. Understanding the layout of the PSP is essential for programmers designing assembly language programs.

The PSP is 256 bytes long and contains the following information:

Offset  Length  Description
0       2       An INT 20h instruction is stored here
2       2       Program ending address
4       1       Unused, reserved by DOS
5       5       Call to DOS function dispatcher
0Ah     4       Address of program termination code
0Eh     4       Address of break handler routine
12h     4       Address of critical error handler routine
16h     22      Reserved for use by DOS
2Ch     2       Segment address of environment area
2Eh     34      Reserved by DOS
50h     3       INT 21h, RETF instructions
53h     9       Reserved by DOS
5Ch     16      Default FCB #1
6Ch     20      Default FCB #2
80h     1       Length of command line string
81h     127     Command line string

Note: locations 80h..FFh are used for the default DTA.
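
One way to double-check the offsets in the table is to model the PSP as a C structure, as in the sketch below. It is a portable illustration rather than DOS code. The member names are made up for this example. Every member is a byte array, so the compiler has no reason to insert padding. Multi-byte fields are assumed to be little-endian, as on the 80x86. The _Static_assert lines (C11) confirm that the field lengths add up to the listed offsets and to 256 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-for-byte model of the 256-byte PSP (member names are illustrative,
   not taken from any DOS header). */
struct psp {
    uint8_t int20[2];          /* 00h: INT 20h instruction           */
    uint8_t mem_top[2];        /* 02h: program ending address        */
    uint8_t reserved1[1];      /* 04h: unused                        */
    uint8_t dos_call[5];       /* 05h: call to DOS dispatcher        */
    uint8_t terminate_vec[4];  /* 0Ah: termination address           */
    uint8_t break_vec[4];      /* 0Eh: break handler                 */
    uint8_t crit_err_vec[4];   /* 12h: critical error handler        */
    uint8_t reserved2[22];     /* 16h                                */
    uint8_t env_seg[2];        /* 2Ch: environment segment           */
    uint8_t reserved3[34];     /* 2Eh                                */
    uint8_t int21_retf[3];     /* 50h: INT 21h, RETF                 */
    uint8_t reserved4[9];      /* 53h                                */
    uint8_t fcb1[16];          /* 5Ch: default FCB #1                */
    uint8_t fcb2[20];          /* 6Ch: default FCB #2                */
    uint8_t cmd_len;           /* 80h: length (80h..FFh is the DTA)  */
    uint8_t cmd_tail[127];     /* 81h: command line string           */
};

_Static_assert(offsetof(struct psp, env_seg) == 0x2C, "env_seg offset");
_Static_assert(offsetof(struct psp, cmd_len) == 0x80, "cmd_len offset");
_Static_assert(sizeof(struct psp) == 256, "PSP must be 256 bytes");

/* Fetch a little-endian 16-bit field, e.g. psp_word(p.env_seg). */
static inline uint16_t psp_word(const uint8_t field[2])
{
    return (uint16_t)(field[0] | (field[1] << 8));
}
```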

Most of the information in the PSP is of little use to a modern MS-DOS assembly language program. Buried in the PSP, however, are a couple of gems that are worth knowing about. For completeness, we'll take a look at all of the fields in the PSP.

The first field in the PSP contains an int 20h instruction. Int 20h is an obsolete mechanism used to terminate program execution. Back in the early days of DOS v1.0, your program would execute a jmp to this location in order to terminate. Nowadays, of course, we have DOS function 4Ch, which is much easier (and safer) than jumping to location zero in the PSP. Therefore, this field is obsolete.

Field number two contains the segment address just past the last paragraph of memory allocated to your program. By subtracting the segment address of the PSP from this value, you can determine the number of paragraphs (16-byte units) allocated to your program (and quit if there is insufficient memory available).
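
The arithmetic is simple, since a paragraph is 16 bytes. The C sketch below works on a copy of the 256-byte PSP held in an ordinary array. The name psp_alloc_bytes is hypothetical. The sketch assumes that the word at offset 2 holds the segment just past the end of the program's memory, stored little-endian.

```c
#include <stdint.h>

/* Bytes of memory allocated to the program, computed from a copy of the
   PSP and the PSP's own segment address. Assumes the little-endian word
   at offset 2 is the segment just past the program's memory; returns 0
   if that value is below the PSP segment. */
uint32_t psp_alloc_bytes(const uint8_t psp_image[256], uint16_t psp_seg)
{
    uint16_t end_seg = (uint16_t)(psp_image[2] | (psp_image[3] << 8));
    if (end_seg < psp_seg)
        return 0;
    uint32_t paragraphs = (uint32_t)(end_seg - psp_seg);
    return paragraphs * 16u;          /* one paragraph = 16 bytes */
}
```

A program can compare the result against the amount of memory it needs and exit with an error message if it comes up short.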

The third field is the first of many "holes" left in the PSP by Microsoft. Why they're here is anyone's guess.

The fourth field is a call to the DOS function dispatcher. The purpose of this (now obsolete) DOS calling mechanism was to allow some additional compatibility with CP/M-80 programs. For modern DOS programs, there is absolutely no need to worry about this field.

The next three fields are used to store special addresses during the execution of a program. These fields contain the default terminate vector, break vector, and critical error handler vector. These are the values normally stored in the interrupt vectors for int 22h, int 23h, and int 24h. Because DOS keeps a copy of these values in the PSP, you can change these interrupt vectors so that they point into your own code. When your program terminates, DOS restores the three vectors from these three fields in the PSP. For more details on these interrupt vectors, please consult the DOS technical reference manual.
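
Each of these three fields holds a far pointer stored in the usual 80x86 order: a 16-bit offset followed by a 16-bit segment, both little-endian. The C sketch below decodes one of these fields from a copy of the PSP and converts it to a real-mode linear address, segment*16 + offset. The names psp_vector and far_linear are hypothetical.

```c
#include <stdint.h>

/* A decoded real-mode far pointer. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Decode the 4-byte far pointer starting at `off` in a PSP copy, e.g.
   off = 0x0A (int 22h), 0x0E (int 23h) or 0x12 (int 24h). In memory the
   offset word comes first, then the segment word. */
struct far_ptr psp_vector(const uint8_t psp_image[256], unsigned off)
{
    struct far_ptr fp;
    fp.offset  = (uint16_t)(psp_image[off]     | (psp_image[off + 1] << 8));
    fp.segment = (uint16_t)(psp_image[off + 2] | (psp_image[off + 3] << 8));
    return fp;
}

/* Real-mode linear address: segment * 16 + offset. */
uint32_t far_linear(struct far_ptr fp)
{
    return (uint32_t)fp.segment * 16u + fp.offset;
}
```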

The eighth field in the PSP record is another reserved field currently unavailable for use by your programs.

The ninth field is a real gem. It's the address of the environment strings area. This is a two-byte value which contains the segment address of the environment storage area. The environment strings always begin at offset zero within this segment. The environment string area consists of a sequence of zero-terminated strings. It uses the following format:

string1 0 string2 0 string3 0 ... 0 stringn 0 0

That is, the environment area consists of a list of zero-terminated strings, the list itself being terminated by a string of length zero (i.e., a zero all by itself, or two zeros in a row, however you want to look at it). Strings are (usually) placed in the environment area via DOS commands like PATH, SET, etc. Generally, a string in the environment area takes the form

 		name = parameters

For example, the "SET IPATH=C:\ASSEMBLY\INCLUDE" command copies the string "IPATH=C:\ASSEMBLY\INCLUDE" into the environment string storage area.
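
Scanning the environment area for a particular name is a short loop. For each string, check whether it begins with the name followed by "=". If it doesn't, skip past the string's terminating zero. Stop at the empty string. The C sketch below assumes the environment block has already been copied into an ordinary char array and uses a case-sensitive comparison. The name env_lookup is hypothetical; it is not a DOS or C library function.

```c
#include <stddef.h>
#include <string.h>

/* Look up `name` in a DOS-style environment block: zero-terminated
   "NAME=value" strings, with the list ended by an empty string (so the
   block ends in two zero bytes). Returns a pointer to the value, or NULL
   if the name is not present. */
const char *env_lookup(const char *block, const char *name)
{
    size_t len = strlen(name);
    const char *s = block;

    while (*s != '\0') {                        /* empty string ends list */
        if (strncmp(s, name, len) == 0 && s[len] == '=')
            return s + len + 1;                 /* value follows the '=' */
        s += strlen(s) + 1;                     /* skip string and its 0 */
    }
    return NULL;
}
```

Note that looking up "PATH" does not match "IPATH=...", because the comparison starts at the beginning of each string and requires an "=" right after the name.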

Many languages scan the environment storage area to find default filename paths and other pieces of default information set up by DOS. Your programs can take advantage of this as well.

The next field in the PSP is another block of reserved storage currently undefined by DOS.

The 11th field in the PSP is another call to the DOS function dispatcher. Why this call exists (when the one at location 5 in the PSP already exists and nobody really uses either mechanism to call DOS) is an interesting question. In general, this field should be ignored by your programs.

The 12th field is another block of unused bytes in the PSP which should be ignored.

The 13th and 14th fields in the PSP are the default FCBs (File Control Blocks). File control blocks are another archaic data structure carried over from CP/M-80. FCBs are used only with the obsolete DOS v1.0 file handling routines, so they are of little interest to us. We'll ignore these FCBs in the PSP.

Locations 80h through the end of the PSP contain a very important piece of information: the command line parameters typed on the DOS command line after your program's name. If the following is typed on the DOS command line:

		MYPGM parameter1, parameter2

the following is stored into the command line parameter field:

		23, " parameter1, parameter2", 0Dh

Location 80h contains 23 (decimal), the length of the parameters following the program name. Locations 81h through 97h contain the characters making up the parameter string. Location 98h contains a carriage return. Notice that the carriage return character is not figured into the length of the command line string.
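
A minimal C sketch of extracting the parameter string works on a copy of the PSP held in an ordinary byte array. The name psp_command_tail is hypothetical. The sketch uses the length byte at 80h rather than searching for the carriage return, and it clamps the length to 127 so it can never read past the end of the PSP.

```c
#include <stddef.h>
#include <string.h>

/* Copy the command tail out of a 256-byte PSP copy into `out` (at least
   128 bytes) as a zero-terminated C string and return its length. The
   length byte at 80h does not count the carriage return that follows the
   text, so the CR is not copied. */
size_t psp_command_tail(const unsigned char psp[256], char out[128])
{
    size_t len = psp[0x80];
    if (len > 127)                 /* defensive: stay inside the PSP */
        len = 127;
    memcpy(out, psp + 0x81, len);
    out[len] = '\0';
    return len;
}
```

For the MYPGM example, the function returns 23 and copies " parameter1, parameter2", including the leading space.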

Processing the command line string is such an important facet of assembly language programming that this process will be discussed in detail in the next section.

Locations 80h..FFh in the PSP also make up the default DTA. Therefore, if you don't use DOS function 1Ah to change the DTA and you execute a FIND FIRST FILE, the filename information will be stored starting at location 80h in the PSP, overwriting the command line.

One important detail we've omitted until now is exactly how you access data in the PSP. Although the PSP is loaded into memory immediately before your program, that doesn't necessarily mean that it appears 100h bytes before your code. Your data segments may have been loaded into memory before your code segments, which would invalidate this method of locating the PSP. The segment address of the PSP is passed to your program in the ds register. To store the PSP address in your data segment, your programs should begin with the following code:

                push    ds              ;Save PSP value
                mov     ax, seg DSEG    ;Point DS and ES at our data
                mov     ds, ax          ; segment.
                mov     es, ax
                pop     PSP             ;Store PSP value into "PSP"
                                        ; variable.
                .
                .
                .

Another way to obtain the PSP address in DOS 5.0 and later is to make a DOS call. If you load ah with 51h and execute an int 21h instruction, MS-DOS will return the segment address of the current PSP in the bx register.

There are lots of tricky things you can do with the data in the PSP. Peter Norton's Programmer's Guide to the IBM PC lists all kinds of tricks. Such operations won't be discussed here because they're a little beyond the scope of this manual.

28 SEP 1996