The Art of
ASSEMBLY LANGUAGE PROGRAMMING

Chapter Fourteen (Part 4)

Table of Contents

Chapter Fourteen (Part 6) 

CHAPTER FOURTEEN:
FLOATING POINT ARITHMETIC (Part 5)
14.4.7 - Comparison Instructions
14.4.7.1 - The FCOM, FCOMP, and FCOMPP Instructions
14.4.7.2 - The FUCOM, FUCOMP, and FUCOMPP Instructions
14.4.7.3 - The FTST Instruction
14.4.7.4 - The FXAM Instruction
14.4.8 - Constant Instructions
14.4.9 - Transcendental Instructions
14.4.9.1 - The F2XM1 Instruction
14.4.9.2 - The FSIN, FCOS, and FSINCOS Instructions
14.4.9.3 - The FPTAN Instruction
14.4.9.4 - The FPATAN Instruction
14.4.9.5 - The FYL2X and FYL2XP1 Instructions
14.4.10 - Miscellaneous Instructions
14.4.10.1 - The FINIT and FNINIT Instructions
14.4.10.2 - The FWAIT Instruction
14.4.10.3 - The FLDCW and FSTCW Instructions
14.4.10.4 - The FCLEX and FNCLEX Instructions
14.4.10.5 - The FLDENV, FSTENV, and FNSTENV Instructions
14.4.10.6 - The FSAVE, FNSAVE, and FRSTOR Instructions
14.4.10.7 - The FSTSW and FNSTSW Instructions
14.4.10.8 - The FINCSTP and FDECSTP Instructions
14.4.10.9 - The FNOP Instruction
14.4.10.10 - The FFREE Instruction
14.4.11 - Integer Operations

14.4.7 Comparison Instructions

The 80x87 provides several instructions for comparing real values. The fcom, fcomp, fcompp, fucom, fucomp, and fucompp instructions compare the two values on the top of stack and set the condition codes appropriately. The ftst instruction compares the value on the top of stack with zero. The fxam instruction checks the value on tos and reports sign, normalization, and tag information.

Generally, most programs test the condition code bits immediately after a comparison. Unfortunately, there are no conditional jump instructions that branch based on the FPU condition codes. Instead, you can use the fstsw instruction to copy the floating point status register (see "The FPU Status Register") into the ax register; then you can use the sahf instruction to copy the ah register into the 80x86's condition code bits. After doing this, you can use the conditional jump instructions to test some condition. This technique copies C0 into the carry flag, C2 into the parity flag, and C3 into the zero flag. The sahf instruction does not copy C1 into any of the 80x86's flag bits.

Since the sahf instruction does not copy any 80x87 processor status bits into the sign or overflow flags, you cannot use the jg, jl, jge, or jle instructions. Instead, use the ja, jae, jb, jbe, je, and jz instructions when testing the results of a floating point comparison. Yes, these conditional jumps normally test unsigned values and floating point numbers are signed values. However, use the unsigned conditional branches anyway; the fstsw and sahf instructions set the 80x86 flags register so that the unsigned jumps produce the correct results.
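
As a minimal sketch of this technique, the following sequence branches when x is less than y. The names x, y, and XLessThanY are hypothetical; x and y are assumed to be 64 bit real variables declared elsewhere. Note that an unordered result (a NaN operand) also sets C0, so this sketch assumes both values are valid numbers.

                fld     x               ;st(0) = x
                fcomp   y               ;Compare x against y, then pop x.
                fstsw   ax              ;Copy the FPU status word into ax.
                sahf                    ;C0->carry, C2->parity, C3->zero.
                jb      XLessThanY      ;Carry set: st(0) < operand.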

14.4.7.1 The FCOM, FCOMP, and FCOMPP Instructions

The fcom, fcomp, and fcompp instructions compare st(0) to the specified operand and set the corresponding 80x87 condition code bits based on the result of the comparison. The legal forms for these instructions are

                fcom
                fcomp
                fcompp

                fcom    st(i)
                fcomp   st(i)

                fcom    mem
                fcomp   mem

With no operands, fcom, fcomp, and fcompp compare st(0) against st(1) and set the FPU condition code bits accordingly. In addition, fcomp pops st(0) off the stack and fcompp pops both st(0) and st(1) off the stack.

With a single register operand, fcom and fcomp compare st(0) against the specified register. Fcomp also pops st(0) after the comparison.

With a 32 or 64 bit memory operand, the fcom and fcomp instructions convert the memory variable to an 80 bit extended precision value and then compare st(0) against this value, setting the condition code bits accordingly. Fcomp also pops st(0) after the comparison.

These instructions set C2 (which winds up in the parity flag) if the two operands are not comparable (e.g., one of them is a NaN). If it is possible for an illegal floating point value to wind up in a comparison, you should check the parity flag for an error before checking the desired condition.
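
The following sketch shows one way to order the tests so that the parity check comes first. It uses the no-operand fcompp form; the variable names x and y and the branch labels are hypothetical.

                fld     y               ;st(0) = y
                fld     x               ;st(0) = x, st(1) = y
                fcompp                  ;Compare x against y, pop both.
                fstsw   ax
                sahf
                jp      Unordered       ;C2 set: operands not comparable.
                je      XEqualsY        ;C3 set: x = y.
                ja      XGreaterThanY   ;C0 and C3 clear: x > y.
                                        ;Fall through: x < y.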

These instructions set the stack fault bit if there aren't two items on the top of the register stack. They set the denormalized exception bit if either or both operands are denormalized. They set the invalid operation flag if either or both operands are quiet NaNs. These instructions always clear the C1 condition code.

14.4.7.2 The FUCOM, FUCOMP, and FUCOMPP Instructions

These instructions are similar to the fcom, fcomp, and fcompp instructions, although they only allow the following forms:

                fucom
                fucomp
                fucompp
                fucom   st(i)
                fucomp  st(i)

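The difference between fcom/fcomp/fcompp and fucom/fucomp/fucompp is relatively minor. The fcom/fcomp/fcompp instructions set the invalid operation exception bit if either operand is a NaN, including a quiet NaN. The fucom/fucomp/fucompp instructions ("unordered" compares) do not raise this exception for quiet NaNs; they simply report the unordered result in the condition code bits. In all other cases, these two sets of instructions behave identically.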

14.4.7.3 The FTST Instruction

The ftst instruction compares the value in st(0) against 0.0. It behaves just like the fcom instruction would if st(1) contained 0.0. Note that this instruction does not differentiate -0.0 from +0.0. If the value in st(0) is either of these values, ftst will set C3 to denote equality. If you need to differentiate -0.0 from +0.0, use the fxam instruction. Note that this instruction does not pop st(0) off the stack.
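
A short sketch of testing the sign of a value with ftst follows. The variable x and the labels are hypothetical, and the sketch assumes x is not a NaN (an unordered result would set C0 and C3 as well as C2). The value remains in st(0) afterwards.

                fld     x               ;st(0) = x
                ftst                    ;Compare st(0) against 0.0.
                fstsw   ax
                sahf
                jb      IsNegative      ;C0 set: st(0) < 0.0.
                je      IsZero          ;C3 set: st(0) is +0.0 or -0.0.
                                        ;Fall through: st(0) > 0.0.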

14.4.7.4 The FXAM Instruction

The fxam instruction examines the value in st(0) and reports the results in the condition code bits (see "The FPU Status Register" for details on how fxam sets these bits). This instruction does not pop st(0) off the stack.
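
For example, fxam can detect a negative zero, which ftst cannot. The sketch below assumes the usual fxam encoding, in which C3=1, C2=0, C0=0 denotes zero and C1 holds the sign bit; check the bit patterns against the table in "The FPU Status Register" before relying on them. The variable x and the label IsMinusZero are hypothetical.

                fld     x               ;st(0) = x
                fxam                    ;Classify st(0).
                fstsw   ax              ;ah bits: C3=6, C2=2, C1=1, C0=0.
                and     ah, 47h         ;Keep only C3, C2, C1, and C0.
                cmp     ah, 42h         ;C3=1, C2=0, C1=1, C0=0?
                je      IsMinusZero     ;Yes: st(0) contains -0.0.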

14.4.8 Constant Instructions

The 80x87 FPU provides several instructions that let you load commonly used constants onto the FPU's register stack. These instructions set the stack fault, invalid operation, and C1 flags if a stack overflow occurs; they do not otherwise affect the FPU flags. The specific instructions in this category include:

                fldz    ;Pushes +0.0.
                fld1    ;Pushes +1.0.
                fldpi   ;Pushes pi.
                fldl2t  ;Pushes log2(10).
                fldl2e  ;Pushes log2(e).
                fldlg2  ;Pushes log10(2).
                fldln2  ;Pushes ln(2).

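
As a small illustration, the following sketch uses fldpi to compute the area of a circle. The variables r and area are hypothetical and assumed to be real variables declared elsewhere.

                fld     r               ;st(0) = r
                fmul    st, st(0)       ;st(0) = r*r
                fldpi                   ;st(0) = pi, st(1) = r*r
                fmul                    ;st(0) = pi*r*r (pops one item).
                fstp    area            ;Store the result and pop.
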
14.4.9 Transcendental Instructions

The 80387 and later FPUs provide eight transcendental (log and trigonometric) instructions to compute a sine, cosine, partial tangent, partial arctangent, 2**x - 1, y * log2(x), and y * log2(x+1). Using various algebraic identities, it is easy to compute most of the other common transcendental functions using these instructions.

14.4.9.1 The F2XM1 Instruction

F2xm1 computes 2**st(0) - 1. The value in st(0) must be in the range -1.0 <= st(0) <= +1.0. If st(0) is out of range, f2xm1 generates an undefined result but raises no exceptions. The computed value replaces the value in st(0). Example:

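; Compute 10**x using the identity: 10**x = 2**(x*lg(10)) (lg = log2).
; This works only if x*lg(10) lies in the range f2xm1 accepts.

                fld     x               ;st(0) = x
                fldl2t                  ;st(0) = lg(10), st(1) = x
                fmul                    ;st(0) = x*lg(10)
                f2xm1                   ;st(0) = 2**(x*lg(10)) - 1
                fld1
                fadd                    ;st(0) = 10**x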

Note that f2xm1 computes (2**x) - 1, which is why the code above adds 1.0 to the result at the end of the computation.
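
When the exponent may fall outside f2xm1's legal range, one common approach is to split it into integer and fractional parts and recombine the results with fscale. The sketch below computes 2**x this way; it is an illustration under the assumption that 2**x does not overflow, and the variables x and result are hypothetical. To compute 10**x instead, multiply x by the constant fldl2t pushes before the split.

                fld     x               ;st(0) = x
                fld     st(0)           ;st(0) = x, st(1) = x
                frndint                 ;st(0) = int(x)
                fsub    st(1), st       ;st(1) = x - int(x) (fraction).
                fxch                    ;st(0) = fraction, st(1) = int(x)
                f2xm1                   ;st(0) = 2**fraction - 1
                fld1
                fadd                    ;st(0) = 2**fraction
                fscale                  ;st(0) = 2**fraction * 2**int(x)
                fstp    st(1)           ;Discard int(x), keep the result.
                fstp    result          ;result = 2**x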

14.4.9.2 The FSIN, FCOS, and FSINCOS Instructions

These instructions pop the value off the top of the register stack, compute the sine, cosine, or both, and push the result(s) back onto the stack. The fsincos instruction pushes the sine followed by the cosine of the original operand; hence it leaves cos(st(0)) in st(0) and sin(st(0)) in st(1).

These instructions assume st(0) specifies an angle in radians and this angle must be in the range -2**63 < st(0) < +2**63. If the original operand is out of range, these instructions set the C2 flag and leave st(0) unchanged. You can use the fprem1 instruction, with a divisor of 2*pi, to reduce the operand to a reasonable range.

These instructions set the stack fault/C1, precision, underflow, denormalized, and invalid operation flags according to the result of the computation.
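
The range reduction mentioned above might look like the following sketch. Fprem1 sets C2 when it has only partially reduced the operand, so the code repeats the instruction until C2 is clear. The variables angle and sine and the label Reduce are hypothetical; note that reducing by a rounded value of 2*pi loses accuracy for very large angles.

                fldpi                   ;st(0) = pi
                fadd    st, st(0)       ;st(0) = 2*pi
                fld     angle           ;st(0) = angle, st(1) = 2*pi
Reduce:         fprem1                  ;st(0) = remainder of st(0)/st(1)
                fstsw   ax
                sahf
                jp      Reduce          ;C2 set: reduction incomplete.
                fsin                    ;st(0) = sin(reduced angle)
                fstp    sine            ;Store the result and pop.
                fstp    st(0)           ;Discard 2*pi.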

14.4.9.3 The FPTAN Instruction

Fptan computes the tangent of st(0) and pushes this value, and then it pushes 1.0 onto the stack. Like the fsin and fcos instructions, the value of st(0) is assumed to be in radians and must be in the range -2**63 < st(0) < +2**63. If the value is outside this range, fptan sets C2 to indicate that the computation did not take place. As with the fsin, fcos, and fsincos instructions, you can use the fprem1 instruction to reduce this operand to a reasonable range using a divisor of 2*pi.

If the argument is invalid (i.e., an odd multiple of pi/2 radians, where the tangent is undefined because it involves a division by zero), the result is undefined and this instruction raises no exceptions. Fptan will set the stack fault, precision, underflow, denormal, invalid operation, C2, and C1 bits as required by the operation.
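
Since fptan leaves 1.0 on the top of stack with the tangent beneath it, code that only needs the tangent must discard the extra value. A minimal sketch, assuming x is already in range and that x and tanx are hypothetical real variables:

                fld     x               ;st(0) = x
                fptan                   ;st(0) = 1.0, st(1) = tan(x)
                fstp    st(0)           ;Discard the 1.0.
                fstp    tanx            ;Store tan(x) and pop.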

14.4.9.4 The FPATAN Instruction

This instruction expects two values on the top of stack. It pops them and computes the following:

                st(0) = arctan( st(1) / st(0) )

The resulting value is the arctangent of the ratio on the stack, expressed in radians. If you have a value you wish to compute the arctangent of, use fld1 to create the appropriate ratio and then execute the fpatan instruction.

This instruction affects the stack fault/C1, precision, underflow, denormal, and invalid operation bits if a problem occurs during the computation. It sets the C1 condition code bit if it has to round the result.
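
The fld1 technique described above can be sketched as follows; the variables x and atanx are hypothetical.

                fld     x               ;st(0) = x
                fld1                    ;st(0) = 1.0, st(1) = x
                fpatan                  ;st(0) = arctan(x / 1.0)
                fstp    atanx           ;Store the result and pop.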

14.4.9.5 The FYL2X and FYL2XP1 Instructions

The fyl2x and fyl2xp1 instructions compute st(1) * log2(st(0)) and st(1) * log2(st(0)+1), respectively. Fyl2x requires that st(0) be greater than zero; fyl2xp1 requires st(0) to be in the range:

                -(1 - sqrt(2)/2) <= st(0) <= (1 - sqrt(2)/2)

Fyl2x is useful for computing logs to bases other than two; fyl2xp1 is useful for computing compound interest, maintaining the maximum precision during computation.

Fyl2x can affect all the exception flags. C1 denotes rounding if there is no other error, or stack overflow/underflow if the stack fault bit is set.

The fyl2xp1 instruction does not affect the overflow or zero divide exception flags. These exceptions occur when st(0) is very small or zero. Since fyl2xp1 adds one to st(0) before computing the function, this condition never holds. Fyl2xp1 affects the other flags in a manner identical to fyl2x.
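
As an example of computing a log to another base, the sketch below uses the identity log10(x) = log10(2) * log2(x). The variables x and log10x are hypothetical, and x is assumed to be greater than zero. Substituting fldln2 for fldlg2 produces the natural log instead.

                fldlg2                  ;st(0) = log10(2)
                fld     x               ;st(0) = x, st(1) = log10(2)
                fyl2x                   ;st(0) = log10(2) * log2(x)
                fstp    log10x          ;Store log10(x) and pop.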

14.4.10 Miscellaneous instructions

The 80x87 FPU includes several additional instructions which control the FPU, synchronize operations, and let you test or set various status bits. These instructions include finit/fninit, fdisi/fndisi, feni/fneni, fldcw, fstcw/fnstcw, fclex/fnclex, fsave/fnsave, frstor, frstpm, fstsw/fnstsw, fstenv/fnstenv, fldenv, fincstp, fdecstp, fwait, fnop, and ffree. The fdisi/fndisi, feni/fneni, and frstpm instructions are active only on FPUs earlier than the 80387, so we will not consider them here.

Many of these instructions have two forms. The first form is Fxxxx and the second form is FNxxxx. The version without the "N" emits an fwait instruction prior to the opcode (which is standard for most coprocessor instructions). The version with the "N" does not emit the fwait opcode ("N" stands for no wait).

14.4.10.1 The FINIT and FNINIT Instructions

The finit instruction initializes the FPU for proper operation. Your applications should execute this instruction before executing any other FPU instructions. This instruction initializes the control register to 37Fh (see "The FPU Control Register"), the status register to zero (see "The FPU Status Register"), and the tag word to 0FFFFh. The other registers are unaffected.

14.4.10.2 The FWAIT Instruction

The fwait instruction pauses the system until any currently executing FPU instruction completes. This is required because the FPU on the 80486sx and earlier CPU/FPU combinations can execute instructions in parallel with the CPU. Therefore, any FPU instruction which reads or writes memory could suffer from a data hazard if the main CPU accesses that same memory location before the FPU reads or writes that location. The fwait instruction lets you synchronize the operation of the FPU by waiting until the completion of the current FPU instruction. This resolves the data hazard by, effectively, inserting an explicit "stall" into the execution stream.

14.4.10.3 The FLDCW and FSTCW Instructions

The fldcw and fstcw instructions require a single 16 bit memory operand:

                fldcw   mem_16
                fstcw   mem_16

These two instructions load the control register (see "The FPU Control Register") from a memory location (fldcw) or store the control word to a 16 bit memory location (fstcw).

When using the fldcw instruction to turn on one of the exceptions, if the corresponding exception flag is set when you enable that exception, the FPU will generate an immediate interrupt before the CPU executes the next instruction. Therefore, you should use the fclex instruction to clear any pending exceptions before changing the FPU exception enable bits.
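
The following sketch enables the zero divide exception in this manner. It assumes the zero divide mask is bit two of the control word (a set mask bit disables the exception); verify this against "The FPU Control Register". The variable cw is a hypothetical 16 bit word.

                fstcw   cw              ;Save the current control word.
                fwait                   ;Be sure the store has completed.
                mov     ax, cw
                and     ax, 0FFFBh      ;Clear bit 2 (zero divide mask).
                mov     cw, ax
                fclex                   ;Clear any pending exception flags.
                fldcw   cw              ;Enable the zero divide exception.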

14.4.10.4 The FCLEX and FNCLEX Instructions

The fclex and fnclex instructions clear all exception bits, the stack fault bit, and the busy flag in the FPU status register (see "The FPU Status Register").

14.4.10.5 The FLDENV, FSTENV, and FNSTENV Instructions

                fstenv  mem_14b
                fnstenv mem_14b
                fldenv  mem_14b

The fstenv/fnstenv instructions store a 14-byte FPU environment record to the memory operand specified. When operating in real mode (the only mode this text considers), the environment record takes the form:
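
One way to describe this 14-byte layout is with a structure declaration. The sketch below assumes the 16 bit real mode format Intel documents for the 80387 and later FPUs; the structure name and field names are hypothetical and chosen only for illustration.

FPUEnv          struc
ControlWord     dw      ?               ;Offset 0:  control word.
StatusWord      dw      ?               ;Offset 2:  status word.
TagWord         dw      ?               ;Offset 4:  tag word.
IPLow           dw      ?               ;Offset 6:  instruction ptr bits 0-15.
IPHighOpcode    dw      ?               ;Offset 8:  instruction ptr bits 16-19
                                        ; (in bits 12-15), opcode in bits 0-10.
OperandLow      dw      ?               ;Offset 10: operand ptr bits 0-15.
OperandHigh     dw      ?               ;Offset 12: operand ptr bits 16-19
                                        ; (in bits 12-15).
FPUEnv          ends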

You must execute the fstenv and fnstenv instructions with the CPU interrupts disabled. Furthermore, you should always ensure that the FPU is not busy before executing this instruction. This is easily accomplished by using the following code:

                pushf           ;Preserve I flag.
                cli             ;Disable interrupts.
                fstenv  mem_14b ;Implicit wait for not busy.
                fwait           ;Wait for operation to finish.
                popf            ;Restore I flag.

The fldenv instruction loads the FPU environment from the specified memory operand. Note that this instruction lets you load the status word. There is no explicit instruction like fldcw to accomplish this.

14.4.10.6 The FSAVE, FNSAVE, and FRSTOR Instructions

                fsave   mem_94b
                fnsave  mem_94b
                frstor  mem_94b

These instructions save and restore the state of the FPU. This includes saving all the internal control, status, and data registers. The destination location for fsave/fnsave (source location for frstor) must be 94 bytes long. The first 14 bytes correspond to the environment record the fldenv and fstenv instructions use; the remaining 80 bytes hold the data from the FPU register stack written out as st(0) through st(7). Frstor reloads the environment record and floating point registers from the specified memory operand.

The fsave/fnsave and frstor instructions are mainly intended for task switching. You can also use fsave/fnsave and frstor as a "push all" and "pop all" sequence to preserve the state of the FPU.

As with the fstenv and fldenv instructions, you should disable interrupts while saving or restoring the FPU state. Otherwise, another interrupt service routine could manipulate the FPU registers and invalidate the operation of the fsave/fnsave or frstor operation. The following code properly protects the environment data while saving and restoring the FPU status:

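; Preserve the FPU state;
; assume si points at the 94-byte
; save area in memory.

                pushf                   ;Preserve I flag.
                cli                     ;Disable interrupts.
                fsave   [si]            ;Save the entire FPU state.
                fwait                   ;Wait for operation to finish.
                popf                    ;Restore I flag.
                 .
                 .
                 .
                pushf
                cli
                frstor  [si]            ;Reload the saved FPU state.
                fwait
                popf

14.4.10.7 The FSTSW and FNSTSW Instructions

                fstsw   ax
                fnstsw  ax
                fstsw   mem_16
                fnstsw  mem_16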

These instructions store the FPU status register (see "The FPU Status Register") into a 16 bit memory location or the ax register. These instructions are unusual in the sense that they can copy an FPU value into one of the 80x86 general purpose registers. Of course, the whole purpose behind allowing the transfer of the status register into ax is to allow the CPU to easily test the condition code register with the sahf instruction.

14.4.10.8 The FINCSTP and FDECSTP Instructions

The fincstp and fdecstp instructions do not take any operands. They simply increment and decrement the stack pointer bits (mod 8) in the FPU status register. These two instructions clear the C1 flag, but do not otherwise affect the condition code bits in the FPU status register.

14.4.10.9 The FNOP Instruction

The fnop instruction is simply an alias for fst st, st(0). It performs no other operation on the FPU.

14.4.10.10 The FFREE Instruction

                ffree   st(i)

This instruction modifies the tag bits for register i in the tags register to mark the specified register as empty. The value is unaffected by this instruction, but the FPU will no longer be able to access that data (without resetting the appropriate tag bits).
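
One use of ffree, sketched below, is to combine it with fincstp to discard the value on the top of stack without storing it anywhere. The pair has the same effect on the register stack as fstp st(0).

                ffree   st(0)           ;Mark the top of stack as empty.
                fincstp                 ;Advance the stack pointer past it.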

14.4.11 Integer Operations

The 80x87 FPUs provide special instructions that combine integer to extended precision conversion along with various arithmetic and comparison operations. These instructions are the following:

                fiadd   int
                fisub   int
                fisubr  int
                fimul   int
                fidiv   int
                fidivr  int

                ficom   int
                ficomp  int

These instructions convert their 16 or 32 bit integer operands to an 80 bit extended precision floating point value and then use this value as the source operand for the specified operation. These instructions use st(0) as the destination operand.
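
The sketch below shows these instructions mixing real and integer operands: it divides a real total by an integer count and compares the resulting average against an integer limit. All variable names and the label are hypothetical; total is assumed to be a real variable, count and limit 16 bit integers, and neither operand of the comparison a NaN.

                fld     total           ;st(0) = total
                fidiv   count           ;st(0) = total / count
                ficomp  limit           ;Compare against limit, then pop.
                fstsw   ax
                sahf
                ja      AboveLimit      ;Average is greater than limit.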

Chapter Fourteen: Floating Point Arithmetic (Part 5)
28 SEP 1996