Objects First


Floating Point Classes

Machine Representations

Up until this point, we have said nothing about the way data is stored in a computer. This was a deliberate strategy to allow you to concentrate on the logical aspects of program design. However, in order to understand the floating point classes provided by C implementations (float and double), we need to look at how typical machines store data.
Integers
Modern day computers store data as sequences of binary bits organised into bytes and words.

A byte is the smallest directly addressable unit of memory in most computers and usually consists of 8 binary bits. A byte is sufficient to store one character encoded using the ASCII codes. (Standard ASCII codes range from 0 to 127 and need only 7 bits; 8-bit extensions of ASCII range from 0 to 255, ie 0 to 2^8-1, and use all 8 bits.) Processors are very commonly classified by their basic word length, eg 8-, 16-, 32- or (most recently) 64-bit machines. Today, the most common machines are 32-bit ones. This means that the basic 'word' of the machine, used for storing basic numeric quantities, has 32 bits.

If this word is used to store an unsigned integer, then the values stored can range from 0 to 2^32-1 (0 to 4294967295). C allows us to qualify int and char types to be unsigned. Simply write the keyword unsigned in front of the int or char:

unsigned int k, l, m;
unsigned char a, b, c;
If unsigned is used without the basic type name, then C assumes you meant int. Thus:
unsigned k, l, m;
is legal C and is identical to the first line of the previous example.

The int type is signed, ie values can be positive or negative. On a 32-bit machine, an int variable will usually require 32 bits of storage and use the 2's complement representation for signed numbers. This implies a range of values from -2^31 to +2^31-1.

Characters occupy 8-bit bytes, and there are four 8-bit bytes in a 32-bit word. Most modern machines are byte-addressable, ie each byte of memory is individually addressable. An int variable requires 32 bits or 4 bytes, so the addresses of successive int variables differ by 4.

Implementation limits

Technology is changing: we are moving towards machines with 64-bit words. Already some machines use a 64-bit word for int. To cope with this, the ANSI standard requires that the characteristics of any implementation be specified in limits.h. This contains constants such as INT_MAX, which is set to the largest int on your machine (typically +2147483647 or 2^31-1). Consult a textbook for the full list of implementation-specific constants.

Floating Point

Floating point types are used to represent non-integral (real) values.

IEEE 754

Before the establishment of this standard, machine manufacturers were free to implement floating point variables in whatever way they pleased. IEEE 754 defines formats for 32- and 64-bit floating point values. Consult an architecture text for the full story, but for general programming purposes:

Size (bits)                    32           64
Usual C implementation         float        double
Sign (bits)                    1            1
Exponent (bits)                8            11
Significand/Mantissa (bits)    23           52
Range                          +/-10^38     +/-10^308
Smallest absolute value        10^-38       10^-308
Precision                      1 in 10^6    1 in 10^15

Note that because machines don't always implement the IEEE 754 standard (eg IBM mainframes, DEC VAX), the range and precision of C float and double may vary!

Floating point constants

The following railroad diagram defines legal floating point constants:

From this, it is easily seen that the following are all acceptable floating point constants:

1.      1.2      1e10      1.2e10
.002    1e+1     1.e-10    12.e12

Floating point methods (operations)

The usual arithmetic operations, +, -, * and /, are defined for float and double - as for integers.

Relational operators (<, <=, >, >=, !=, ==) are also defined. However, the use of == and != is likely to cause grief! The representation of a floating point number in binary is usually an approximation, so two values that are mathematically equal often differ in their last few bits. Thus:

double x, y;
x = ... ; y = ....; /* Some code setting values for x and y */
if( x == y ) {
  .... /* This will rarely be executed! */
  }
else {
  .... /* Mostly, this will be executed! */
  }
You should write:
#include <math.h>   /* declares fabs */

#define EPSILON 1e-10

some_function f( .... ) {
  double x, y;
  x = ...;
  y = ...;
  if ( fabs(x-y) < EPSILON ) {
    .... /* Now this will be executed
            when x and y differ by less than EPSILON */
  }
  else {
    .... /* x and y differ by EPSILON or more */
  }
}
fabs is declared in <math.h>: it returns the absolute value of a double. It may be even better to check for a relative difference rather than an absolute one:
   if ( fabs( (x-y)/x ) < EPSILON ) { ...
Now EPSILON represents a relative tolerance, not an absolute one. Note that this form requires x to be non-zero.

Mixed mode arithmetic

When expressions contain both floating point and integer variables or constants, C promotes the integers to floating point values before evaluation. The detailed rules are quite complex and you should refer to a text when in doubt. However, in general, promotion will take place only when necessary. For example:
int j = 3, k = 4;
double x;
x = j/k;
will result in the value 0 in x. j/k is evaluated as an integral expression (both components are integers) and then converted to a double. However,
int j = 3, k = 4;
double x;
x = j;
x = x/k;
will produce 0.75 as the value of x, because x = j results in 3.0 in x and when x/k is evaluated, k is converted to 4.0 before being used to evaluate the expression.

Generally, if you combine an integer and a floating point value in a term (two values and an operator), then the integer will be promoted to match the type of the floating point value (float or double). You can use parentheses to determine exactly what happens. For example, don't write:

int j, k;
double x, y;
...
y = x/j*k;
Instead, use parentheses to make the intent of your code clear:
int j, k;
double x, y;
...
y = (x/j)*k;
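The C standard defines precisely how a compiler must evaluate an expression such as x/j*k (/ and * have equal precedence and are applied left to right), so strictly there is no ambiguity. However, writing the parentheses has two benefits:

There is no need to remember these arcane rules.
Using parentheses will always produce the intended result!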

The precedence tables may be found in any C textbook. The only time you should need to use them is when you have to interpret someone else's code!

And when you've worked out what the original author (hopefully) intended, it's a good idea to insert the appropriate parentheses, so that anyone following you is spared the need to reach for the textbook and study some detailed and uninteresting rules!

Integer overflow

Be careful of integer expressions which are likely to produce large numeric results.
int j, k, m;
j = 100000;
k = 300000;
m = j*k;
printf("m = %d\n", m);
produces
m = -64771072
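because the product of j and k is 3x10^10, which is more than the value of the largest 32-bit int (~2x10^9). Strictly, the C standard leaves the result of signed integer overflow undefined; the value shown is what a typical 32-bit 2's complement machine produces.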

C provides a long int type, which may or may not use more bits (eg 64 rather than 32). The standard guarantees only that a long int has at least 32 bits and is at least as wide as an int, which allows a compiler implementer some choice.

Thus you must check the actual range for each implementation.

More on printf

We have seen that printf outputs characters from its format string until it encounters % directives. If the character following the % is a legal one, then the next argument is formatted according to some rules associated with that character and output to standard output (or the appropriate stream for fprintf). Here are some more formatting directives:

Directive   Interpretation                      Space used          Example     Output
                                                (characters)
%d          Decimal integer                     As many as needed   %d          123456
%nd         Decimal integer                     At least n          %6d         123456
%no         Octal integer                       At least n          %6o         123456
%nx         Hexadecimal integer                 At least n          %6x         12a4fc
%ld         Decimal integer, long int           As many as needed   %ld         123456
%lo, %lx    Octal, hexadecimal integer,         As many as needed   %lx         1ac6bf32
            long int
%c          Character                           1                   %c          A
%f          Float                               As needed           %f          1.234560
%w.df       Float, field width w spaces,        At least w          %6.4f       -1.2345
            d digits after decimal point
%e          Float (exponential form)            As needed           %e          1.223456e+00
%w.de       Float (exponential form),           At least w          %10.4e      -1.2345e-01
            field width w spaces,
            d digits after decimal point
%w.dg       Float (choose best form),           At least w          %10.5g      ~~-0.12345
            field width w spaces,
            d significant digits
%-d.ds      String, left-justified,             Exactly d           %-10.10s    ABC123~~~~
            occupying d spaces
(~ indicates a space)

Note the "at least" entries: printf will use more space than specified if the value won't fit in the allocated space!

The final example shows how to print out a string so that it uses exactly d character positions (so that you can line up strings in tables). The field width d pads short strings, the precision .d after the point truncates long ones, and the - flag left-justifies the result. A precision on its own (%.ds) only truncates and does not pad. The form is unusual - note the position of the point - and needs to be remembered!

Key terms

byte
The smallest addressable unit of storage on most computers; 8 binary bits; used to store a single character
byte-addressable
A machine in which the smallest element of memory which can be addressed is a byte - distinguish from word-addressable machines (rare now!)
machine word
The most commonly used "chunk" of storage manipulated by a computer - usually the size of the integer registers. In 1998, most high-performance machines have 32-bit words. Some, like DEC's Alpha, have 64-bit words.
unsigned integer
An integer which can only take non-negative values (zero or positive); one in which all the bits of the machine word are used to store magnitude information.
2's complement
A convention for the representation of signed integers in a machine word.
mixed mode arithmetic
arithmetic expressions containing different types, eg double and int or double and char.

© John Morris, 1997