Objects First


Pointers

Pointers are perhaps the most difficult concept to understand in C. When struggling with them, you can take some encouragement from knowing that misuse of pointers is probably one of the largest sources of bugs in C programs, so even experienced programmers have trouble with them. Although some purists would like to purge them from the language altogether (as has been done in Java), they are the key to efficient C programs (and all of us would like our programs to finish before morning tea, lunch, dinner or whatever daily event is important to you, and thus are attracted to fast-running programs). Although you can possibly write useful C programs for yourself without using pointers (but note that this immediately excludes strings!), you won't be able to understand many C programs written by others if you don't understand the pointer concepts well.

Look at the pattern we have been using for defining a class,
typedef struct classx *Classx;
and observe the * preceding the name of the class. This tells us that Classx is a pointer to something of type struct classx. (We have been avoiding considering struct type declarations in a formal way; if you're curious, see the description of structs or read your textbook!)

A pointer's value is the address in the machine's memory of the object that the pointer is pointing at. We can declare pointers to any type of object by adding a * before the name of each variable in a declaration. Note that in the definition:

int *x, y;
x is a pointer to an int, but y is an int. If we want both to be pointers, we must write:
int *x, *y;
Some programmers adopt a style rule which puts only pointers or only objects in any one declaration. However, if you're defining a list of objects and variables that are going to be pointers to each object, then an alternating declaration may be the best thing:
int x, *xp, y, *yp, z, *zp;
where xp, yp and zp are intended to be used as pointers to x, y and z.

We can define a pointer to any C object: int, char, double, struct, ... and even to another pointer!:

int *ip, *jp;
char *s;
double *dp;
struct xy *xyp;
int **ipp;
The last is a pointer to a pointer which points to an int! (More on pointers to pointers later.)
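As a rough illustration of the last declaration, the sketch below follows an int ** through both levels. The helper names read_through and write_through are hypothetical and are not part of the course code; a caller would obtain the addresses with the "address of" operator, &, described later in this section.

```c
/* Hypothetical helper: follow a pointer to a pointer
   back to the int it ultimately refers to. */
int read_through( int **ipp ) {
  int *ip = *ipp;   /* first dereference yields an int * */
  return *ip;       /* second dereference yields the int */
  }

/* Hypothetical helper: store v in the int reached
   through two levels of pointer. */
void write_through( int **ipp, int v ) {
  **ipp = v;
  }
```

Given int i = 5, *ip = &i, **ipp = &ip; a call to write_through( ipp, 7 ) changes i itself, because both pointers lead back to the same memory location.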

Accessing the referenced value

Pointers are also called references to other objects - they don't contain useful values themselves, but provide a reference to (address of) a memory location (or locations) that do contain values of interest. If we want to access the values to which pointers point, we use the de-referencing operator, *.
int a, b = 2, x;
int *ip = &a, *jp = &b;  /* ip and jp must point at real objects
                            before they are dereferenced */
*ip = 1;              /* Store 1 in the location
                         to which ip points */
x = (*ip) + (*jp);    /* Add the values in the locations
                         pointed to by ip and jp */
In general, if we define:
some_type *ap;
then we can use the dereferenced pointer, *ap, anywhere that we could use an object of the some_type class.
int *ip;
ip = (int *)malloc( sizeof(int) ); /* initialise ip */
*ip = some_integer_function( 1, 2 );
x = (*ip) * (*ip);
y = some_integer_function( *ip, 3 );
Note the parentheses around each dereferenced pointer in these examples. They are not strictly required, but the parenthesised form is safer and should be preferred because it leaves no possibility for a reader to be confused as to the intended behaviour of the code in which it appears.

Precedence Rule Note

You can consider the dereferencing operator, *, as simply another operator: it acts on a pointer to convert it to the value of the object to which it points (or which it references). It happens to be one of the highest precedence operators (there is only one group with higher precedence), so that although I have been careful to add parentheses in the examples above, most C programmers would not bother and just write:
x = *ip * *ip;
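A small sketch may make the precedence point concrete. The two squaring functions below compute the same value, because unary * binds more tightly than the binary multiplication operator. The third function shows a case where the parentheses do matter: the one group that ranks higher than unary * is the postfix group (function call, [], -> and suffix ++ and --), so *ip++ would increment the pointer rather than the int. All three function names are hypothetical.

```c
/* Parenthesised form, as used in the examples above. */
int square_parens( int *ip ) {
  return (*ip) * (*ip);
  }

/* Bare form: unary * is applied before binary *. */
int square_plain( int *ip ) {
  return *ip * *ip;
  }

/* Increment the int that ip points to. Without the
   parentheses, *ip++ would mean *(ip++). */
void bump_target( int *ip ) {
  (*ip)++;
  }
```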

Initialising Pointers

Like any other variable, a pointer must be initialised before use. In the case of an "ordinary" variable, one assigns a value to it; in the case of a pointer, one assigns the address of some other object to it. We saw how to do this with strings in the previous section: since all string constants are equivalent to pointers to the memory where the string is stored, assignment of a string to a char * variable is natural.
char *cp;
cp = "Some very long and involved string";
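Once cp has been given the address of the string constant, it can be dereferenced or handed to library functions like any other char *. The sketch below assumes two hypothetical helpers, first_char and greeting_length; it uses const char * because the characters of a string constant should not be modified through the pointer.

```c
#include <string.h>

/* Hypothetical helper: dereference a string pointer
   to obtain its first character. */
char first_char( const char *cp ) {
  return *cp;
  }

/* Hypothetical helper: initialise a pointer from a
   string constant and measure the string. */
size_t greeting_length( void ) {
  const char *cp;
  cp = "Some very long and involved string";
  return strlen( cp );   /* counts characters before the '\0' */
  }
```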

The "address of" operator

The unary operator & generates the address of its operand:
int x, *xp;
xp = &x;
&x is the address of x and thus can be used to initialise an int *. & can be used both in an assignment statement and in the initialisation of a pointer variable:
double x;
double *xp = &x;
int p;
int *pp = &p;
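To see that a pointer initialised with & really does refer to the original variable, consider this minimal sketch. The function name set_via_pointer is hypothetical.

```c
/* Hypothetical helper: write to x only through xp,
   then return x itself. */
int set_via_pointer( int value ) {
  int x = 0;
  int *xp = &x;   /* xp now holds the address of x */

  *xp = value;    /* stores value in x, via the pointer */
  return x;
  }
```

The function returns whatever value it was given: the assignment through *xp and a direct assignment to x touch the same memory location.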

Dynamic allocation

Another way of initialising a pointer is to dynamically allocate memory for the object. The C standard library provides some functions which do this: the most commonly used are malloc and calloc.
#include <stdlib.h>

void *malloc( size_t n );
void *calloc( size_t n, size_t size );
Note that these functions return pointers to void. This means that the pointer they return can be assigned to any pointer variable without generating a type error. malloc allocates n bytes of space and returns a pointer to the allocated space. calloc allocates n*size bytes, clears them to 0 and returns a pointer to the allocated, zeroed space. (Read calloc as clear and allocate - or more precisely, clear after allocation.) Both return NULL if the request cannot be satisfied. calloc was designed to be used for arrays: one parameter is the number of elements and the other is the size of each element, but we can always call malloc( n*size ) if we don't need the space to be cleared.
Because pointers are addresses, on most machines all pointers have the same format and use the same amount of space (4 bytes on a typical 32-bit machine, 8 bytes on DEC's Alpha and other 64-bit machines), although the C standard does not guarantee this. Thus all pointers are equivalent as far as such a machine is concerned. However, once we've allocated space for an object with malloc or calloc, it's beneficial to keep pointers logically distinct and class them by the type of object that they address. A good compiler will warn you if you try to assign a pointer to an object of type A to a pointer to an object of type B.

calloc's general behaviour is the same as that of malloc, so we will omit explicit mention of it from now on. However wherever malloc may be used, calloc may also be used with the same effect - except that malloc doesn't alter the current contents of the allocated space - it may be garbage left in memory by the last program! - whereas calloc clears it.
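The difference between the two allocators can be sketched as follows. The helper names make_zeroed and make_filled are hypothetical. Both check for the NULL that malloc and calloc return when no memory is available, and the caller is assumed to release the space with free when it is finished with it.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n ints with calloc, so
   every element starts at 0. Returns NULL on failure. */
int *make_zeroed( size_t n ) {
  return calloc( n, sizeof(int) );
  }

/* Hypothetical helper: allocate n ints with malloc. The
   contents are unspecified until we store something, so
   fill every element with value before returning. */
int *make_filled( size_t n, int value ) {
  int *a = malloc( n * sizeof(int) );
  size_t i;

  if ( a == NULL ) return NULL;
  for ( i = 0; i < n; i++ ) a[i] = value;
  return a;
  }
```

Neither call needs a cast, since the void * result converts to int * on assignment.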

malloc requests memory for your program from an area of memory allocated by the linker and the operating system for dynamic memory allocation requests. This area is usually called the heap. Initially, it's just a "heap" of memory which has no allocated purpose - until malloc requests some of it.

Errors using pointers

All variables need initialisation before use; pointers are no different. However, it seems that it's somehow easier to forget to initialise a pointer, as a great many of the errors in C programs (even those produced by programmers who claim some experience!) are traceable to uninitialised pointers. In the object oriented style of programming promoted here, the pointer which acts as the handle for an object is initialised in the constructor, so as long as you've remembered to construct new objects by calling the constructor, the pointer will be correctly initialised. However, except for exhorting you not to forget to call the constructor, there's nothing else I can do to make sure that you do remember to call it! (Languages with formal support for the OO programming model help here: C++ always calls a constructor - a default one if the programmer doesn't explicitly nominate one - when an object is declared, and Java refuses to compile code that uses a local variable before it has been assigned, thus reducing the need for programmers to remember to do it.)
Don't forget to initialise pointers!
Use either
  • malloc or
  • the "address of" operator, &.
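One common defensive habit, which is not part of the course code, is to set a pointer to NULL whenever there is no object for it to point at yet. An uninitialised pointer cannot be detected at run time, but a NULL one can be tested before it is dereferenced. The helper value_or_default below is hypothetical and is only a sketch of the idea.

```c
#include <stddef.h>

/* Hypothetical helper: read through ip only if it has
   been pointed at something; otherwise use fallback. */
int value_or_default( const int *ip, int fallback ) {
  if ( ip == NULL ) return fallback;   /* nothing to read */
  return *ip;
  }
```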

Returning more than one value from a function

Often we'd like to write functions which return more than a single value: in our Date class, we would like a method which unpacks a date object into day, month and year. (You can bring up an example specification and partial implementation of a Date class.) Pointers allow us to do this: we can add to the Date class a method whose specification is:
void unpack( Date d, int *day, int *month, int *year );
Since the parameters year, month and day are now pointers, they pass the address of a variable to the method. The method can now write a value into that address where it will be seen by the calling program.
#include <stdio.h>
#include "Date.h"   /* Import the class specification,
                       now including the unpack method */

void f( Date d ) {
  int day, mon, year;

  unpack( d, &day, &mon, &year );

  printf("The date is %d:%d:%d\n", day, mon, 1900+year );
  /* Candidate for Year 2000 bug ;-) */
  }
Normally the variables in the invoking program are hidden from the invoked function and copies of their values are passed to the function. Note that in this case, we passed the addresses of day, mon and year. This allows the unpack method to write answers in the memory locations which the function f is using.

The implementation of unpack will look like:

void unpack( Date d, int *day, int *mon, int *year ) {
  *day = d->day;
  *mon = d->month;
  *year = d->year;
  }
unpack could also return a further value as the value of the function itself (e.g. it could return TRUE if d was a legal date and FALSE otherwise, or it could use the return value to flag that d was a candidate for the "2000 bug"!).
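A variant that returns a status value as well as the three outputs might look like the sketch below. The struct layout is an assumption based on the fields used in unpack (day, month, year), the name unpack_checked is hypothetical, and the legality test is deliberately crude: it checks ranges only and does not know how many days each month has.

```c
#include <stddef.h>

/* Assumed layout of the Date class's data: the real
   definition lives in the class implementation. */
struct date { int day, month, year; };
typedef struct date *Date;

/* Hypothetical variant of unpack: returns 1 and fills in
   the outputs if d looks legal, otherwise returns 0 and
   leaves the caller's variables untouched. */
int unpack_checked( Date d, int *day, int *mon, int *year ) {
  if ( d == NULL ) return 0;
  if ( d->month < 1 || d->month > 12 ) return 0;
  if ( d->day < 1 || d->day > 31 ) return 0;

  *day = d->day;
  *mon = d->month;
  *year = d->year;
  return 1;
  }
```

The caller can then test the result, for example if ( unpack_checked( d, &day, &mon, &year ) ) { ... }, before trusting the values written through the pointers.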

Key terms

de-referencing
Using a pointer to obtain the value of the object to which it points. This term is natural if a pointer is considered as a reference to another object rather than the object itself.
heap
An area of memory allocated by an operating system to a program from which memory space is dynamically allocated to data structures needed by a running program.

See also Advanced C programming: Pointer arithmetic
© John Morris, 1998