Advanced C Programming

Pointer Arithmetic

As noted earlier in the introduction to pointers, pointers are a difficult concept in C. Some of this difficulty is created by C programmers who make extensive use of C's pointer arithmetic capabilities.

Look at this implementation of the library memcpy routine:
char *memcpy( char *dest, char *src, int n ) {
  char *cp = dest;
  while( n-- ) {
    *cp++ = *src++;
    }
  return dest;
  }
For a robust version of memcpy, look at this improved version.
First, note that the use of pointer arithmetic avoids the need to introduce an extra variable for the array indices, i.e. our lazy C programmer doesn't need to type out:

char *memcpy( char *dest, char *src, int n ) {
  int i;
  for(i=0;i<n;i++) {
    dest[i] = src[i];
    }
  return dest;
  }
and is now free to use the
while( n-- )
trick to execute the loop exactly n times.

Secondly, note how the copy operation is coded:

*cp++ = *src++;
This statement makes use of the fact that the pointers, cp and src, are simply memory addresses and may be incremented using the ++ operator just as we apply it to an integer.

This coding may be concise, but it relies on the operator precedence rules, and thus violates a rule suggested earlier! Since the postfix ++ operator has higher precedence than the unary * operator, the abbreviated coding is equivalent to:

*(cp++) = *(src++);
in which the increment is applied to pointers, not the objects to which they point. Thus this terse code has the desired effect: fetch the character from the address in src (incrementing src after it has been used to fetch a character) and store it in the address in cp (incrementing cp after it has been used to store a character). The two pointers move through the source and destination blocks of memory - alternately fetching (storing) a character and incrementing the address to point to the next location.
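To make the precedence point concrete, here is a minimal sketch contrasting *p++ with (*p)++. The function names fetch_and_advance and increment_target are hypothetical, chosen only for this illustration.

```c
/* Hypothetical helper: returns the character fetched by *p++ and
   hands the advanced pointer back to the caller through pp. */
char fetch_and_advance(const char **pp) {
    const char *p = *pp;
    char c = *p++;   /* same as *(p++): read *p, then move p on by one */
    *pp = p;
    return c;
}

/* Hypothetical helper: (*p)++ increments the character pointed to;
   the pointer itself does not move. Returns the new character. */
char increment_target(char *p) {
    (*p)++;          /* parentheses force the increment onto the char */
    return *p;
}
```

With a pointer q at the start of "xyz", fetch_and_advance(&q) returns 'x' and leaves q pointing at 'y'; increment_target applied to a buffer holding "abc" changes its first character to 'b' and leaves the rest untouched.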

General pointer arithmetic

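Since we can apply the ++ operator to a pointer, we can apply some other arithmetic operations as well: an integer may be added to or subtracted from a pointer, and two pointers into the same array may be subtracted to give the number of elements between them. (Adding two pointers together, or shifting a pointer, is not permitted, so the midpoint must be computed from the difference.)
char *start, *end, *mid;
char buffer[N];

start = buffer;
end = start + N - 1;
mid = start + (end - start)/2;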
generates pointers to the start, end and middle of a buffer.
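As a sketch of which operations are defined on pointers, the hypothetical function middle_of below uses only pointer plus integer, pointer minus integer and pointer minus pointer. The name and the size parameter n are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the middle element of the
   n-character buffer beginning at start (n must be at least 1). */
char *middle_of(char *start, size_t n) {
    char *end = start + n - 1;      /* pointer + integer: the last element */
    ptrdiff_t span = end - start;   /* pointer - pointer: an element count */
    return start + span / 2;        /* pointer + integer: a pointer again  */
}
```

The difference of two pointers has the signed integer type ptrdiff_t (from stddef.h), and it is only meaningful when both pointers refer to the same array.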
char *cp;
char buffer[N];

cp = buffer;
while( cp < &buffer[N] ) {
  process_block( cp );
  cp = cp + BLOCK_SIZE;
  }
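steps through a buffer BLOCK_SIZE characters at a time. (This assumes that N is a multiple of BLOCK_SIZE, so that cp never moves beyond &buffer[N], the address just past the end of the buffer.)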
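If the buffer length is not a multiple of the block size, the final block is shorter than the rest. The sketch below shows one way to handle that; count_blocks is a hypothetical function, the BLOCK_SIZE value is chosen only for illustration, and the call that would process each block is left as a comment.

```c
#include <stddef.h>

#define BLOCK_SIZE 4   /* illustrative value only */

/* Hypothetical helper: walks an n-character buffer in steps of at most
   BLOCK_SIZE characters and returns the number of blocks visited. */
size_t count_blocks(const char *buffer, size_t n) {
    const char *cp = buffer;
    const char *end = buffer + n;   /* one past the last element: may be
                                       formed and compared, never dereferenced */
    size_t blocks = 0;
    while (cp < end) {
        size_t remaining = (size_t)(end - cp);
        size_t len = remaining < BLOCK_SIZE ? remaining : BLOCK_SIZE;
        /* a real program would process len characters starting at cp here */
        cp += len;                  /* never steps beyond end */
        blocks++;
    }
    return blocks;
}
```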

Array addressing and pointer arithmetic

When we access an element of an array, we have two alternative and equivalent ways to address the element. For some char array,
char a[N];
the (i+1)th element may be accessed with:
c = a[i];
or
c = *(a+i);
In the second case, we use the fact that the name of an array, when used in an expression like this, is converted to a pointer to its first element, and we use pointer arithmetic to advance that pointer to the address i bytes past the "base" address of the array.
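The equivalence of the two forms can be checked directly. In this small sketch the hypothetical function forms_agree compares both the values and the addresses produced by the indexed and pointer forms.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if a[i] and *(a+i) name the same
   element for every i below n, and 0 otherwise. */
int forms_agree(const char *a, size_t n) {
    size_t i;
    for (i = 0; i < n; i++) {
        if (&a[i] != a + i) return 0;     /* same address */
        if (a[i] != *(a + i)) return 0;   /* same value   */
    }
    return 1;
}
```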

Since both methods for accessing array elements are equivalent, they may be used interchangeably. This applies no matter how the array is defined or created. In particular, dynamically allocated arrays can be addressed with the indexed form if it's convenient and helps to make your code easier to understand.
char *buffer, *start;
int i;
/* Keep two copies of the address returned by malloc */
buffer = start = (char *)malloc( N );
/* Initialise the buffer */
for(i=0;i<N;i++)
  buffer[i] = 'A';
...
/* Output the buffer */
for(i=0;i<N;i++)
  putchar( *buffer++ );
/* Finished, release space */
free( start );
Note that because we allowed the pointer buffer to "move" through the buffer, we were careful to keep an unmodified pointer, start, to the beginning of the buffer so that we could use it again or free it. Forgetting to keep this pointer to the start of an array is a common trap for new C programmers! They find that a buffer apparently contains garbage, because they're looking at memory beyond the original buffer.

Pointers to general types and structures

Arithmetic on pointers to objects which are not characters obeys a special rule. If we create an array of structures:
struct abc { int a, b, c; } abc_array[N];
struct abc *abc_pt;
then we can write:
abc_pt = abc_array;
for(i=0;i<N;i++) {
  printf("%d %d %d\n",
    abc_pt->a, abc_pt->b, abc_pt->c );
  abc_pt++;
  }
The increment operation applied to any pointer advances the address by the size of the object to which the pointer points. (This rule always applies, whether the pointer points to a structure or to a built-in type: in the character case, the size of the object is 1 byte, so incrementing the address by one is the desired operation.) All other arithmetic operations on pointers are similarly interpreted in units of the size of the object that the pointer points to. Thus, to access every third element of abc_array:
abc_pt = abc_array;
for(i=0;i<N/3;i++) {
  printf("%d %d %d\n",
    abc_pt->a, abc_pt->b, abc_pt->c );
  abc_pt = abc_pt + 3;
  }
As an address, abc_pt is incremented by 3*sizeof(struct abc) in every iteration of the loop.
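The scaling by object size can be observed by converting the pointers to char * before subtracting them, which gives a distance in bytes. This is a minimal sketch; byte_step is a hypothetical helper written for this illustration.

```c
#include <stddef.h>

struct abc { int a, b, c; };

/* Hypothetical helper: returns the distance in bytes between p and p + k.
   The caller must ensure that p + k stays within (or one past) the array. */
size_t byte_step(struct abc *p, size_t k) {
    struct abc *q = p + k;                    /* advances k elements */
    return (size_t)((char *)q - (char *)p);   /* measured in bytes   */
}
```

For any array of struct abc, byte_step(array, 3) equals 3*sizeof(struct abc), whatever that size happens to be on the machine in use.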

memcpy again

The ANSI library specification for memcpy is:
void *memcpy( void *dest, const void *src, size_t n );
The use of void * to match any pointer creates a problem for the compiler. Presented with this code:
void *memcpy( void *dest, void *src, int n ) {
  void *cp = dest;
  while( n-- ) {
    *cp++ = *src++;
    }
  return dest;
  }
most compilers will produce an error message like:
"Don't know the size of object pointed to"
(or something even more abstruse).
This means that the compiler doesn't know how to increment the addresses, cp and src, because it doesn't know what they point to. This can be fixed in a number of ways: type-casting is commonly used:
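void *memcpy( void *dest, void *src, int n ) {
  char *cp = (char *)dest;
  /* Cast src once, outside the loop: in *(char *)src++ the ++
     would still be applied to the void * pointer */
  char *sp = (char *)src;
  while( n-- ) {
    *cp++ = *sp++;
    }
  return dest;
  }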
In this example, the void * pointers are cast to char * ones before being used - enabling the compiler to work out that the address should be incremented by one byte.
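Because the routine copies byte by byte, it can be used on objects of any type. The sketch below uses the hypothetical name my_memcpy, to avoid clashing with the library routine, and follows the standard prototype with const and size_t. Like the library memcpy, it makes no attempt to handle overlapping source and destination blocks.

```c
#include <stddef.h>

/* Hypothetical byte-wise copy with the same parameter types as the
   standard memcpy; not the library's own implementation. */
void *my_memcpy(void *dest, const void *src, size_t n) {
    char *d = dest;          /* void * converts to char * without a cast in C */
    const char *s = src;
    while (n--) {
        *d++ = *s++;         /* both pointers advance one byte per step */
    }
    return dest;
}
```

Copying an array of int, for example, is a matter of passing the size of the whole array in bytes: my_memcpy( b, a, sizeof a ).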

Good coding practice

It should be obvious to you that the possibilities for generating "neat" and compact codings are almost endless. Regrettably, a whole generation of C programmers has explored many of the possibilities and has already produced a good fraction of the total set of code variations that take advantage of pointer arithmetic. I say "regrettably" because many of these codings, although ingenious, would certainly not qualify as easily intelligible: it's possible to produce obfuscated expressions that can take hours to interpret!

The *pointer++ pattern which steps through a C array (of any type) is compact and found in so many existing C programs that suggesting that it be replaced by pointer[i] is, by now, unrealistic! Thus this is one pattern that anyone maintaining C programs must understand. Consequently, using it yourself is reasonable: any competent C programmer will immediately understand what the pattern does.

More imaginative uses of pointers should, however, be considered with care! Occasionally, these uses could be justified by the argument that they enabled the compiler to produce much faster code. Such arguments are rarely valid now:

  1. modern optimising compilers are often good enough to reduce the longer, more easily read expressions to code that's as efficient (fast and compact) as the hand-optimised version and
  2. computers are generally so fast that cost-benefit analyses will almost always come down on the side of the more easily maintained code. (Even if it's slower than a hand-optimised version, the extra computer time has a cost that's negligible compared to the cost of the programmer's time to optimise code by hand!)
There are usually more explicit, more easily understood ways of producing the same result, making the "elegant" or "ingenious" uses of pointers pointlessly bad coding!



© John Morris, 1997