Objects First


Strings and Pointers

Again, we'll look at two apparently unrelated topics together. But as you'll see, looking at C strings leads immediately to the need to understand pointers.

Strings

We've already seen strings used in the printf function and know that a string is formed by placing double quotation marks around a sequence of characters. A string is quite different from a character, hence the use of single quotation marks around character constants.

What is a string?

A string is stored in the computer's memory as an array of characters with a null character at the end. Thus, assuming the compiler decides to store the string "abcd" from address 1000 onwards, we would see:
Memory address     1000  1001  1002  1003  1004
Character          a     b     c     d     \0
Value (decimal)    97    98    99    100   0
(\0 is the null character, encoded by a zero in memory.)
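The layout shown above can be checked with a few lines of C. This is only a sketch: string_bytes is a hypothetical helper, not a library function, and the decimal values it reports match the table only on machines that use the ASCII encoding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores the decimal code of every byte of str,
   including the terminating null character, into codes[].
   Returns the number of bytes stored, or 0 if codes[] is too small. */
size_t string_bytes(const char *str, int codes[], size_t max_codes)
{
    size_t needed = strlen(str) + 1;   /* the characters plus the '\0' */
    size_t i;

    if (needed > max_codes)
        return 0;
    for (i = 0; i < needed; i++)
        codes[i] = (unsigned char)str[i];   /* 97 for 'a' assumes ASCII */
    return needed;
}
```

Calling string_bytes("abcd", codes, 8) fills the first five entries of codes: the four letters and then the zero that marks the end of the string.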

When it appears in an expression, "abcd" is treated as a pointer to the memory location in which the first character of the string is stored.

In conventional (or von Neumann) machines, both the program and the data on which it operates are stored in addressable memory. Here, we aren't concerned with the program addresses (although later, when we look at functions as parameters, we shall be concerned with the address of the code of a function), but the data addresses (regrettably!) need to be considered in almost all C programs. Thus the value of the symbol "abcd" in your program is actually 1000 - the address of its first character!

Thus, if we write:

char *p;
p = "abcd";
printf("p = %p\n", (void *)p);   /* %p prints an address; %d expects an int */
The declaration char * means that we are defining a pointer to a char rather than a char.
  1. The program compiles correctly because both p and "abcd" are char *.
  2. It will print out the address of the string "abcd" in memory (usually shown in hexadecimal).
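Because p holds the address of the first character, the characters themselves can be reached through it. The following is a minimal sketch; first_char and char_at are hypothetical names introduced only for illustration.

```c
/* Hypothetical helper: returns the character stored at the address in p. */
char first_char(const char *p)
{
    return *p;            /* *p means "the char that p points to" */
}

/* Hypothetical helper: returns the character n places beyond p.
   The caller must make sure that p + n is still inside the string. */
char char_at(const char *p, int n)
{
    return *(p + n);      /* identical in meaning to p[n] */
}
```

With p set to "abcd", first_char(p) is 'a', char_at(p, 3) is 'd' and char_at(p, 4) is the terminating null character.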

Operations on strings

Creation of strings

Strings can also be considered as arrays of characters. Thus I can define strings as character arrays in two ways:
void some_function( void ) {
  char s[5] = "abcd";
  char t[5] = { 'a', 'b', 'c', 'd', '\0' };
}
These declarations will produce arrays, s and t, with identical contents, but at different addresses - there will be two copies of the string "abcd" stored in the program's memory. We can use s and t as the names for the strings, because the name of an array acts as a pointer to its first element. Thus, to print the two strings, we can write:
printf("s is [%s], t is [%s]\n", s, t );
which will produce:
s is [abcd], t is [abcd]
We can also write:
printf("Today is %s\n", "Monday" );
which will produce:
Today is Monday
When they are passed to printf, s, t and "Monday" are all treated as pointers to a character, or char *.
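The claim that s and t hold identical contents at different addresses can be tested directly. This is a sketch only; same_contents and same_address are hypothetical function names.

```c
#include <string.h>

/* Returns 1 if the two arrays hold exactly the same five bytes. */
int same_contents(void)
{
    char s[5] = "abcd";
    char t[5] = { 'a', 'b', 'c', 'd', '\0' };

    return memcmp(s, t, sizeof s) == 0;
}

/* Returns 1 if the two arrays start at the same address. */
int same_address(void)
{
    char s[5] = "abcd";
    char t[5] = { 'a', 'b', 'c', 'd', '\0' };
    const char *ps = s;   /* address of s[0] */
    const char *pt = t;   /* address of t[0] */

    return ps == pt;
}
```

same_contents() returns 1, while same_address() returns 0: the bytes match, but they live in two separate areas of memory.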

Warning

Do not forget to allocate an extra byte (one char) for the null character terminating the string!
char s[4] = "abcd";
will not produce a usable string, because the declaration provides space for only four characters. A C compiler accepts it, but quietly leaves out the terminating null character, so s is not a properly terminated string. Always allow one extra character for the terminating null character! This is a major source of error for new C programmers. When strings are manipulated in programs, few C functions check the size of the area into which strings are being copied (the structure of C doesn't allow them to do it reliably anyway!) and thus there is considerable potential for error! Later in this section, we will see some examples of functions which can cause errors when insufficient space is allocated for the original array.
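One way to avoid miscounting is to let the compiler or the library do the arithmetic. The sketch below uses two hypothetical helpers: bytes_needed works out the space a string requires, and array_size_of_abcd shows that an array declared with empty brackets is sized to include the null character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed to store str, including the
   terminating null character. strlen does not count the null. */
size_t bytes_needed(const char *str)
{
    return strlen(str) + 1;
}

/* With empty brackets the compiler chooses the array size itself,
   and it always leaves room for the null character. */
size_t array_size_of_abcd(void)
{
    char s[] = "abcd";

    return sizeof s;
}
```

Both bytes_needed("abcd") and array_size_of_abcd() give 5, one more than the four visible characters.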

Assigning strings

C has no direct mechanism for copying strings, thus
char s[10];
s = "0123456789";
produces a compiler error. Even if it were allowed, it would not copy "0123456789" into the array, s: C doesn't allow us to re-assign the name of an array to point to another area of memory. Remembering that the name of an array acts as a pointer to its first element, we can see that such an assignment would only change where s points, hiding the area originally allocated for s.
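The difference between an array name and a pointer variable can be sketched as follows. The function name is hypothetical; the point is that a char * variable may be given a new address, which leaves the array it used to point at untouched.

```c
#include <string.h>

/* Returns 1 if, after p has been re-pointed, the array s still holds
   its original contents and p refers to the new string. */
int array_unchanged_after_repoint(void)
{
    char s[10] = "old";
    const char *p = s;       /* p holds the address of s[0] */

    p = "0123456789";        /* legal: p now holds another address */
    /* s = "0123456789";        illegal: an array name cannot be assigned */

    return strcmp(s, "old") == 0 && strcmp(p, "0123456789") == 0;
}
```

Nothing has been copied here: the assignment to p moves an address, not characters.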

By not providing an in-built string data type, the designers of C were able to produce a somewhat smaller language (with fewer formal syntactic and semantic rules), but the penalty has been much grief in programs that don't run correctly because insufficient space was allocated for strings!

To copy strings, you must use the library function, strcpy, or write your own copying function:

#include <string.h>	/* Include the string function prototypes */
char s[N] = "test";
char t[N];
strcpy( t, s );
(You can view the file string.h as defining a class of strings - because it defines a set of operations which will work on C-style null-terminated strings.)

The formal specification for strcpy (and one of its relatives) found in string.h is:

char *strcpy(char *dest, const char *src);

char *strncpy(char *dest, const char *src, size_t n);
strcpy copies characters from the source (src) to the destination (dest) until it finds a null character in src. (The null character is copied also, so that the destination string is also a null-terminated string.)
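A copying function of your own might look like the sketch below. It follows the behaviour just described, but my_strcpy is a hypothetical name and this is not the library's actual source code.

```c
/* Hypothetical hand-written copy. Like strcpy, it trusts the caller to
   provide enough space in dest and a null-terminated src. */
char *my_strcpy(char *dest, const char *src)
{
    char *d = dest;

    /* Copy one character at a time; the loop ends just after the
       null character itself has been copied. */
    while ((*d++ = *src++) != '\0')
        ;
    return dest;
}
```

After my_strcpy(buf, "test"), the first five bytes of buf hold 't', 'e', 's', 't' and the null character.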

A significant problem with strcpy is that it has no way of knowing that enough space was allowed in dest to receive all the characters in src. It simply keeps copying until it finds the terminating null character. If the source has no null character to terminate it, then strcpy will continue to copy characters until, by luck (rarely by design!), it finds a null character somewhere in memory. Even if the source has a null character, there is the possibility that the destination doesn't have enough space. To make programs a little more robust, the standard library provides strncpy, which will copy at most n characters. This enables you to avoid over-writing memory, by setting n to be the size of the destination. However, strncpy will not copy the terminating null character if the source string has n or more characters, so the destination copy could be left without its terminating null character.
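One common way around the missing null character is to copy one character fewer than the destination can hold and then write the terminator explicitly. The sketch below assumes this approach; copy_string is a hypothetical helper, not part of string.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dest, which has room for
   dest_size chars, and always leaves dest null-terminated.
   Returns 1 if the whole of src fitted, 0 if it was cut short. */
int copy_string(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return 0;                        /* no room even for the null */
    strncpy(dest, src, dest_size - 1);   /* keep the last byte free */
    dest[dest_size - 1] = '\0';          /* terminate explicitly */
    return strlen(src) < dest_size;
}
```

Copying "0123456789" into a five-character array with copy_string leaves "0123" in the array and returns 0, so the caller can tell that the string was truncated.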

Caution is required in all string manipulation code written in C!

Comparing strings

Comparing two strings for equality is a very common operation - it's the key to looking up records in data-bases, for example. The following will not work in C:
char s[N], t[N];
.... /* Copy strings into s and t */
if( s == t ) { .. }
else { ... };
In this fragment, the else branch will always be taken, because s == t compares the memory addresses of the two arrays, not the strings themselves, and two different arrays never share an address!

You must use a function:

#include <string.h>	/* Include the string function prototypes */
..
char s[N], t[N];
.... /* Copy strings into s and t */
if( strcmp( s, t ) == 0 ) { .. }
else { ... };
strcmp is also defined in string.h. The return value is:
< 0    s is lexicographically less than t
0      s is equal to t
> 0    s is lexicographically greater than t
s is lexicographically less than t means that s would precede t in a dictionary. The ordering of special characters is determined by their ASCII encodings, which also implies that upper case letters precede lower case ones, so that:

Function call             Returns
strcmp("abc","bcd")       < 0
strcmp("xyz","bcd")       > 0
strcmp("xyz","xyz")       0
strcmp("XYZ","xyz")       < 0
strcmp("123","1234")      < 0
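Only the sign of strcmp's result is specified, not its size, so code should test for < 0, 0 or > 0 rather than for particular values. The sketch below reduces the result to -1, 0 or +1; compare_sign is a hypothetical helper, and the upper case example assumes the ASCII encoding.

```c
#include <string.h>

/* Hypothetical helper: reduces strcmp's result to -1, 0 or +1. */
int compare_sign(const char *s, const char *t)
{
    int r = strcmp(s, t);

    return (r > 0) - (r < 0);   /* +1 if r > 0, -1 if r < 0, else 0 */
}
```

For the calls in the table, compare_sign gives -1 for ("abc","bcd"), +1 for ("xyz","bcd"), 0 for ("xyz","xyz"), -1 for ("XYZ","xyz") and -1 for ("123","1234").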

strcmp has the same problems as strcpy if the strings aren't null-terminated - it continues through memory comparing characters until it finds a null character. strncmp is also defined; it compares at most n characters. strncmp is "safe" as long as a sensible value of n is supplied; if no difference has been found by the nth character, it simply stops comparing and returns 0. If the strings differ in the first n characters, it will, of course, return as soon as it finds the difference.
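A typical use of strncmp is to test whether one string begins with another. The following is a sketch under that assumption; starts_with is a hypothetical helper, and both arguments must still be null-terminated because strlen is applied to the prefix.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if str begins with prefix, else 0.
   Only the first strlen(prefix) characters are compared. */
int starts_with(const char *str, const char *prefix)
{
    return strncmp(str, prefix, strlen(prefix)) == 0;
}
```

starts_with("1234", "123") returns 1, whereas starts_with("123", "1234") returns 0, because the comparison reaches the null character at the end of "123" before the prefix is exhausted.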

Key terms

von Neumann machine
Almost all of today's computers belong to the class of machines known as von Neumann machines. They are also referred to as "stored-program machines" because the program is stored in the same memory as the data and the machine fetches instructions from successive locations in the computer's memory.
pointer
Term used in C and some other programming languages to refer to a variable which contains a memory address of a data item rather than the value of the item itself.
lexicographical ordering
The order used by a dictionary (or lexicon). In computer systems, special characters (such as ; : * % $ and #) are usually ordered by their values in the ASCII encoding scheme.

© John Morris, 1998