Objects First


Structures

C allows us to aggregate primitive objects into more complex ones. You have already seen the struct directive used to group all the attributes of a class into a single structure.

The formal syntax used to declare a structure is:

struct structure_name {
  declarations
  };

Here are some examples:

Class definitionClass implementation
typedef struct position *Point;
struct point {
  double x, y;
  };
typedef struct circle *Circle;
#include "Position.h"

struct circle {
  double radius;
  Point origin;
  };
typedef struct ellipse *Ellipse;
#include "Point.h"

struct ellipse {
  double major, minor;
  Point origin;
  };
typedef struct ellipse_list *EllipseList;
#include "Ellipse.h"

#define MAX_ELLIPSES 100

struct ellipse_list {
  int n;
  Ellipse list[MAX_ELLIPSES];
  };

As you can see, the declarations part of a structure can contain basic types (inbuilt classes), other structures and arrays.

Elements of structures

To confuse beginners, C has two ways of referring to elements of structures. If I define a variable to be a struct, then I reference its elements with the '.' operator:
struct circle a, b;
...
area = PI*a.radius*a.radius;
However, if I've defined a pointer to a struct (as we have done whenever we defined a class), then I reference its elements with the '->' operator:
Circle a, b;
...
area = PI*a->radius*a->radius;
This is a minor annoyance. Luckily, the compiler is invariably able to detect your errors! This of course leads to the question: "If the compiler can always tell which is the correct one, why use different operators?" The answer probably lies in the dark history of C - before ANSI C and type-checking compilers!

sizeof

The sizeof function is actually 'executed' by the compiler, so that its range of permissible arguments is somewhat wider than a normal function. It can operate on: Thus, all the following are valid:
int a, b, c[N];
struct t { double x, double y };
typedef struct t T;
..
n = sizeof( a );
m = sizeof( &a );
o = sizeof( int );
p = sizeof( int * );
q = sizeof( struct t );
r = sizeof( T );
s = sizeof( c[0] );  /* Returns sizeof(int) */
s = sizeof( c );     /* Returns N*sizeof(int) */
Note the inconsistent treatment of arrays: everywhere else in C, the name of an array is formally equivalent to a pointer to its first element. However, sizeof returns the size of the whole array. Despite the inconsistency, this provides an invaluable technique for writing easily maintained table-driven programs. You can use the [] structure with an initialisation and let the compiler count the elements for you:
int test_values[] = { 1,2,3,5,7,11,13,17,19 };
If your program needs to know how many test values are in the array, you can write:
#define N_TESTS	(sizeof(test_values)/sizeof(test_values[0]))
The compiler will count (very precisely and accurately - something human programmers often have trouble doing, especially at 3 am when finishing a program for a 9 am demonstration!) the number of test values and set N_TESTS appropriately.

Alignment

Some machines work more efficiently if values (such as floats and doubles) which require more than a single byte of storage are aligned on 4- or 8-byte boundaries in the system. This is related to the efficient use of the wide busses - a 128-bit (16-byte) datapath to memory is common in high performance systems now - and caches. Values which cross boundaries require two fetches and take twice as long. Good compilers understand this and will pad a struct appropriately. Thus:
struct padded {
  char a;
  double e;
  int x, y;
  struct f g;
  };
printf("%d\n", sizeof( struct padded ) );
might produce 5 more than expected.


The unaligned packing of data into memory would result in slow access to e, x, y and g because they cross word boundaries. So the compiler will insert 3 bytes in the middle of the struct and 2 bytes at the end.

Thus in portable code, it is essential that you use sizeof rather than a constant for the space requirements of structs.

Structures in binary files

The most efficient way to store data in a file is to use the binary I/O routines, read and write. Note that these require the number of bytes to be written as their third argument.
Robust programmer's rule
Always use sizeof in the 3rd argument to read and write for anything other than character data. The reasons have already been alluded to, but I'll summarise them here again:
  1. Different C implementations may use different amounts of space for any data type. Assuming an int requires 4 bytes is bound to cause grief! Similarly, but with lower probability, assuming float requires 4 and double requires 8 bytes may cause problems.
  2. structs may be padded with nulls (or some other junk) for efficiency.
  3. Using symbolic expressions such as N*sizeof(double) for an array declared as double x[N] produces a much more easily maintained program. Some examples of good practice:
    #define N   ..
    int some_function( int infile, int outfile ) {
      int x, n;
      double y[N];
      struct large_structure {
        char label;
        double z[N-2];
        int p, q, r;
        } a;
      struct large_structure b[N-2];
    
      n = read( infile, &x, sizeof(int) );
      n = read( infile, y, N*sizeof(double) );
      n = write( outfile, &a, sizeof(struct large_structure) );
      n = write( outfile, b, (N-2)*sizeof(struct large_structure) );
    
    Remember that checking for I/O errors is an important characteristic of a robust program. In a full program, each I/O statement would look this this:
      n = write( outfile, b, (N-2)*sizeof(struct large_structure) );
      if( n != ((N-2)*sizeof(struct large_structure)) ) {
      	perror("some_function: write b ");
      	}
    

File portability

The differing space requirements of structures on different machines means that porting binary files between machines may have problems other than the well-known big- or little-endian problem. There is no easy solution to these problems: you have to find out whether your compiler has padded structures and where the nulls have been inserted and write a small program for the target machine which make allowances for the padding. As an example, suppose the source machine places doubles on 8-byte boundaries and the target machine doesn't. You have written an array of structures into a file:
/* labelled_point.h */
typedef struct t_labelled_point *labelled_point;

/* labelled_point.c */
struct t_labelled_point {
                char a;
                double z; };

#include "labelled_point.h"

.. /* Constructors, etc */

int dump_point_array( labelled_point a[], int n ) {
  int f, i, nb;
  f = open( "dump.dat", 1 );
  for(i=0;i<n;i++)
    nb = write( f, a[i], sizeof(struct t_labelled_point) );
  ...
  return .. ; /* Some suitable error code */
  }
Note
Be careful not to use sizeof(labelled_point). labelled_point is formally a pointer, so the sizeof operator will just return the space needed by a pointer.

By printing out sizeof(struct t_labelled_point), you discovered that your machines pads the structure to 16 bytes. By examining the binary file, you've confirmed that there are 7 null bytes after each character. However, the same exercise on the target machine shows that the struct isn't padded, ie it only takes 9 bytes, so we need to do some hacking to read the file. So on the target machine, the class has a load_point_array method:

/* labelled_point.c */
struct t_labelled_point {
                char a;
                double z; };

#include "labelled_point.h"

#define BUF_LEN  16
#define Z_OFFSET  8

int load_point_array( labelled_point a[], int n ) {
  /* a is an array of objects constructed by calls to the
     appropriate constructor */
  int f, i, nb;
  char buf[BUF_LEN];
  f = open( "dump.dat", 0 );
  for(i=0;i<n;i++) {
    /* Read data into a character buffer */
    nb = read( f, buf, BUF_LEN );
    a[i]->a = buf[0];  /* First char is the label */
    memcpy( &(a[i]->z), &buf[Z_OFFSET], sizeof(double) );
    }
  ..
  return ..
  }
In load_point_array, we read each structure into a char buffer. Then we use memcpy, which just copies bytes of memory from one place to another, to copy the 8 bytes of the double from bytes 8-15 of the buffer read from the file to the correct place.

Alternatively, we could have used the same strategy in reverse to pack the structure down into 9 bytes on the source machine before writing it to a file.

Since file input and output is usually much slower than the CPU, one might well do this routinely to increase the performance of the machine - even if the file is only used on the original machine. However note carefully that unless you're dealing with a file containing many thousands of records any saving will be so small that it's not worth the extra coding time! You should also not forget the hidden cost: if you make the program more complex, you increase the chance of an error creeping in and thus the cost of debugging. Thus before starting out on such a time-saving exercise, make sure the return justifies the effort!

Key terms

bus
The interface between a processor and its memory and input/output devices is generally called a bus - because many devices (processor, I/O and memory) share the same set of connections.
cache
A small, fast memory placed between the central processor of a computer and the larger, but much slower, main memory. A cache will generally have a dramatic effect on the performance of a computer by reducing the effective time required to access memory.
pad
Operation performed by a compiler to align elements of structures so that they are efficiently accessed on a particular architecture. It does this by "padding" out a structure with unused bytes so that the next element of the structure starts on a word (or other boundary which allows efficient access) boundary.
{big,little}-endian
Big- or little-endian refers to different conventions used by different computer manufacturers for ordering bytes within a word in the computer's memory. In a little-endian machine, the least- significant byte of a 2-, 4- or 8-byte word is first (at the lowest memory address), whereas in a big-endian machine, the most-significant byte is first.

Continue on to Assertions
Back to the Table of Contents
©
John Morris, 1998