C++ is a productivity enhancement tool. Why else would you make the effort
(and it is an effort, regardless of how easy we attempt to make the
transition) to switch from some language that you already know and are
productive with to a new language in which you’re going to be less
productive for a while, until you get the hang of it? It’s because you’ve
become convinced that you’re going to get big gains by using this new tool.
Productivity, in computer programming terms, means that fewer people can
make much more complex and impressive programs in less time. There are
certainly other issues when it comes to choosing a language, such as
efficiency (does the nature of the language cause slowdown and code bloat?),
safety (does the language help you ensure that your program will always do
what you plan, and handle errors gracefully?), and maintenance (does the
language help you create code that is easy to understand, modify, and
extend?). These are certainly important factors that will be examined in
this book.
But raw productivity means a program that formerly took three of you a week
to write now takes one of you a day or two. This touches several levels of
economics. You’re happy because you get the rush of power that comes from
building something, your client (or boss) is happy because products are
produced faster and with fewer people, and the customers are happy because
they get products more cheaply. The only way to get massive increases in
productivity is to leverage off other people’s code. That is, to use
libraries.
A library is simply a bunch of code that someone else has written and
packaged together. Often, the most minimal package is a file with an
extension like lib and one or more header files to tell your compiler
what’s in the library. The linker knows how to search through the library
file and extract the appropriate compiled code. But that’s only one way to
deliver a library. On platforms that span many architectures, such as
Linux/Unix, often the only sensible way to deliver a library is with source
code, so it can be reconfigured and recompiled on the new target.
Thus, libraries are probably the most important way to improve
productivity, and one of the primary design goals of C++ is to make library
use easier. This implies that there’s something hard about using libraries
in C. Understanding this factor will give you a first insight into the
design of C++, and thus insight into how to use it.
A library usually starts out as a collection of functions, but if you have
used third-party C libraries you know there’s usually more to it than that
because there’s more to life than behavior, actions, and functions. There
are also characteristics (blue, pounds, texture, luminance), which are
represented by data. And when you start to deal with a set of
characteristics in C, it is very convenient to clump them together into a
struct, especially if you want to represent more than one similar thing in
your problem space. Then you can make a variable of this struct for each
thing.
Thus, most C libraries have a set of structs and a set of functions that
act on those structs. As an example of what such a system looks like,
consider a programming tool that acts like an array, but whose size can be
established at runtime, when it is created. I’ll call it a CStash. Although
it’s written in C++, it has the style of what you’d write in C:
//: C04:CLib.h
// Header file for a C-like library
// An array-like entity created at runtime
typedef struct CStashTag {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
  // Dynamically allocated array of bytes:
  unsigned char* storage;
} CStash;
void initialize(CStash* s, int size);
void cleanup(CStash* s);
int add(CStash* s, const void* element);
void* fetch(CStash* s, int index);
int count(CStash* s);
void inflate(CStash* s, int increase);
///:~
A tag name like CStashTag is generally used for a struct in case you need
to reference the struct inside itself. For example, when creating a linked
list (each element in your list contains a pointer to the next element),
you need a pointer to the next struct variable, so you need a way to
identify the type of that pointer within the struct body. Also, you’ll
almost universally see the typedef as shown above for every struct in a C
library. This is done so you can treat the struct as if it were a new type
and define variables of that struct like this:
CStash A, B, C;
The storage pointer is an unsigned char*. An unsigned char is the smallest
piece of storage a C compiler supports, although on some machines it can be
the same size as the largest. By definition it occupies exactly one byte;
the number of bits in that byte is implementation dependent, but is usually
eight. You might think that because the CStash is designed to hold any type
of variable, a void* would be more appropriate here. However, the purpose
is not to treat this storage as a block of some unknown type, but rather as
a block of contiguous bytes.
The source code for the implementation file (which you may not get if you
buy a library commercially; you might get only a compiled obj or lib or
dll, etc.) looks like this:
//: C04:CLib.cpp {O}
// Implementation of example C-like library
// Declare structure and functions:
#include "CLib.h"
#include <iostream>
#include <cassert>
using namespace std;
// Quantity of elements to add
// when increasing storage:
const int increment = 100;
void initialize(CStash* s, int sz) {
  s->size = sz;
  s->quantity = 0;
  s->storage = 0;
  s->next = 0;
}
int add(CStash* s, const void* element) {
  if(s->next >= s->quantity) // Enough space left?
    inflate(s, increment);
  // Copy element into storage,
  // starting at next empty space:
  int startBytes = s->next * s->size;
  unsigned char* e = (unsigned char*)element;
  for(int i = 0; i < s->size; i++)
    s->storage[startBytes + i] = e[i];
  s->next++;
  return(s->next - 1); // Index number
}
void* fetch(CStash* s, int index) {
  // Check index boundaries:
  assert(0 <= index);
  if(index >= s->next)
    return 0; // To indicate the end
  // Produce pointer to desired element:
  return &(s->storage[index * s->size]);
}
int count(CStash* s) {
  return s->next; // Elements in CStash
}
void inflate(CStash* s, int increase) {
  assert(increase > 0);
  int newQuantity = s->quantity + increase;
  int newBytes = newQuantity * s->size;
  int oldBytes = s->quantity * s->size;
  unsigned char* b = new unsigned char[newBytes];
  for(int i = 0; i < oldBytes; i++)
    b[i] = s->storage[i]; // Copy old to new
  delete [](s->storage); // Old storage
  s->storage = b; // Point to new memory
  s->quantity = newQuantity;
}
void cleanup(CStash* s) {
  if(s->storage != 0) {
    cout << "freeing storage" << endl;
    delete []s->storage;
  }
} ///:~
initialize( ) performs the necessary setup for struct CStash by setting the
internal variables to appropriate values. Initially, the storage pointer is
set to zero; no initial storage is allocated.
The add( ) function inserts an element into the CStash at the next
available location. First, it checks to see if there is any available space
left. If not, it expands the storage using the inflate( ) function,
described later.
Because the compiler doesn’t know the specific type of the variable being
stored (all the function gets is a void*), you can’t just do an assignment,
which would certainly be the convenient thing. Instead, you must copy the
variable byte-by-byte. The most straightforward way to perform the copying
is with array indexing. Typically, there are already data bytes in storage,
and this is indicated by the value of next. To start with the right byte
offset, next is multiplied by the size of each element (in bytes) to
produce startBytes. Then the argument element is cast to an unsigned char*
so that it can be addressed byte-by-byte and copied into the available
storage space. next is incremented so that it indicates the next available
piece of storage, and add( ) returns the “index number” where the value was
stored so that the value can be retrieved using this index number with
fetch( ).
fetch( ) checks to see that the index isn’t out of bounds and then returns
the address of the desired variable, calculated using the index argument.
Since index indicates the number of elements to offset into the CStash, it
must be multiplied by the number of bytes occupied by each piece to produce
the numerical offset in bytes. When this offset is used to index into
storage using array indexing, you don’t get the address, but instead the
byte at the address. To produce the address, you must use the address-of
operator &.
count( ) may look a bit strange at first to a seasoned C programmer. It
seems like a lot of trouble to go through to do something that would
probably be a lot easier to do by hand. If you have a struct CStash called
intStash, for example, it would seem much more straightforward to find out
how many elements it has by saying intStash.next instead of making a
function call (which has overhead), such as count(&intStash). However, if
you wanted to change the internal representation of CStash and thus the way
the count was calculated, the function call interface allows the necessary
flexibility. But alas, most programmers won’t bother to find out about your
“better” design for the library. They’ll look at the struct and grab the
next value directly, and possibly even change next without your permission.
If only there were some way for the library designer to have better control
over things like this! (Yes, that’s foreshadowing.)
You never know the maximum amount of storage you might need for a CStash,
so the memory pointed to by storage is allocated from the heap. The heap is
a big block of memory used for allocating smaller pieces at runtime. You
use the heap when you don’t know the size of the memory you’ll need while
you’re writing a program. That is, only at runtime will you find out that
you need space to hold 200 Airplane variables instead of 20. In Standard C,
dynamic-memory allocation functions include malloc( ), calloc( ),
realloc( ), and free( ). Instead of library calls, however, C++ has a more
sophisticated (albeit simpler to use) approach to dynamic memory that is
integrated into the language via the keywords new and delete.
The inflate( ) function uses new to get a bigger chunk of space for the
CStash. In this situation, we will only expand memory and not shrink it,
and the assert( ) will guarantee that a zero or negative number is not
passed to inflate( ) as the increase value. The new number of elements that
can be held (after inflate( ) completes) is calculated as newQuantity, and
this is multiplied by the number of bytes per element to produce newBytes,
which will be the number of bytes in the allocation. So that we know how
many bytes to copy over from the old location, oldBytes is calculated using
the old quantity.
The actual storage allocation occurs in the new-expression, which is the
expression involving the new keyword:
new unsigned char[newBytes];
The general form of the new-expression is:
new Type;
in which Type describes the type of variable you want allocated on the
heap. In this case, we want an array of unsigned char that is newBytes
long, so that is what appears as the Type. You can also allocate something
as simple as an int by saying:
new int;
and although this is rarely done, you can see that the form is consistent.
A new-expression returns a pointer to an object of the exact type that you
asked for. So if you say new Type, you get back a pointer to a Type. If you
say new int, you get back a pointer to an int. If you want a new unsigned
char array, you get back a pointer to the first element of that array. The
compiler will ensure that you assign the return value of the new-expression
to a pointer of the correct type.
Of course, any time you request memory, it’s possible for the request to
fail, if there is no more memory. As you will learn, C++ has mechanisms
that come into play if the memory-allocation operation is unsuccessful.
Once the new storage is allocated, the data in the old storage must be
copied to the new storage; this is again accomplished with array indexing,
copying one byte at a time in a loop. After the data is copied, the old
storage must be released so that it can be used by other parts of the
program if they need new storage. The delete keyword is the complement of
new, and must be applied to release any storage that is allocated with new
(if you forget to use delete, that storage remains unavailable, and if this
so-called memory leak happens enough, you’ll run out of memory). In
addition, there’s a special syntax when you’re deleting an array. It’s as
if you must remind the compiler that this pointer is not just pointing to
one object, but to an array of objects: you put a set of empty square
brackets in front of the pointer to be deleted:
delete []myArray;
Once the old storage has been deleted, the pointer to the new storage can
be assigned to the storage pointer, the quantity is adjusted, and
inflate( ) has completed its job.
Note that the heap manager is fairly primitive. It gives you chunks of
memory and takes them back when you delete them. There’s no inherent
facility for heap compaction, which compresses the heap to provide bigger
free chunks. If a program allocates and frees heap storage for a while, you
can end up with a fragmented heap that has lots of memory free, but without
any pieces that are big enough to allocate the size you’re looking for at
the moment. A heap compactor complicates a program because it moves memory
chunks around, so your pointers won’t retain their proper values. Some
operating environments have heap compaction built in, but they require you
to use special memory handles (which can be temporarily converted to
pointers, after locking the memory so the heap compactor can’t move it)
instead of pointers. You can also build your own heap-compaction scheme,
but this is not a task to be undertaken lightly.
When you create a variable on the stack at compile-time, the storage for
that variable is automatically created and freed by the compiler. The
compiler knows exactly how much storage is needed, and it knows the
lifetime of the variables because of scoping. With dynamic memory
allocation, however, the compiler doesn’t know how much storage you’re
going to need, and it doesn’t know the lifetime of that storage. That is,
the storage doesn’t get cleaned up automatically. Therefore, you’re
responsible for releasing the storage using delete, which tells the heap
manager that storage can be used by the next call to new. The logical place
for this to happen in the library is in the cleanup( ) function because
that is where all the closing-up housekeeping is done.
To test the library, two CStashes are created. The first holds ints and the
second holds arrays of 80 chars:
//: C04:CLibTest.cpp
//{L} CLib
// Test the C-like library
#include "CLib.h"
#include <fstream>
#include <iostream>
#include <string>
#include <cassert>
using namespace std;
int main() {
  // Define variables at the beginning
  // of the block, as in C:
  CStash intStash, stringStash;
  int i;
  char* cp;
  ifstream in;
  string line;
  const int bufsize = 80;
  // Now remember to initialize the variables:
  initialize(&intStash, sizeof(int));
  for(i = 0; i < 100; i++)
    add(&intStash, &i);
  for(i = 0; i < count(&intStash); i++)
    cout << "fetch(&intStash, " << i << ") = "
         << *(int*)fetch(&intStash, i)
         << endl;
  // Holds 80-character strings:
  initialize(&stringStash, sizeof(char)*bufsize);
  in.open("CLibTest.cpp");
  assert(in);
  while(getline(in, line))
    add(&stringStash, line.c_str());
  i = 0;
  while((cp = (char*)fetch(&stringStash, i++))!=0)
    cout << "fetch(&stringStash, " << i << ") = "
         << cp << endl;
  cleanup(&intStash);
  cleanup(&stringStash);
} ///:~
Following the form required by C, all the variables are created at the
beginning of the scope of main( ). Of course, you must remember to
initialize the CStash variables later in the block by calling
initialize( ). One of the problems with C libraries is that you must
carefully convey to the user the importance of the initialization and
cleanup functions. If these functions aren’t called, there will be a lot of
trouble. Unfortunately, the user doesn’t always wonder if initialization
and cleanup are mandatory. They know what they want to accomplish, and
they’re not as concerned about you jumping up and down saying, “Hey, wait,
you have to do this first!” Some users have even been known to initialize
the elements of a structure themselves. There’s certainly no mechanism in C
to prevent it (more foreshadowing).
The intStash is filled up with integers, and the stringStash is filled with
character arrays. These character arrays are produced by opening the source
code file, CLibTest.cpp, and reading the lines from it into a string called
line, and then producing a pointer to the character representation of line
using the member function c_str( ).
After each Stash is loaded, it is displayed. The intStash is printed using
a for loop, which uses count( ) to establish its limit. The stringStash is
printed with a while, which breaks out when fetch( ) returns zero to
indicate it is out of bounds.
You’ll also notice an additional cast in
cp = (char*)fetch(&stringStash, i++)
This is due to the stricter type checking in C++, which does not allow you
to simply assign a void* to any other type (C allows this).
There is one more important issue you should understand before we look at
the general problems in creating a C library. Note that the CLib.h header
file must be included in any file that refers to CStash because the
compiler can’t even guess at what that structure looks like. However, it
can guess at what a function looks like; this sounds like a feature but it
turns out to be a major C pitfall.
Although you should always declare functions by including a header file,
function declarations aren’t essential in C. It’s possible in C (but not in
C++) to call a function that you haven’t declared. A good compiler will
warn you that you probably ought to declare a function first, but it isn’t
enforced by the C language standard. This is a dangerous practice, because
the C compiler can assume that a function that you call with an int
argument has an argument list containing int, even if it may actually
contain a float. This can produce bugs that are very difficult to find, as
you will see.
Each separate C implementation file (with an extension of .c) is a
translation unit. That is, the compiler is run separately on each
translation unit, and when it is running it is aware of only that unit.
Thus, any information you provide by including header files is quite
important because it determines the compiler’s understanding of the rest of
your program. Declarations in header files are particularly important,
because everywhere the header is included, the compiler will know exactly
what to do. If, for example, you have a declaration in a header file that
says void func(float), the compiler knows that if you call that function
with an integer argument, it should convert the int to a float as it passes
the argument (an implicit conversion, often loosely called promotion).
Without the declaration, the C compiler would simply assume that a function
func(int) existed, it wouldn’t do the conversion, and the wrong data would
quietly be passed into func( ).
For each translation unit, the compiler creates an object file, with an
extension of .o or .obj or something similar. These object files, along
with the necessary start-up code, must be collected by the linker into the
executable program. During linking, all the external references must be
resolved. For example, in CLibTest.cpp, functions such as initialize( ) and
fetch( ) are declared (that is, the compiler is told what they look like)
and used, but not defined. They are defined elsewhere, in CLib.cpp. Thus,
the calls in CLibTest.cpp are external references. The linker must, when it
puts all the object files together, take the unresolved external references
and find the addresses they actually refer to. Those addresses are put into
the executable program to replace the external references.
It’s important to realize that in C, the external references that the
linker searches for are simply function names, generally with an underscore
in front of them. So all the linker has to do is match up the function name
where it is called and the function body in the object file, and it’s done.
If you accidentally made a call that the compiler interpreted as func(int)
and there’s a function body for func(float) in some other object file, the
linker will see _func in one place and _func in another, and it will think
everything’s OK. The func( ) at the calling location will push an int onto
the stack, and the func( ) function body will expect a float to be on the
stack. If the function only reads the value and doesn’t write to it, it
won’t blow up the stack. In fact, the float value it reads off the stack
might even make some kind of sense. That’s worse because it’s harder to
find the bug.
We are remarkably adaptable, even in situations in which perhaps we
shouldn’t adapt. The style of the CStash library has been a staple for C
programmers, but if you look at it for a while, you might notice that it’s
rather . . . awkward. When you use it, you have to pass the address of the
structure to every single function in the library. When reading the code,
the mechanism of the library gets mixed with the meaning of the function
calls, which is confusing when you’re trying to understand what’s going on.
One of the biggest obstacles, however, to using libraries in C is the
problem of name clashes. C has a single name space for functions; that is,
when the linker looks for a function name, it looks in a single master
list. In addition, when the compiler is working on a translation unit, it
can work only with a single function with a given name.
Now suppose you decide to buy two libraries from two different vendors, and
each library has a structure that must be initialized and cleaned up. Both
vendors decided that initialize( ) and cleanup( ) are good names. If you
include both their header files in a single translation unit, what does the
C compiler do? Fortunately, C gives you an error, telling you there’s a
type mismatch in the two different argument lists of the declared
functions. But even if you don’t include them in the same translation unit,
the linker will still have problems. A good linker will detect that there’s
a name clash, but some linkers take the first function name they find, by
searching through the list of object files in the order you give them in
the link list. (This can even be thought of as a feature because it allows
you to replace a library function with your own version.)
In either event, you can’t use two C libraries that contain a function with
the identical name. To solve this problem, C library vendors will often
prepend a sequence of unique characters to the beginning of all their
function names. So initialize( ) and cleanup( ) might become
CStash_initialize( ) and CStash_cleanup( ). This is a logical thing to do
because it “decorates” the name of the function with the name of the struct
the function works on.
Now it’s time to take the first step toward creating classes in C++.
Variable names inside a struct do not clash with global variable names. So
why not take advantage of this for function names, when those functions
operate on a particular struct? That is, why not make functions members of
structs?
Step one is exactly that. C++ functions
can be placed inside structs as
“member functions.”
Here’s what it looks like after
converting
the C version of CStash to the C++ Stash:
//: C04:CppLib.h
// C-like library converted to C++
struct Stash {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
  // Dynamically allocated array of bytes:
  unsigned char* storage;
  // Functions!
  void initialize(int size);
  void cleanup();
  int add(const void* element);
  void* fetch(int index);
  int count();
  void inflate(int increase);
}; ///:~
First, notice there is no typedef. Instead of requiring you to create a
typedef, the C++ compiler turns the name of the structure into a new type
name for the program (just as int, char, float and double are type names).
All the data members are exactly the same as before, but now the functions
are inside the body of the struct. In addition, notice that the first
argument from the C version of the library has been removed. In C++,
instead of forcing you to pass the address of the structure as the first
argument to all the functions that operate on that structure, the compiler
secretly does this for you. Now the only arguments for the functions are
concerned with what the function does, not the mechanism of the function’s
operation.
It’s important to realize that the function code is effectively the same as
it was with the C version of the library. The number of arguments is the
same (even though you don’t see the structure address being passed in, it’s
still there), and there’s only one function body for each function. That
is, just because you say
Stash A, B, C;
doesn’t mean you get a different add( ) function for each variable.
So the code that’s generated is almost identical to what you would have
written for the C version of the library. Interestingly enough, this
includes the “name decoration” you probably would have done to produce
Stash_initialize( ), Stash_cleanup( ), and so on. When the function name is
inside the struct, the compiler effectively does the same thing. Therefore,
initialize( ) inside the structure Stash will not collide with a function
named initialize( ) inside any other structure, or even a global function
named initialize( ). Most of the time you don’t have to worry about the
function name decoration; you use the undecorated name. But sometimes you
do need to be able to specify that this initialize( ) belongs to the struct
Stash, and not to any other struct. In particular, when you’re defining the
function you need to fully specify which one it is. To accomplish this full
specification, C++ has an operator (::) called the scope resolution
operator (named so because names can now be in different scopes: at global
scope or within the scope of a struct). For example, if you want to specify
initialize( ), which belongs to Stash, you say Stash::initialize(int size).
You can see how the scope resolution operator is used in the function
definitions:
//: C04:CppLib.cpp {O}
// C library converted to C++
// Declare structure and functions:
#include "CppLib.h"
#include <iostream>
#include <cassert>
using namespace std;
// Quantity of elements to add
// when increasing storage:
const int increment = 100;
void Stash::initialize(int sz) {
  size = sz;
  quantity = 0;
  storage = 0;
  next = 0;
}
int Stash::add(const void* element) {
  if(next >= quantity) // Enough space left?
    inflate(increment);
  // Copy element into storage,
  // starting at next empty space:
  int startBytes = next * size;
  unsigned char* e = (unsigned char*)element;
  for(int i = 0; i < size; i++)
    storage[startBytes + i] = e[i];
  next++;
  return(next - 1); // Index number
}
void* Stash::fetch(int index) {
  // Check index boundaries:
  assert(0 <= index);
  if(index >= next)
    return 0; // To indicate the end
  // Produce pointer to desired element:
  return &(storage[index * size]);
}
int Stash::count() {
  return next; // Number of elements in Stash
}
void Stash::inflate(int increase) {
  assert(increase > 0);
  int newQuantity = quantity + increase;
  int newBytes = newQuantity * size;
  int oldBytes = quantity * size;
  unsigned char* b = new unsigned char[newBytes];
  for(int i = 0; i < oldBytes; i++)
    b[i] = storage[i]; // Copy old to new
  delete []storage; // Old storage
  storage = b; // Point to new memory
  quantity = newQuantity;
}
void Stash::cleanup() {
  if(storage != 0) {
    cout << "freeing storage" << endl;
    delete []storage;
  }
} ///:~
There are several other things that are different between C and C++. First,
the declarations in the header files are required by the compiler. In C++
you cannot call a function without declaring it first. The compiler will
issue an error message otherwise. This is an important way to ensure that
function calls are consistent between the point where they are called and
the point where they are defined. By forcing you to declare the function
before you call it, the C++ compiler virtually ensures that you will
perform this declaration by including the header file. If you also include
the same header file in the place where the functions are defined, then the
compiler checks to make sure that the declaration in the header and the
function definition match up. This means that the header file becomes a
validated repository for function declarations and ensures that functions
are used consistently throughout all translation units in the project.
Of course, global functions can still be declared by hand every place where
they are defined and used. (This is so tedious that it becomes very
unlikely.) However, structures must always be declared before they are
defined or used, and the most convenient place to put a structure
definition is in a header file, except for those you intentionally hide in
a file.
You can see that all the member functions look almost the same as when they
were C functions, except for the scope resolution and the fact that the
first argument from the C version of the library is no longer explicit.
It’s still there, of course, because the function has to be able to work on
a particular struct variable. But notice, inside the member function, that
the member selection is also gone! Thus, instead of saying s->size = sz;
you say size = sz; and eliminate the tedious s->, which didn’t really add
anything to the meaning of what you were doing anyway. The C++ compiler is
apparently doing this for you. Indeed, it is taking the “secret” first
argument (the address of the structure that we were previously passing in
by hand) and applying the member selector whenever you refer to one of the
data members of a struct.
This means that whenever you are inside the member function of a struct,
you can refer to any member (including another member function) by simply
giving its name. The compiler will search through the local structure’s
names before looking for a global version of that name. You’ll find that
this feature means that not only is your code easier to write, it’s a lot
easier to read.
But what if, for some reason, you want to be able to get your hands on the
address of the structure? In the C version of the library it was easy
because each function’s first argument was a CStash* called s. In C++,
things are even more consistent. There’s a special keyword, called this,
which produces the address of the struct. It’s the equivalent of the ‘s’ in
the C version of the library. So we can revert to the C style of things by
saying
this->size = sz;
The code generated by the compiler is exactly the same, so you don’t need
to use this in such a fashion; occasionally, you’ll see code where people
explicitly use this-> everywhere but it doesn’t add anything to the meaning
of the code and often indicates an inexperienced programmer. Usually, you
don’t use this often, but when you need it, it’s there (some of the
examples later in the book will use this).
There’s one last item to mention. In C, you could assign a void* to any
other pointer like this:
int i = 10;
void* vp = &i; // OK in both C and C++
int* ip = vp; // Only acceptable in C
and there was no complaint from the compiler. But in C++, this statement is
not allowed. Why? Because C is not so particular about type information, so
it allows you to assign a pointer with an unspecified type to a pointer
with a specified type. Not so with C++. Type is critical in C++, and the
compiler stamps its foot when there are any violations of type information.
This has always been important, but it is especially important in C++
because you have member functions in structs. If you could pass pointers to
structs around with impunity in C++, then you could end up calling a member
function for a struct that doesn’t even logically exist for that struct! A
real recipe for disaster. Therefore, while C++ allows the assignment of any
type of pointer to a void* (this was the original intent of void*, which is
required to be large enough to hold a pointer to any type), it will not
allow you to assign a void pointer to any other type of pointer. A cast is
always required to tell the reader and the compiler that you really do want
to treat it as the destination type.
This brings up an interesting issue. One of the important goals for C++ is to
compile as much existing C code as possible to allow for an easy transition to
the new language. However, this doesn’t mean any code that C allows will
automatically be allowed in C++. There are a number of things the C compiler
lets you get away with that are dangerous and error-prone. (We’ll look at them
as the book progresses.) The C++ compiler generates warnings and errors for
these situations. This is often much more of an advantage than a hindrance. In
fact, there are many situations in which you are trying to run down an error in
C and just can’t find it, but as soon as you recompile the program in C++, the
compiler points out the problem! In C, you’ll often find that you can get the
program to compile, but then you have to get it to work. In C++, when the
program compiles correctly, it often works, too! This is because the language
is a lot stricter about type.
You can see a number of new things in the
way the C++ version of Stash is used in the following test
program:
//: C04:CppLibTest.cpp
//{L} CppLib
// Test of C++ library
#include "CppLib.h"
#include "../require.h"
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
  Stash intStash;
  intStash.initialize(sizeof(int));
  for(int i = 0; i < 100; i++)
    intStash.add(&i);
  for(int j = 0; j < intStash.count(); j++)
    cout << "intStash.fetch(" << j << ") = "
         << *(int*)intStash.fetch(j)
         << endl;
  // Holds 80-character strings:
  Stash stringStash;
  const int bufsize = 80;
  stringStash.initialize(sizeof(char) * bufsize);
  ifstream in("CppLibTest.cpp");
  assure(in, "CppLibTest.cpp");
  string line;
  while(getline(in, line))
    stringStash.add(line.c_str());
  int k = 0;
  char* cp;
  while((cp =(char*)stringStash.fetch(k++)) != 0)
    cout << "stringStash.fetch(" << k << ") = "
         << cp << endl;
  intStash.cleanup();
  stringStash.cleanup();
} ///:~
One thing you’ll notice is that the variables are all defined “on the fly” (as
introduced in the previous chapter). That is, they are defined at any point in
the scope, rather than being restricted (as in C) to the beginning of the
scope.
The code is quite similar to CLibTest.cpp, but when a member function is
called, the call occurs using the member selection operator ‘.’ preceded by the
name of the variable. This is a convenient syntax because it mimics the
selection of a data member of the structure. The difference is that this is a
function member, so it has an argument list.
Of course, the call that the compiler actually generates looks much more like
the original C library function. Thus, considering name decoration and the
passing of this, the C++ function call intStash.initialize(sizeof(int)) becomes
something like Stash_initialize(&intStash, sizeof(int)). If you ever wonder
what’s going on underneath the covers, remember that the original C++ compiler
cfront from AT&T produced C code as its output, which was then compiled by the
underlying C compiler. This approach meant that cfront could be quickly ported
to any machine that had a C compiler, and it helped to rapidly disseminate C++
compiler technology. But because the C++ compiler had to generate C, you know
that there must be some way to represent C++ syntax in C (some compilers still
allow you to produce C code).
There’s one other change from CLibTest.cpp, which is the introduction of the
require.h header file. This is a header file that I created for this book to
perform more sophisticated error checking than that provided by assert( ). It
contains several functions, including the one used here called assure( ), which
is used for files. This function checks to see if the file has successfully
been opened, and if not it reports to standard error that the file could not be
opened (thus it needs the name of the file as the second argument) and exits
the program. The require.h functions will be used throughout the book, in
particular to ensure that there are the right number of command-line arguments
and that files are opened properly. The require.h functions replace repetitive
and distracting error-checking code, and yet they provide useful error
messages. These functions will be fully explained later in the book.
Now that you’ve seen an initial example, it’s time to step back and take a look
at some terminology. The act of bringing functions inside structures is the
root of what C++ adds to C, and it introduces a new way of thinking about
structures: as concepts. In C, a struct is an agglomeration of data, a way to
package data so you can treat it in a clump. But it’s hard to think about it as
anything but a programming convenience. The functions that operate on those
structures are elsewhere. However, with functions in the package, the structure
becomes a new creature, capable of describing both characteristics (like a C
struct does) and behaviors. The concept of an object, a free-standing, bounded
entity that can remember and act, suggests itself.
In C++, an object is just a variable, and the purest definition is “a region of
storage” (this is a more specific way of saying, “an object must have a unique
identifier,” which in the case of C++ is a unique memory address). It’s a place
where you can store data, and it’s implied that there are also operations that
can be performed on this data.
Unfortunately, there’s not complete consistency across languages when it comes
to these terms, although they are fairly well-accepted. You will also sometimes
encounter disagreement about what an object-oriented language is, although that
seems to be reasonably well sorted out by now. There are languages that are
object-based, which means that they have objects like the C++
structures-with-functions that you’ve seen so far. This, however, is only part
of the picture when it comes to an object-oriented language, and languages that
stop at packaging functions inside data structures are object-based, not
object-oriented.
The ability to package data with functions allows you to create a new data
type. This is often called encapsulation[33]. An existing data type may have
several pieces of data packaged together. For example, a float has an exponent,
a mantissa, and a sign bit. You can tell it to do things: add to another float
or to an int, and so on. It has characteristics and behavior.
The definition of Stash creates a new data type. You can add( ), fetch( ), and
inflate( ). You create one by saying Stash s, just as you create a float by
saying float f. A Stash also has characteristics and behavior. Even though it
acts like a real, built-in data type, we refer to it as an abstract data type,
perhaps because it allows us to abstract a concept from the problem space into
the solution space. In addition, the C++ compiler treats it like a new data
type, and if you say a function expects a Stash, the compiler makes sure you
pass a Stash to that function. So the same level of type checking happens with
abstract data types (sometimes called user-defined types) as
with built-in types.
You can immediately see a difference, however, in the way you perform
operations on objects. You say object.memberFunction(arglist). This is “calling
a member function for an object.” But in object-oriented parlance, this is also
referred to as “sending a message to an object.” So for a Stash s, the
statement s.add(&i) “sends a message to s” saying, “add( ) this to yourself.”
In fact, object-oriented programming can be summed up in a single phrase:
sending messages to objects. Really, that’s all you do: create a bunch of
objects and send messages to them. The trick, of course, is figuring out what
your objects and messages are, but once you accomplish this the implementation
in C++ is surprisingly straightforward.
A question that often comes up in seminars is, “How big is an object, and what
does it look like?” The answer is “about what you expect from a C struct.” In
fact, the code the C compiler produces for a C struct (with no C++ adornments)
will usually look exactly the same as the code produced by a C++ compiler. This
is reassuring to those C programmers who depend on the details of size and
layout in their code, and for some reason directly access structure bytes
instead of using identifiers (relying on a particular size and layout for a
structure is a nonportable activity).
The size of a struct is the combined size of all of its members. Sometimes when
the compiler lays out a struct, it adds extra bytes to make the boundaries come
out neatly; this may increase execution efficiency. In Chapter 15, you’ll see
how in some cases “secret” pointers are added to the structure, but you don’t
need to worry about that right now.
//: C04:Sizeof.cpp
// Sizes of structs
#include "CLib.h"
#include "CppLib.h"
#include <iostream>
using namespace std;

struct A {
  int i[100];
};

struct B {
  void f();
};

void B::f() {}

int main() {
  cout << "sizeof struct A = " << sizeof(A)
       << " bytes" << endl;
  cout << "sizeof struct B = " << sizeof(B)
       << " bytes" << endl;
  cout << "sizeof CStash in C = "
       << sizeof(CStash) << " bytes" << endl;
  cout << "sizeof Stash in C++ = "
       << sizeof(Stash) << " bytes" << endl;
} ///:~
On my machine (your results may vary) the first print statement produces 200
because each int occupies two bytes; on a machine with four-byte ints the
result is 400. struct B is something of an anomaly because it is a struct with
no data members. In C, this is illegal, but in C++ we need the option of
creating a struct whose sole task is to scope function names, so it is allowed.
Still, the result produced by the second print statement is a somewhat
surprising nonzero value. In early versions of the language, the size was zero,
but an awkward situation arises when you create such objects: They have the
same address as the object created directly after them, and so are not
distinct. One of the fundamental rules of objects is that each object must have
a unique address, so structures with no data members will always have some
minimum nonzero size.
The last two sizeof statements
show you that the size of the structure in C++ is the same as the size of the
equivalent version in C. C++ tries not to add any unnecessary
overhead.
When you create a struct containing member functions, you are creating a new
data type. In general, you want this type to be easily accessible to yourself
and others. In addition, you want to separate the interface (the declaration)
from the implementation (the definition of the member functions) so the
implementation can be changed without forcing a re-compile of the entire
system. You achieve this end by putting the declaration for your new type in a
header file.
When I first learned to program in C, the header file was a mystery to me. Many
C books don’t seem to emphasize it, and the compiler didn’t enforce function
declarations, so it seemed optional most of the time, except when structures
were declared. In C++ the use of header files becomes crystal clear. They are
virtually mandatory for easy program development, and you put very specific
information in them: declarations. The header file tells the compiler what is
available in your library. You can use the library even if you only possess the
header file along with the object file or library file; you don’t need the
source code for the cpp file. The header file is where the interface
specification is stored.
Although it is not enforced by the compiler, the best approach to building
large projects in C is to use libraries; collect associated functions into the
same object module or library, and use a header file to hold all the
declarations for the functions. It is de rigueur in C++; you could throw any
function into a C library, but the C++ abstract data type determines the
functions that are associated by dint of their common access to the data in a
struct. Any member function must be declared in the struct declaration; you
cannot put it elsewhere. The use of function libraries was encouraged in C and
institutionalized in C++.
When using a function from a library, C allows you the option of ignoring the
header file and simply declaring the function by hand. In the past, people
would sometimes do this to speed up the compiler just a bit by avoiding the
task of opening and including the file (this is usually not an issue with
modern compilers). For example, here’s an extremely lazy declaration of the C
function printf( ) (from <stdio.h>):
printf(...);
The ellipses specify a variable argument list[34], which says: printf( ) has
some arguments, each of which has a type, but ignore that. Just take whatever
arguments you see and accept them. By using this kind of declaration, you
suspend all error checking on the arguments.
This practice can cause subtle problems. If you declare functions by hand, in
one file you may make a mistake. Since the compiler sees only your
hand-declaration in that file, it may be able to adapt to your mistake. The
program will then link correctly, but the use of the function in that one file
will be faulty. This is a tough error to find, and is easily avoided by using a
header file.
If you place all your function declarations in a header file, and include that
header everywhere you use the function and where you define the function, you
ensure a consistent declaration across the whole system. You also ensure that
the declaration and the definition match by including the header in the
definition file.
If a struct is declared in a header file in C++, you must include the header
file everywhere a struct is used and where struct member functions are defined.
The C++ compiler will give an error message if you try to call a regular
function, or to call or define a member function, without declaring it first.
By enforcing the proper use of header files, the language ensures consistency
in libraries, and reduces bugs by forcing the same interface to be used
everywhere.
The header is a contract between you and the user of your library. The contract
describes your data structures, and states the arguments and return values for
the function calls. It says, “Here’s what my library does.” The user needs some
of this information to develop the application and the compiler needs all of it
to generate proper code. The user of the struct simply includes the header
file, creates objects (instances) of that struct, and links in the object
module or library (i.e., the compiled code).
The compiler enforces the contract by requiring you to declare all structures
and functions before they are used and, in the case of member functions, before
they are defined. Thus, you’re forced to put the declarations in the header and
to include the header in the file where the member functions are defined and
the file(s) where they are used. Because a single header file describing your
library is included throughout the system, the compiler can ensure consistency
and prevent errors.
There are certain issues that you must be aware of in order to organize your
code properly and write effective header files. The first issue concerns what
you can put into header files. The basic rule is “only declarations,” that is,
only information to the compiler but nothing that allocates storage by
generating code or creating variables. This is because the header file will
typically be included in several translation units in a project, and if storage
for one identifier is allocated in more than one place, the linker will come up
with a multiple definition error (this is C++’s one definition rule: You can
declare things as many times as you want, but there can be only one actual
definition for each thing).
This rule isn’t completely hard and fast. If you define a variable that is
“file static” (has visibility only within a file) inside a header file, there
will be multiple instances of that data across the project, but the linker
won’t have a collision[35]. Basically, you don’t want to do anything in the
header file that will cause an ambiguity at link time.
The second header-file issue is this: when you put a struct declaration in a
header file, it is possible for the file to be included more than once in a
complicated program. Iostreams are a good example. Any time a struct does I/O
it may include one of the iostream headers. If the cpp file you are working on
uses more than one kind of struct (typically including a header file for each
one), you run the risk of including the <iostream> header more than once and
re-declaring iostreams.
The compiler considers the redeclaration of a structure (this includes both
structs and classes) to be an error, since it would otherwise allow you to use
the same name for different types. To prevent this error when multiple header
files are included, you need to build some intelligence into your header files
using the preprocessor (Standard C++ header files like <iostream> already have
this “intelligence”).
Both C and C++ allow you to redeclare a function, as long as the two
declarations match, but neither will allow the redeclaration of a structure. In
C++ this rule is especially important because if the compiler allowed you to
redeclare a structure and the two declarations differed, which one would it
use?
The problem of redeclaration comes up quite a bit in C++ because each data type
(structure with functions) generally has its own header file, and you have to
include one header in another if you want to create another data type that uses
the first one. In any cpp file in your project, it’s likely that you’ll include
several files that include the same header file. During a single compilation,
the compiler can see the same header file several times. Unless you do
something about it, the compiler will see the redeclaration of your structure
and report a compile-time error. To solve the problem, you need to know a bit
more about the preprocessor.
The preprocessor directive #define can be used to create compile-time flags.
You have two choices: you can simply tell the preprocessor that the flag is
defined, without specifying a value:
#define FLAG
or you can give it a value (which is the typical C way to define a constant):
#define PI 3.14159
In either case, the label can now be tested by the preprocessor to see if it
has been defined:
#ifdef FLAG
This will yield a true result, and the code following the #ifdef will be
included in the package sent to the compiler. This inclusion stops when the
preprocessor encounters the statement
#endif
or
#endif // FLAG
Any non-comment after the #endif on the same line is illegal, even though some
compilers may accept it. The #ifdef/#endif pairs may be nested within each
other.
The complement of #define is #undef (short for “un-define”), which will make an
#ifdef statement using the same variable yield a false result. #undef will also
cause the preprocessor to stop using a macro. The complement of #ifdef is
#ifndef, which will yield a true result if the label has not been defined (this
is the one we will use in header files).
There are other useful features in the C preprocessor. You should check your
local documentation for the full set.
In each header file that contains a structure, you should first check to see if
this header has already been included in this particular cpp file. You do this
by testing a preprocessor flag. If the flag isn’t set, the file wasn’t included
and you should set the flag (so the structure can’t get re-declared) and
declare the structure. If the flag was set then that type has already been
declared so you should just ignore the code that declares it. Here’s how
the header file should look:
#ifndef HEADER_FLAG
#define HEADER_FLAG
// Type declaration here...
#endif // HEADER_FLAG
As you can see, the first time the header file is included, the contents of the
header file (including your type declaration) will be included by the
preprocessor. All the subsequent times it is included (in a single compilation
unit) the type declaration will be ignored. The name HEADER_FLAG can be any
unique name, but a reliable standard to follow is to capitalize the name of the
header file and replace periods with underscores (leading underscores, however,
are reserved for system names). Here’s an example:
//: C04:Simple.h
// Simple header that prevents re-definition
#ifndef SIMPLE_H
#define SIMPLE_H

struct Simple {
  int i, j, k;
  // A return type is required; C++ has no implicit int
  void initialize() { i = j = k = 0; }
};
#endif // SIMPLE_H ///:~
Although the SIMPLE_H after the #endif is commented out and thus ignored by the
preprocessor, it is useful for documentation.
These preprocessor statements that prevent multiple inclusion are often
referred to as include guards.
You’ll notice that using directives are present in nearly all the cpp files in
this book, usually in the form:
using namespace std;
Since std is the namespace that surrounds the entire Standard C++ library, this
particular using directive allows the names in the Standard C++ library to be
used without qualification. However, you’ll virtually never see a using
directive in a header file (at least, not outside of a scope). The reason is
that the using directive eliminates the protection of that particular
namespace, and the effect lasts until the end of the current compilation unit.
If you put a using directive (outside of a scope) in a header file, it means
that this loss of “namespace protection” will occur with any file that includes
this header, which often means other header files. Thus, if you start putting
using directives in header files, it’s very easy to end up “turning off”
namespaces practically everywhere, and thereby neutralizing the
beneficial effects of namespaces.
When building a project in C++, you’ll usually create it by bringing together a
lot of different types (data structures with associated functions). You’ll
usually put the declaration for each type or group of associated types in a
separate header file, then define the functions for that type in a translation
unit. When you use that type, you must include the header file to perform the
declarations properly.
Sometimes that pattern will be followed in this book, but more often the
examples will be very small, so everything (the structure declarations,
function definitions, and the main( ) function) may appear in a single file.
However, keep in mind that you’ll want to use separate files and header files
in practice.
The convenience of taking data and function names out of the global name space
extends to structures. You can nest a structure within another structure, and
therefore keep associated elements together. The declaration syntax is what you
would expect, as you can see in the following structure, which implements a
push-down stack as a simple linked list so it “never” runs out of memory:
//: C04:Stack.h
// Nested struct in linked list
#ifndef STACK_H
#define STACK_H

struct Stack {
  struct Link {
    void* data;
    Link* next;
    void initialize(void* dat, Link* nxt);
  }* head;
  void initialize();
  void push(void* dat);
  void* peek();
  void* pop();
  void cleanup();
};
#endif // STACK_H ///:~
The nested struct is called Link, and it contains a pointer to the next Link in
the list and a pointer to the data stored in the Link. If the next pointer is
zero, it means you’re at the end of the list.
Notice that the head pointer is defined right after the declaration for struct
Link, instead of a separate definition Link* head. This is a syntax that came
from C, but it emphasizes the importance of the semicolon after the structure
declaration; the semicolon indicates the end of the comma-separated list of
definitions of that structure type. (Usually the list is empty.)
The nested structure has its own initialize( ) function, like all the
structures presented so far, to ensure proper initialization. Stack has both an
initialize( ) and cleanup( ) function, as well as push( ), which takes a
pointer to the data you wish to store (it assumes this has been allocated on
the heap), and pop( ), which returns the data pointer from the top of the Stack
and removes the top element. (When you pop( ) an element, you are responsible
for destroying the object pointed to by the data.) The peek( ) function also
returns the data pointer from the top element, but it leaves the top element on
the Stack.
Here are the definitions for the member
functions:
//: C04:Stack.cpp {O}
// Linked list with nesting
#include "Stack.h"
#include "../require.h"
using namespace std;

void
Stack::Link::initialize(void* dat, Link* nxt) {
  data = dat;
  next = nxt;
}

void Stack::initialize() { head = 0; }

void Stack::push(void* dat) {
  Link* newLink = new Link;
  newLink->initialize(dat, head);
  head = newLink;
}

void* Stack::peek() {
  require(head != 0, "Stack empty");
  return head->data;
}

void* Stack::pop() {
  if(head == 0) return 0;
  void* result = head->data;
  Link* oldHead = head;
  head = head->next;
  delete oldHead;
  return result;
}

void Stack::cleanup() {
  require(head == 0, "Stack not empty");
} ///:~
The first definition is particularly
interesting because it shows you how to define a member of a nested structure.
You simply use an additional level of scope resolution to specify the name of
the enclosing struct. Stack::Link::initialize( ) takes the
arguments and assigns them to its members.
Stack::initialize( ) sets head to zero, so the object knows it has an empty
list.
Stack::push( ) takes the argument, which is a pointer to the variable you want
to keep track of, and pushes it on the Stack. First, it uses new to allocate
storage for the Link it will insert at the top. Then it calls Link’s
initialize( ) function to assign the appropriate values to the members of the
Link. Notice that the next pointer is assigned to the current head; then head
is assigned to the new Link pointer. This effectively pushes the Link in at the
top of the list.
Stack::pop( ) captures the data pointer at the current top of the Stack; then
it moves the head pointer down and deletes the old top of the Stack, finally
returning the captured pointer. When pop( ) removes the last element, then head
again becomes zero, meaning the Stack is empty.
Stack::cleanup( ) doesn’t actually do any cleanup. Instead, it establishes a
firm policy that “you (the client programmer using this Stack object) are
responsible for popping all the elements off this Stack and deleting them.” The
require( ) is used to indicate that a programming error has occurred if the
Stack is not empty.
Why couldn’t the Stack destructor be responsible for all the objects that the
client programmer didn’t pop( )? The problem is that the Stack is holding void
pointers, and you’ll learn in Chapter 13 that calling delete for a void*
doesn’t clean things up properly. The subject of “who’s responsible for the
memory” is not even that simple, as we’ll see in later chapters.
Here’s an example to test the
Stack:
//: C04:StackTest.cpp
//{L} Stack
//{T} StackTest.cpp
// Test of nested linked list
#include "Stack.h"
#include "../require.h"
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1); // File name is argument
  ifstream in(argv[1]);
  assure(in, argv[1]);
  Stack textlines;
  textlines.initialize();
  string line;
  // Read file and store lines in the Stack:
  while(getline(in, line))
    textlines.push(new string(line));
  // Pop the lines from the Stack and print them:
  string* s;
  while((s = (string*)textlines.pop()) != 0) {
    cout << *s << endl;
    delete s;
  }
  textlines.cleanup();
} ///:~
This is similar to the earlier example, but it pushes lines from a file (as
string pointers) on the Stack and then pops them off, which results in the file
being printed out in reverse order. Note that the pop( ) member function
returns a void* and this must be cast back to a string* before it can be used.
To print the string, the pointer is dereferenced.
As textlines is being filled, the contents of line are “cloned” for each
push( ) by making a new string(line). The value returned from the
new-expression is a pointer to the new string that was created and that copied
the information from line. If you had simply passed the address of line to
push( ), you would end up with a Stack filled with identical addresses, all
pointing to line. You’ll learn more
about this “cloning” process later in the book.
The file name is taken from the command line. To guarantee that there are
enough arguments on the command line, you see a second function used from the
require.h header file: requireArgs( ), which compares argc to the desired
number of arguments and prints an appropriate error message and exits the
program if there aren’t enough arguments.
The scope resolution operator gets you out of situations in which the name the
compiler chooses by default (the “nearest” name) isn’t what you want. For
example, suppose you have a structure with a local identifier a, and you want
to select a global identifier a from inside a member function. The compiler
would default to choosing the local one, so you must tell it to do otherwise.
When you want to specify a global name using scope resolution, you use the
operator with nothing in front of it. Here’s an example that shows global scope
resolution for both a variable and a function:
//: C04:Scoperes.cpp
// Global scope resolution
int a;
void f() {}

struct S {
  int a;
  void f();
};

void S::f() {
  ::f();  // Would be recursive otherwise!
  ::a++;  // Select the global a
  a--;    // The a at struct scope
}
int main() { S s; f(); } ///:~
Without scope resolution in S::f( ), the compiler would default to selecting
the member versions of f( ) and a.
In this chapter, you’ve learned the fundamental “twist” of C++: that you can
place functions inside of structures. This new type of structure is called an
abstract data type, and variables you create using this structure are called
objects, or instances, of that type. Calling a member function for an object is
called sending a message to that object. The primary action in object-oriented
programming is sending messages to objects.
Although packaging data and functions together is a significant benefit for
code organization and makes library use easier because it prevents name clashes
by hiding the names, there’s a lot more you can do to make programming safer in
C++. In the next chapter, you’ll learn how to protect some members of a struct
so that only you can manipulate them. This establishes a clear boundary between
what the user of the structure can change and what only the programmer may
change.
Solutions to selected exercises can be found in the electronic document The
Thinking in C++ Annotated Solution Guide, available for a small fee from
http://www.BruceEckel.com.
[33]
This term can cause debate. Some people use it as defined here; others use it to
describe access control, discussed in the following chapter.
[34]
To write a function definition for a function that takes a true variable
argument list, you must use varargs, although these should be avoided in
C++. You can find details about the use of varargs in your C manual.
[35]
However, in Standard C++ file static is a deprecated feature (the 1998 standard
deprecated it; C++11 later removed the deprecation).