Program Proving
If we could prove that programs are correct in
the same way that we prove a theorem in algebra or geometry,
then that modern excuse for all problems:
"It was an error in our computer!"
would have to be consigned to the scrap-heap.
Unfortunately, although considerable progress has been made
in the area of formal methods
(strategies for designing programs which are
correct by construction,
i.e. constructed from the
problem specification in a way that guarantees that the
final program is correct),
these methods are still time-consuming and labour-intensive
(and therefore costly). Because they are carried out by
humans, they are also subject to the same errors that programmers
make.
As a consequence,
although formal methods can be usefully and economically applied
in a widening area as techniques (and particularly support
tools) improve,
we mostly rely on testing programs after they are written
in order to determine whether they are reliable and robust.
How do we test a program?
Our experience with the first assignment has already taught us
one way not to test programs!
Don't rely on random test data input by a user!
The reason for this is very clear and simple:
a user inputting test data may neglect to enter data which
corresponds to some very important case
(eg the case where we don't have sufficient funds in
our bank account to cover a withdrawal).
Since a complex program will contain many such distinct cases
which need to be checked as part of the testing process,
the probability of a tester forgetting a vital case
rises dramatically with the size of the program under test.
Furthermore, every time the program is changed -
whether it be in the initial debugging phase or later
(perhaps months or years later) when it needs maintenance -
all the tests should be re-run to ensure that
a change to fix one problem hasn't inadvertently created
errors elsewhere!
In a complex program with thousands of cases to be tested,
a human tester is simply not going to have time, let alone
the patience, to enter all the test data needed to
perform a thorough test again.
"Random" methods for debugging
are not very effective ...
a systematic approach is needed!
Test Programs
Thus, in order to ensure that all required tests are carried
out, these tests should be applied mechanically by a program.
Not only does this ensure that no important test is left out,
it is also quick and efficient: a few hours of computer time to
run a few million tests is cheap compared with a programmer's
time, or with the cost of an error in a production system!
Where does the test data go?
Depending on the nature of the tests which need to be carried
out, the tests can be 'driven' by:
- program statements
- data in tables
- data read from files
In what follows, I shall assume that we are testing sets
of functions which are the methods of a class.
However, the strategies outlined here can readily be extended
to testing other functions and whole programs.
A whole program can simply be viewed as a function which takes
various user data (from keyboard input, files, databases, etc)
and produces some output (onto a terminal screen, into files, databases,
etc).
In every case, a basic test driver program is written which
applies the necessary tests to the functions of the
class which you are testing.
Program Statements
These tests can be applied by separate program statements:
#define A1 10.00
#define W1 5.00
#define W2 20.00
/* Create a bank account */
account = ConsAccount();
/* Make a deposit */
balance = Deposit( account, A1 );
/* Test withdrawals */
balance = Withdraw( account, W1 ); /* Should succeed */
assert( balance == (A1-W1) );
balance = Withdraw( account, W2 ); /* Should produce error message */
assert( balance == (A1-W1) ); /* Balance should not change */
If there are only a small number of cases, or the cases
are complex (involving complex rules based on many attributes),
then this approach is fine.
However, it may require considerable effort on the part of the
programmer, so table or file data is to be preferred when it's
practical.
Tables
When the cases to be tested involve a large number of values
of individual attributes,
it is usually better to put the values to be tested in
tables inside the program or files which are read by it:
Click here to load a sample test function in a separate window.
Note:
- The structure test_table has the
values for the test to be applied and
the expected answers (expected_balance and
error_expected).
- The size of the test_cases array is defined
by the initialisation data (using [] for the array
dimensions):
this enables additional tests to be added as necessary
with no changes to the rest of the test function!
- N_W_TESTS is set by using the
sizeof
operator on the array and on the structs which are
its elements.
- The tests are run in the
for(i=0;i<N_W_TESTS;i++)
loop.
The same code is used for each test, so that additional tests
are added with no changes to this loop.
- The value returned by the Withdraw
function is checked against a value in the test array -
enabling the test function to automatically detect
and count errors.
Thus this function produces no output when there are no
errors; it simply returns a 0 to its calling function.
This has an important consequence for testing efficiency:
the tester does not have to wade through reams and reams
of test output. The program only needs to produce
error reports when errors occur; otherwise it is
pleasantly silent!
- A function AlarmRaised is assumed to be able to
detect whether the error required by the specifications
was raised: this is dependent on operating system capabilities
and thus, being non-standard, is beyond the scope of this course.
- (For pedants only!) It's assumed here that the tester
wants to count balance and alarm errors as separate errors adding to
the total error count.
This is a practical assumption, because the really critical thing about
the error count is whether it's 0 or something else.
Slightly more complex code could be used to either
- count balance and alarm errors separately or
- count each function call which produces either error once only.
This is left as a programming exercise to those concerned!
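The points above can be gathered into a minimal sketch of a table-driven test function. It is not the sample function from the original page: the account implementation (ConsAccount, Deposit, Withdraw), the flag-based AlarmRaised, the field name withdrawal, the function name TestWithdraw and the value of EPS are all assumptions made so that the example is self-contained. Balance and alarm errors are counted separately, as described in the last note.

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define A1 10.00            /* opening deposit */
#define W1  5.00            /* withdrawal that should succeed */
#define W2 20.00            /* withdrawal that should be refused */
#define EPS 1e-6            /* assumed tolerance for comparing balances */

/* --- Hypothetical stand-in for the class under test --- */
typedef struct account_t { double balance; } *Account;

static int alarm_flag = 0;  /* set when a withdrawal is refused */

Account ConsAccount(void) {
    Account a = malloc(sizeof *a);
    if (a != NULL) a->balance = 0.0;
    return a;
}

double Deposit(Account a, double amount) {
    a->balance += amount;
    return a->balance;
}

double Withdraw(Account a, double amount) {
    if (amount > a->balance) {
        alarm_flag = 1;     /* insufficient funds: balance unchanged */
    } else {
        a->balance -= amount;
    }
    return a->balance;
}

/* Reports whether an alarm was raised since the last call, then clears it */
int AlarmRaised(void) {
    int raised = alarm_flag;
    alarm_flag = 0;
    return raised;
}

/* --- The test table and the test function --- */
struct test_table {
    double withdrawal;        /* value applied in the test */
    double expected_balance;  /* expected answer */
    int    error_expected;    /* should the alarm be raised? */
};

/* Array size is taken from the initialisation data */
static struct test_table test_cases[] = {
    { W1, A1 - W1, 0 },
    { W2, A1 - W1, 1 },
};

#define N_W_TESTS (sizeof(test_cases) / sizeof(test_cases[0]))

/* Returns the number of errors found; silent when there are none */
int TestWithdraw(void) {
    size_t i;
    int errors = 0;
    Account account = ConsAccount();

    if (account == NULL) return 1;
    Deposit(account, A1);
    for (i = 0; i < N_W_TESTS; i++) {
        double balance = Withdraw(account, test_cases[i].withdrawal);
        if (fabs(balance - test_cases[i].expected_balance) > EPS) {
            printf("Test %lu: balance %.2f, expected %.2f\n",
                   (unsigned long)i, balance, test_cases[i].expected_balance);
            errors++;
        }
        if (AlarmRaised() != test_cases[i].error_expected) {
            printf("Test %lu: alarm state is wrong\n", (unsigned long)i);
            errors++;
        }
    }
    free(account);
    return errors;
}
```

A test driver would simply call TestWithdraw() and report a failure if the result is non-zero. Adding a test case means adding one line to test_cases; the loop is untouched.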
Style Reminders
- Note that all the constants used here are symbols!
This facilitates the generation of the result expressions,
A1-W1.
- Strictly, comparing two doubles for equality will not
always work.
The test should strictly have been:
if ( fabs(balance - test_cases[i].expected_balance) > EPS ) ...
with a suitable value for EPS.
(Note that fabs from <math.h> is needed here: abs takes an int argument.)
The example code was left in the simpler (but incorrect) form
for clarity.
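As an illustration of the tolerance test, here is a small sketch of a comparison helper. The name NearlyEqual and the value chosen for EPS are assumptions; a suitable tolerance depends on the quantities being compared (for money amounts, something well below the smallest unit of currency).

```c
#include <math.h>

#define EPS 1e-6   /* assumed tolerance; choose to suit the data */

/* Returns 1 if a and b differ by no more than EPS, 0 otherwise */
int NearlyEqual(double a, double b) {
    return fabs(a - b) <= EPS;
}
```

With this helper, a check such as NearlyEqual(0.1 + 0.2, 0.3) succeeds even though the two values are not guaranteed to be bit-for-bit identical in floating point arithmetic.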
Files
Just as test values can be placed in tables, they can also
be placed in files.
Usually this testing strategy is essentially identical to the table
one - the data that would be stored in the testing table is
stored in files instead.
Thus in the example above, we would create a test file:
5.00 15.00 F
10.00 15.00 T
In this format,
F is read as FALSE and
T as TRUE.
Click here to load the test function modified to work with a file
in a separate window.
Note:
- The main advantage of this approach over the table method is
that a file is external to the program and thus may be modified in
a single step with a text editor.
- The test loop is essentially identical to the one using
tables - as might be expected.
- The read_test function is somewhat more complex:
the extra complexity here is the trade-off for the convenience of
being able to modify the test cases without needing to rebuild
the program.
- read_test returns
FALSE when the test
file is exhausted: thus the number of tests performed is
readily altered by adding or deleting entries from the 'driver'
file.
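A function along the lines of read_test might be sketched as follows. This is not the function from the original page: the signature, the struct layout and the meaning assigned to the three columns (value to apply, expected balance, error flag) are assumptions based on the file format shown above.

```c
#include <stdio.h>

#define TRUE  1
#define FALSE 0

/* Assumed layout: one record per line of the test file */
struct test_table {
    double withdrawal;        /* first column: value applied in the test */
    double expected_balance;  /* second column: expected answer */
    int    error_expected;    /* third column: T or F */
};

/* Reads the next test case from f into *t.
   Returns TRUE if a complete record was read,
   FALSE when the file is exhausted or a record is malformed. */
int read_test(FILE *f, struct test_table *t) {
    char flag;

    if (fscanf(f, "%lf %lf %c",
               &t->withdrawal, &t->expected_balance, &flag) != 3) {
        return FALSE;
    }
    t->error_expected = (flag == 'T') ? TRUE : FALSE;
    return TRUE;
}
```

The test loop then becomes a while loop that calls read_test until it returns FALSE and applies the same checks as in the table version to each record.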