7 Optimizing Your Project

 


 
Optimization is the customization of a program to run as small and/or as fast as possible on a particular type of computer.

If your program is running slower than you expected, or is using more memory or disk space than you expected, you should first examine the approach you used in your Ada source code. Can you use better data structures or implement faster algorithms? For example, a bubble sort is an easy way to sort relatively small amounts of data, but a quick sort is faster on thousands or millions of pieces of data.

In large programs, the subprogram causing the biggest bottlenecks may not be obvious. Experimenting with different test data and timing the results can often narrow down the problem areas. You could also try the gprof profiling tool, which will give you statistics on your program's performance and will show whether you are on the right track. Why spend hours or days improving a section of your program that isn't causing the problem? This is especially important in a business environment: focus your time on the sections that will give the greatest improvements.
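
Timing a suspect section can be sketched as follows. This is only a minimal illustration: the procedure name Time_Sort, the array size and the test data are assumptions made up for the example, and Ada.Calendar measures elapsed clock time rather than CPU time, so other programs running at the same time will affect the result.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;

procedure Time_Sort is

  -- Hypothetical test data size; adjust it to suit your machine.
  type Int_Array is array (1 .. 5_000) of Integer;

  Data  : Int_Array;
  Temp  : Integer;
  Start : Time;
  Spent : Duration;

begin

  -- Fill the array in descending order, the worst case for a bubble sort.
  for I in Data'Range loop
    Data (I) := Data'Last - I;
  end loop;

  Start := Clock;

  -- The section being measured: a plain bubble sort.
  for Pass in reverse Data'First + 1 .. Data'Last loop
    for I in Data'First .. Pass - 1 loop
      if Data (I) > Data (I + 1) then
        Temp := Data (I);
        Data (I) := Data (I + 1);
        Data (I + 1) := Temp;
      end if;
    end loop;
  end loop;

  Spent := Clock - Start;

  Put_Line ("Bubble sort took" & Duration'Image (Spent) & " seconds");

end Time_Sort;
```

Wrapping a different algorithm between the same two Clock calls gives a rough comparison before you reach for gprof.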

Some optimizations can be done automatically by the Gnat compiler. There are both compiler switches and language pragmas for fine tuning your programs.

7.1 Compiler Optimization Options

There are several compiling switches used to optimize programs.

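The -O switch tells the compiler how much time it should spend optimizing the program. This chapter uses -O (the same as -O1), -O2, and -O3; the higher the number, the more optimization the compiler attempts.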

When using floating point numbers, you may experience rounding errors if you don't use the -ffloat-store switch, as discussed in 8.5.

Inlining is also affected by two other switches, -gnatn and -gnatN, which come up again with pragma Inline_Always in 7.2.

These switches both require a -O switch for inlining to take effect.

The -gnatp switch turns off all non-essential error checking, such as constraint and range checks. This is the same as using pragma Suppress( All_Checks ) on every file in the entire program, making the program smaller and faster.
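
To see what kind of check is at stake, here is a small sketch. The names Check_Demo, Percent and Bump are hypothetical and exist only for this illustration. Compiled normally, the range check on the function result should raise Constraint_Error. Compiled with -gnatp, the check is removed and the behaviour is no longer defined by the language, so the first message may be printed instead.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Check_Demo is

  type Percent is range 0 .. 100;

  -- Hypothetical helper: the result is range checked on return.
  function Bump (P : Percent) return Percent is
  begin
    return P + 1;
  end Bump;

  Level : Percent := 100;

begin
  Level := Bump (Level);
  Put_Line ("No check was made: Level is no longer a valid Percent");
exception
  when Constraint_Error =>
    Put_Line ("Range check failed: Level cannot exceed 100");
end Check_Demo;
```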

There are some other gcc optimization switches which can sometimes be used:

-ffast-math - gcc will ignore certain ANSI and IEEE math rules. For example, it will not check for negative numbers before invoking the sqrt function. This improves math performance but can cause side effects for libraries expecting the ANSI/IEEE rules to be honoured.

-fomit-frame-pointer - gcc will free the register usually dedicated to hold the stack frame pointer. This improves performance but makes debugging difficult--many debugging utilities require the frame pointer.

IDE: TIA sets the proper switches for you based on your selections in the project parameters window.

7.2 Gnat Source Optimization Options

Ada Pragma                              Description                                     C Equivalent
pragma Pack( Aggregate );               Use minimum space for the aggregate.            -
pragma Optimize( Space / Time / Off );  How you want your statements optimized.         -
pragma Inline( Subprogram );            Inline the subprogram                           inline
pragma Inline_Always( Subprogram );     Inline the subprogram                           -
pragma Discard_Names( type );           Don't include ASCII identifiers in executable.  -

These pragmas change the size and execution speed of your program.

Pragma Pack compresses an array, record, or tagged record so that it uses the minimum space possible. For example, a packed boolean array takes up one bit for each boolean. Pack only packs the aggregate, not any aggregate items that might make up the aggregate: if you have an array of records, you'll need to pack both the array and the records to use the minimum space possible. Packing aggregates usually slows down the execution of your program.

type CustomerProfile is record
  Preferred : boolean;
  PreordersAllowed : boolean;
  SalesToDate : float;
end record;
pragma Pack( CustomerProfile );

Gnat can perform close packing, that is, packing right down to individual bits, for array elements or records of 64 bits or smaller.
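
One way to check what packing actually buys you is to print the sizes of the types. The following is a rough sketch only: Pack_Demo and all of its types are hypothetical names invented for the example, and the numbers reported depend on your compiler and platform.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Pack_Demo is

  -- Hypothetical types, declared twice so the sizes can be compared.
  type Flags is array (1 .. 64) of Boolean;

  type Packed_Flags is array (1 .. 64) of Boolean;
  pragma Pack (Packed_Flags);

  -- A packed array of records: both the record and the array are packed.
  type Profile is record
    Preferred         : Boolean;
    Preorders_Allowed : Boolean;
  end record;
  pragma Pack (Profile);

  type Profile_List is array (1 .. 32) of Profile;
  pragma Pack (Profile_List);

begin
  -- 'Size gives the size of each type in bits.
  Put_Line ("Flags        :" & Integer'Image (Flags'Size));
  Put_Line ("Packed_Flags :" & Integer'Image (Packed_Flags'Size));
  Put_Line ("Profile_List :" & Integer'Image (Profile_List'Size));
end Pack_Demo;
```

If close packing is applied, Packed_Flags should come out at one bit per boolean, as described above.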

Pragma Optimize specifies how you want your statements to be optimized: to run as fast as possible (time), to be as small as possible (space), or no optimization at all (off). Optimize does not affect data structures.

pragma Optimize ( space );
package body AccountsPayable is

Pragma Inline makes Ada inline the subprogram whenever possible. That is, it physically inserts the subprogram's code wherever it's named instead of calling it, in order to make your program run faster. This uses up a lot of space and is only practical for small procedures and functions.

procedure Increment( x : in out integer ) is
begin
  x := x + 1;
end Increment;
pragma Inline( Increment );

An optimizing switch (-O or higher) must be used or pragma Inline is ignored. -O3 will also automatically inline short subprograms for you.

Pragma Inline_Always forces inlining between packages (like -gnatn) regardless of whether or not -gnatn or -gnatN has been used.

Pragma Discard_Names frees up space by discarding the ASCII images (names) of identifiers. For example, if you have a big enumerated type, Ada normally maintains strings for the names of each of the enumerated items in case you want to use the 'img attribute. You can discard these names if you never intend to use 'img.

type aDogBreed is ( Unknown,
                    Boxer,
                    Shepherd,
                    MixedBreed );
pragma Discard_Names( aDogBreed );

When you discard names, 'img is still available. Instead of returning the enumerated value's image, 'img returns the position of the value in the enumerated type (for example, 0, 1, 2, and so forth).
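
The difference can be observed with a small test like the one below. This is a sketch under assumptions: Names_Demo and the two colour types are hypothetical, and it uses the standard 'Image attribute on the type rather than Gnat's 'img. The exact text returned once the names are discarded is up to the compiler.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Names_Demo is

  -- Two hypothetical types with identical values; only the second
  -- has its names discarded.
  type A_Colour is (Red, Green, Blue);

  type A_Quiet_Colour is (Red, Green, Blue);
  pragma Discard_Names (A_Quiet_Colour);

begin
  Put_Line ("With names:    " & A_Colour'Image (Green));
  Put_Line ("Without names: " & A_Quiet_Colour'Image (Green));
end Names_Demo;
```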

Fun Fact: The ASCII images of your variable names are stored as C strings at the end of your executable file. You can view them using the less (or strings) shell command.
 

7.3 CPU Optimization Options

There are two main CPU optimization switches in GCC 2.x as listed in the GCC manual:

-mno-486 - optimize for 80386.
-m486 - optimize for 80486. These programs will still run on a 80386.

Note: Future versions of Gnat built for GCC 3.x or later will probably support:
  • -mpentium - optimize for Pentium / Intel 586
  • -mcpu=i686 - optimize for Pentium II/ Intel 686
  • -mcpu=k6 - optimize for AMD K6

There are currently no switches for newer CPUs such as Pentiums. Under GCC 2.8.1 (and Gnat), the GCC FAQ recommends the following switches for reasonable Pentium performance: "-m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -fno-strength-reduce".

There are other switches that may or may not be helpful depending on your program: read the gcc FAQ for full details.

IDE: TIA sets the proper switches for you based on your selections in the project parameters window.

Let's put all these flags together. Suppose you are trying to develop a program for the Intel Pentium CPU with an emphasis on speed. During development, the Gnatmake switches would be "-O1", since this setting suppresses pragma optimize warnings. For the final release, the Gnatmake switches should be "-m486 -O3 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -fno-strength-reduce -gnatp" for maximum performance on a Pentium processor.
 

7.4 What Difference Does Optimization Make?

In the previous sections, we saw GCC compiler switches and Ada pragmas that affect the speed and size of your finished application. But how much of a difference does optimization make? And are there any problems caused by optimization?

The optimization switches and pragmas affect different applications differently. Some will give better results to certain kinds of applications while others may actually have a negative effect. The following table summarizes the results of optimizing the Hartstone Ada benchmark program. Hartstone is a multithreading mathematics test freely available on the Internet at http://ftp.sunet.se/pub4/benchmark/hartstone/.

Table: Hartstone 1.1 Benchmark Summary

Gnat switches     Ada pragmas        CPU Time  File Size  Task Set Util
-gnatE -gnato -g  -                  0.13s     294265     0.41%
-gnatE -gnato     -                  0.13s     147433     0.41%
-gnatE            -                  0.10s     138679     0.32%
no switches       -                  0.10s     138679     0.29%
-O                -                  0.07s     113076     0.22%
-O2               -                  0.07s     113324     0.22%
-O3               -                  0.07s     118790     0.20%
-O3 -gnatp        -                  0.08s     104290     0.37%
-O3 -gnatp Pent   -                  0.08s     105042     0.15%
Max               -                  0.05s     105714     0.15%
Max               Optimize( Space )  0.05s     105714     0.15%
Max               Optimize( Time )   0.05s     105714     0.15%
Max               Pack arrays        0.11s     105712     0.15%

Pent - GCC Pentium optimization switches
Max - Pent + -ffast-math + -fomit-frame-pointer

This test was conducted with a Pentium II 350, 64 Megs RAM, and ALT Gnat 3.12p-9. As they say, your mileage may vary (and probably will).

By optimizing the application, Hartstone shrinks by about a quarter (from 138679 to 105714 bytes) and uses half the CPU time (0.05s instead of 0.10s) compared with using no switches. However, if we pack the arrays in Hartstone, we save two bytes but lose all the improvements in speed. Sometimes smaller programs are not faster.

Let's try optimizing a convoluted program that uses integers, arrays, functions, and mathematics, and see what effect the optimization techniques have.

procedure bench is

  --Simple benchmark program to test optimization

  pragma optimize( time );

  type bench_integer is new long_integer range long_integer'range;
  type small_integer is new long_integer range 0..9;

  function p( param : bench_integer ) return bench_integer is
    divideby : constant bench_integer := 4;
  begin
    return param / divideby;
  end p;
  pragma inline( p );

  j : bench_integer := bench_integer'last;

  -- deliberate error in main program for j * 2

  type atype is array(0..9) of small_integer;
  --pragma pack( atype);

  a : atype;

begin

  for i in 1..100_000_000 loop

    j := abs( p( bench_integer( i ) ) - (j * 2) );
    a( integer( j mod 10 ) ) := small_integer( j mod
      bench_integer( small_integer'last) );

  end loop;

end bench;

Notice that j is assigned the largest bench_integer possible. This will force an overflow error the first time around the for loop, when j is multiplied by two. The following chart shows the effect of the different switches and pragmas, and indicates when gnat caught the overflow error. The test was conducted on a Pentium II 350 with 64 Megs of RAM using the gnat 3.11 NYU binaries, and was timed with the time command.

Gnatmake Switches         Pragmas            CPU Time  Size    Error Caught?
gnatmake -gnato -gnatE    -                  -         118162  YES
gnatmake -gnato           -                  -         118162  YES
gnatmake -gnatE           -                  40.3 s    118162  No
gnatmake                  -                  40.3 s    117634  No
gnatmake -O               -                  10.8 s    117426  No
gnatmake -O2              -                  10.8 s    117426  No
gnatmake -O3              -                  10.8 s    117426  No
gnatmake -O3 -gnatp       -                  9.6 s     117410  No
gnatmake -O3 -gnatp Pent  -                  9.6 s     117410  No
gnatmake -O3 -gnatp Pent  Optimize( Space )  9.6 s     117410  No
gnatmake -O3 -gnatp Pent  Optimize( Time )   9.6 s     117410  No
gnatmake -O3 -gnatp Pent  Pack atype         4.4 s     117326  No
 
The proper optimization can make this program run faster, but the overflow error is only caught when overflow checking is turned on with -gnato. The lesson here is that error checking only works when it's turned on.
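
The same effect can be reproduced in isolation with a few lines. This is a minimal sketch with a hypothetical procedure name, Overflow_Demo. Which message you see depends on whether overflow checking is enabled for the build, for example with -gnato as in the chart above.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Overflow_Demo is
  J : Long_Integer := Long_Integer'Last;
begin
  J := J * 2;  -- the product cannot fit in a Long_Integer
  Put_Line ("Overflow was not caught; J is now" & Long_Integer'Image (J));
exception
  when Constraint_Error =>
    Put_Line ("Overflow caught: Constraint_Error was raised");
end Overflow_Demo;
```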

We can compare the results to the equivalent C program:

#include <stdlib.h>  /* for abs() */

int p( int param ) {
  return param / 4;
}

int i;
int j = 2147483647;
int a[10];
int main() {
  for (i=1; i<=100000000; i++) {
    j = abs( p(i)-(j*2));
    a[ j%10 ] = j%10;
  }

  return 0;
}

GCC Switches  Pragmas  CPU Time  Size   Error Caught?
gcc -Wall     -        12.8 s    24541  No
gcc -O3 Pent  -        8.6 s     24541  No

In this case, notice that C never detected the overflow error. Secondly, notice that the fastest Ada build (4.4 s, with atype packed) ran about twice as fast as the optimized C program (8.6 s).

In theory, an Ada compiler can take advantage of the typing information and the optimization hints provided by the pragmas. The C compiler has less information, and this can hinder the optimization process. (I've never investigated whether or not Gnat does this or how much of an effect it has.)

The optimization techniques will affect different programs differently. You need to choose the best approach for your particular project.

 

7.5 Working with the Assembly Source

Assembly language is the low-level programming language for working with the hardware of a particular computer. Using assembly language, you can access the processor registers, use unusual features of the processor, and dictate exactly which operations the processor performs. Assembly language programs are usually several times smaller and faster than programs written in high-level languages, but they are also several times harder to build, maintain, and debug.

The Linux assembler is called gas (the GNU assembler). Like GNAT and C++, gas works through gcc. To assemble an assembly language source file, simply run gcc. The compiler will recognize the assembly language file and will assemble it using gas.

If you want to view the assembly source code of your Ada program, use the "-c -S -fverbose-asm" options when compiling. GNAT will create a file with a ".s" suffix containing the assembly source. You can view it, or even edit it and assemble it afterwards. Improving the instructions produced by the compiler and then assembling afterwards is known as hand optimizing. This technique is typically used for high performance applications, such as games, where the programmer needs to get the maximum performance from the hardware.

The following is the stderr.s file for the stderr.adb program described elsewhere in this document.

    .file "stderr.adb"

    .version "01.01"
/ GNU Ada version 2.8.1 (i686-pc-linux-gnu) compiled by GNU C version 2.8.1.
/ options passed: -I../texttools/ -mcpu=i486 -march=i486 -gnatp -gnatf -O3
/ -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2
/ -fno-strength-reduce -fverbose-asm
/ options enabled: -fdefer-pop -fcse-follow-jumps -fcse-skip-blocks
/ -fexpensive-optimizations -fthread-jumps -fpeephole -fforce-mem
/ -ffunction-cse -finline-functions -finline -fkeep-static-consts
/ -fcaller-saves -fpcc-struct-return -frerun-cse-after-loop
/ -fschedule-insns2 -fcommon -fverbose-asm -fgnu-linker -m80387
/ -mhard-float -mno-soft-float -mieee-fp -mfp-ret-in-387
/ -mschedule-prologue -mcpu=i486 -march=i486 -malign-loops=2
/ -malign-jumps=2 -malign-functions=2
gcc2_compiled.:
    .section .rodata
.LC0:
    .string "This is an example of writing error messages to stderr"
    .align 4
.LC1:
    .long 1
    .long 54
.LC2:
    .string "This message is on standard error"
    .align 4
.LC3:
    .long 1
    .long 33
.LC4:
    .string "This message is on standard output"
    .align 4
.LC5:
    .long 1
    .long 34
.LC6:
    .string "This is also on standard error"
    .align 4
.LC7:
    .long 1
    .long 30
.LC8:
    .string "But this is on standard output"
    .text
    .align 4
    .globl _ada_stderr
    .type _ada_stderr,@function
_ada_stderr:
    pushl %ebp
    movl %esp,%ebp
    movl $.LC0,%eax
    movl $.LC1,%edx
    pushl %edx
    pushl %eax
    call ada__text_io__put_line__2
    pushl $1
    call ada__text_io__new_line__2
    movl $.LC2,%eax
    movl $.LC3,%edx
    pushl %edx
    pushl %eax
    call ada__text_io__standard_error
    pushl %eax
    call ada__text_io__put_line
    movl $.LC4,%eax
    movl $.LC5,%edx
    pushl %edx
    pushl %eax
    call ada__text_io__put_line__2
    addl $32,%esp
    pushl $1
    call ada__text_io__new_line__2
    call ada__text_io__standard_error
    pushl %eax
    call ada__text_io__set_output
    movl $.LC6,%eax
    movl $.LC7,%edx
    pushl %edx
    pushl %eax
    call ada__text_io__put_line__2
    call ada__text_io__standard_output
    pushl %eax
    call ada__text_io__set_output
    movl $.LC8,%eax
    movl $.LC7,%edx
    pushl %edx
    pushl %eax
    call ada__text_io__put_line__2
    movl %ebp,%esp
    popl %ebp
    ret
.Lfe1:
    .size _ada_stderr,.Lfe1-_ada_stderr
    .ident "GCC: (GNU) 2.8.1"

See Chapter 19 for a discussion of embedding assembly language into an Ada program.

 
