Making the most of conditional execution


About this recipe

In this recipe you learn how conditional execution can eliminate branch instructions, producing smaller and faster code. Euclid's Greatest Common Divisor algorithm is used for illustrative purposes. Specifically, the recipe covers the ARM's ALU status flags, the execution conditions that test them, and how the flags are set in the PSR.

The ARM's ALU status flags

The ARM's Program Status Register contains, among other flags, copies of the ALU status flags:

---------------------------------------------------------
N     |Negative result from ALU flag                     
---------------------------------------------------------
Z     |Zero result from ALU flag                         
---------------------------------------------------------
C     |ALU operation Carried out                         
---------------------------------------------------------
V     |ALU operation oVerflowed                          
---------------------------------------------------------

Execution conditions

Every ARM instruction has a 4-bit field which encodes the conditions under which it will be executed. These conditions refer to the state of the N, Z, C and V flags in the PSR as follows:

--------------------------------------------------------
EQ     |Z set (equal)                                   
--------------------------------------------------------
NE     |Z clear (not equal)                             
--------------------------------------------------------
CS/HS  |C set (unsigned >=)                             
--------------------------------------------------------
CC/LO  |C clear (unsigned <)                            
--------------------------------------------------------
MI     |N set (negative)                                
--------------------------------------------------------
PL     |N clear (positive or zero)                      
--------------------------------------------------------
VS     |V set (overflow)                                
--------------------------------------------------------
VC     |V clear (no overflow)                           
--------------------------------------------------------
HI     |C set and Z clear (unsigned >)                  
--------------------------------------------------------
LS     |C clear or Z set (unsigned <=)                  
--------------------------------------------------------
GE     |N and V the same (signed >=)                    
--------------------------------------------------------
LT     |N and V differ (signed <)                       
--------------------------------------------------------
GT     |Z clear, and N and V the same (signed >)        
--------------------------------------------------------
LE     |Z set, or N and V differ (signed <=)            
--------------------------------------------------------
AL     |Always execute (the default if none is          
       |specified)                                      
--------------------------------------------------------

Setting the ALU flags in the PSR

Data processing instructions change the state of the ALU's N, Z, C and V status outputs, but these are latched into the PSR's ALU flags only if a special bit (the 'S' bit) is set in the instruction.

Illustration

The following code fragment is extracted from gcd.c, which can be found in the examples directory.

while (a != b)
{ if (a > b) a -= b;
  else       b -= a;
}

Without conditional execution this could be naively coded as:

gcd CMP    a1, a2
    BEQ    end
    BLT    lessthan
    SUB    a1, a1, a2
    B      gcd
lessthan
    SUB    a2, a2, a1
    B      gcd
end 

Conditional execution and selective setting of the PSR's ALU flags allow it to be coded much more compactly as follows (this version can be found in the examples directory as gcd.s).

gcd CMP    a1, a2
    SUBGT  a1, a1, a2
    SUBLT  a2, a2, a1
    BNE    gcd

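Two tricks are illustrated: the SUBGT and SUBLT instructions are executed conditionally, so no branches are needed to choose between them; and neither subtraction has its 'S' bit set, so the flags latched by CMP survive unchanged for SUBLT and the final BNE to test.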

Running the C example

You can run the C gcd routine shown above under armsd. To do this, first set your current directory to the examples directory.

Compile, link and run the C version of the gcd routine by using the following commands:

armcc -c gcd.c -li -apcs 3/32bit
armcc -c gcdtest.c -li -apcs 3/32bit
armlink -o gcdtest gcd.o gcdtest.o somewhere/armlib.32l
armsd -li gcdtest

where somewhere is the directory in which armlib.32l can be found.

Explanation

The two armcc commands compile the gcd function and the test harness, creating relocatable object files gcd.o and gcdtest.o. The -c flag tells armcc not to link its output with the C library. The -li flag tells armcc to compile for little-endian memory. The -apcs 3/32bit option tells armcc to use a 32-bit version of the ARM Procedure Call Standard. You can omit the -li and -apcs options if your armcc has been configured for these defaults.

The armlink command links your relocatable objects with the ARM C library to create a runnable program (here called gcdtest).

The armsd command invokes the debugger, with gcdtest as the program to be run. Again -li specifies that little-endian memory is required (as with armcc above).

Running the assembler example

You can run the assembler gcd routine shown above under armsd. To do this, first set your current directory to the examples directory.

You can assemble, link and run the assembler gcd routine by using the following commands:

armasm gcd.s -o gcd.o -li
armcc -c gcdtest.c -li -apcs 3/32bit
armlink -o gcdtest gcd.o gcdtest.o somewhere/armlib.32l
armsd -li gcdtest

where somewhere is the directory in which armlib.32l can be found.

Explanation

The armasm command assembles the gcd function, creating the relocatable object file gcd.o. The -li flag tells armasm to assemble for little-endian memory. You can omit this option if your armasm has been configured for this default.

The armcc command compiles the test harness. The -c flag tells armcc not to link its output with the C library; the -li flag tells armcc to compile for little-endian memory (as with armasm); the -apcs 3/32bit option tells armcc to use a 32-bit version of the ARM Procedure Call Standard.

The armlink command links your relocatable objects with the ARM C library to create a runnable program (here called gcdtest).

The armsd command invokes the debugger, with gcdtest as the program to be run. Again -li specifies that little-endian memory is required (as with armasm above).

Related topics