Integer to string conversion


About this recipe

This recipe shows you:

This recipe refers to the program utoa1.s in the examples directory. Its dtoa entry point converts a signed integer to a string of decimal digits (possibly with a leading '-'); its utoa entry point converts an unsigned integer to a string of decimal digits.

The algorithm

To convert a signed integer to a decimal string: generate a '-' and negate the number if it is negative; then convert the remaining unsigned value.

To convert a given unsigned integer to a decimal string, divide it by 10, yielding a quotient and a remainder. The remainder is in the range 0-9 and is used to create the last digit of the decimal representation. If the quotient is non-zero it is dealt with in the same way as the original number, creating the leading digits of the decimal representation; otherwise the process has finished.
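
The two steps above can be sketched in C as follows. This is only an illustrative model, not the contents of utoa1.s: the names utoa_model and dtoa_model are hypothetical, the argument order (buffer first, number second) is borrowed from the register usage of utoa, and it is an assumption that dtoa takes its arguments the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Convert n to decimal digits starting at buf and return a pointer just
   past the last digit written. No terminating NUL is stored. */
char *utoa_model(char *buf, unsigned n)
{
    unsigned q = n / 10;          /* quotient: the leading digits, if any */
    unsigned r = n - q * 10;      /* remainder: the last digit, 0..9 */

    assert(buf != NULL);
    if (q != 0)
        buf = utoa_model(buf, q); /* emit the leading digits first */
    *buf++ = (char)('0' + r);     /* then the final digit */
    return buf;
}

/* Signed wrapper (assumed interface): emit '-' and negate if negative,
   then convert the remaining unsigned value. */
char *dtoa_model(char *buf, int n)
{
    unsigned u = (unsigned)n;

    assert(buf != NULL);
    if (n < 0) {
        *buf++ = '-';
        u = 0u - u;               /* negate as unsigned so the most negative int is safe */
    }
    return utoa_model(buf, u);
}
```

Because the sketch stores no terminating NUL, a caller that wants a C string writes '\0' at the returned pointer.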

The implementation

The implementation of utoa sketched below employs the register naming and usage conventions of the ARM Procedure Call Standard: a1-a4 are argument or scratch registers and a1 is the function result register; v1-v5 are 'variable' registers (preserved across function calls); sp is the stack pointer; at routine entry, lr holds the subroutine call return address; and pc is the program counter.

utoa
  STMFD  sp!, {v1, v2, lr}              ; function entry - save some v-registers
                                        ; and the return address.
  MOV    v1, a1                         ; preserve arguments over following
  MOV    v2, a2                         ; function calls

  MOV    a1, a2
  BL     udiv10                         ; a1 = a1 / 10

  SUB    v2, v2, a1, LSL #3             ; number - 8*quotient
  SUB    v2, v2, a1, LSL #1             ;  - 2*quotient = remainder

  CMP    a1, #0                         ; quotient non-zero?
  MOVNE  a2, a1                         ; quotient to a2...
  MOV    a1, v1                         ; buffer pointer unconditionally to a1
  BLNE   utoa                           ; conditional recursive call to utoa

  ADD    v2, v2, #'0'                   ; final digit
  STRB   v2, [a1], #1                   ; store digit at end of buffer

  LDMFD  sp!, {v1, v2, pc}              ; function exit - restore and return

Explanation

On entry, a2 contains the unsigned integer to be converted and a1 addresses a buffer to hold the character representation of it.

On exit, a1 points immediately after the last digit written.

Both the buffer pointer and the original number have to be saved across the call to udiv10. This could be done by saving the values to memory. However, it turns out to be more efficient to use two 'variable' registers, v1 and v2 (which, in turn, have to be saved to memory).

(An instructive exercise for the reader is to rework this example with a1 and a2 saved to memory in the initial STMFD, rather than v1 and v2).

Because utoa calls other functions (udiv10 and utoa) it must save its return link address passed in lr. The function therefore begins by stacking v1, v2 and lr using STMFD sp!, {v1,v2,lr}.

In the next block of code, a1 and a2 are saved (across the call to udiv10) in v1 and v2 respectively and the given number (a2) is moved to the first argument register (a1) before calling udiv10 with a BL (Branch with Link) instruction.

On return from udiv10, 10 times the quotient is subtracted from the original number (preserved in v2) by two SUB instructions. The remainder (in v2) is ready to be converted to character form (by adding ASCII '0') and to be stored into the output buffer.
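
The arithmetic behind the two SUB instructions is that 10 times the quotient equals 8 times the quotient plus 2 times the quotient, and both terms are plain left shifts. A minimal C sketch of the same calculation follows; the function name rem10_from_quotient is hypothetical and the quotient is assumed to have been computed already, as udiv10 does in the assembly version.

```c
#include <assert.h>

/* Remainder of n / 10, given the quotient q, using only shifts and
   subtractions: 10*q = 8*q + 2*q = (q << 3) + (q << 1). */
unsigned rem10_from_quotient(unsigned n, unsigned q)
{
    unsigned r = n;

    r -= q << 3;      /* SUB v2, v2, a1, LSL #3 : number - 8*quotient */
    r -= q << 1;      /* SUB v2, v2, a1, LSL #1 : - 2*quotient        */
    assert(r <= 9);   /* holds whenever q really is n / 10 */
    return r;
}
```

For example, with n = 1234 and q = 123 the two subtractions give 1234 - 984 - 246 = 4.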

But first, utoa has to be called to convert the quotient, unless that is zero. The next four instructions do this, comparing the quotient (in a1) with 0, moving the quotient to the second argument register (a2) if not zero, moving the buffer pointer to the first argument/result register (a1), and calling utoa if the quotient is not zero.

Note that the buffer pointer is moved to a1 unconditionally: if utoa is called recursively then a1 will be updated, but it will still identify the next free buffer location; if utoa is not called recursively, the next free buffer location is still needed in a1 by the following code which plants the remainder digit and returns the updated buffer location (via a1).

The remainder (in v2) is converted to character form by adding '0' and is then stored in the location addressed by a1. A post-incrementing STRB is used, which stores the character and increments the buffer pointer in a single instruction, leaving the result value in the result register a1.

Finally, the function is exited by restoring the saved values of v1 and v2 from the stack, loading the stacked link address into pc and popping the stack using a single multiple load instruction LDMFD sp!, {v1,v2,pc}.

Creating a runnable example

You can run the utoa routine described here under armsd. To do this, you must assemble the example and the udiv10 function, compile a simple test harness written in C, and link the resulting objects together to create a runnable program.

Begin by setting your current directory to the examples directory then use the following commands:

armasm utoa1.s -o utoa1.o -li
armasm udiv10.s -o udiv10.o -li 
armcc -c utoatest.c -apcs 3/32bit -li
armlink -o utoatest utoa1.o udiv10.o utoatest.o somewhere/armlib.32l

where somewhere is the directory in which armlib.32l can be found.

Explanation

The first two armasm commands assemble the utoa function and the udiv10 function, creating relocatable object files utoa1.o and udiv10.o. The -li flag tells armasm to assemble for a little-endian memory. You can omit this flag if your armasm has been configured for this default.

The armcc command compiles the test harness. The -c flag tells armcc not to link its output with the C library; the -li flag tells armcc to compile for a little-endian memory (as with armasm).

The armlink command links your three relocatable objects with the ARM C library to create a runnable program (here called utoatest).

If you have installed your ARM development tools in a standard way then you could use the following shorter command to do the compilation and linking:

armcc utoatest.c utoa1.o udiv10.o -apcs 3/32bit -li

Running the example

You can run your example program under armsd using:

armsd -li utoatest

Note that the -li and -apcs 3/32bit options can be omitted if the tools have been configured appropriately.

Stacks in assembly language

In this example, three words are pushed on to the stack on entry to utoa and popped off again on exit. By convention, ARM software uses r13, usually called sp, as a stack pointer pointing to the last-used word of a downward growing stack (a so-called 'full, descending' stack). However, this is only a convention and the ARM instruction set supports equally all four stacking possibilities: {full or empty} x {ascending or descending}.

The instruction used to push values on the stack was:

STMFD  sp!, {v1, v2, lr}

The action of this instruction is as follows:

The matching pop instruction was:

LDMFD  sp!, {v1, v2, pc}

Its action is:
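
As a rough illustration, the push and the pop can be modelled in C on a small array of words. This is a sketch of the full, descending discipline only: the type fd_stack and the functions fd_init, fd_push3 and fd_pop3 are hypothetical names, and the array size is arbitrary. It relies on the fact that a multiple store or load places the lowest-numbered register at the lowest address.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a full, descending stack of 32-bit words: sp points at the
   last-used word and the stack grows toward lower addresses. */
typedef struct {
    uint32_t mem[16];   /* backing store (size chosen only for illustration) */
    uint32_t *sp;       /* points at the last word pushed */
} fd_stack;

void fd_init(fd_stack *s)
{
    s->sp = s->mem + 16;    /* empty stack: one past the highest word */
}

/* Models STMFD sp!, {v1, v2, lr}: lower sp by three words, then store the
   registers in ascending register-number order at ascending addresses. */
void fd_push3(fd_stack *s, uint32_t v1, uint32_t v2, uint32_t lr)
{
    assert(s->sp - s->mem >= 3);   /* the model checks for room; utoa does not */
    s->sp -= 3;
    s->sp[0] = v1;
    s->sp[1] = v2;
    s->sp[2] = lr;
}

/* Models LDMFD sp!, {v1, v2, pc}: load three words from ascending
   addresses, then raise sp by three words. The word that was saved from
   lr is loaded into pc, which performs the return. */
void fd_pop3(fd_stack *s, uint32_t *v1, uint32_t *v2, uint32_t *pc)
{
    *v1 = s->sp[0];
    *v2 = s->sp[1];
    *pc = s->sp[2];
    s->sp += 3;
}
```

After a push followed by a pop, sp is back where it started and the three values come back in the order they were stored.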

Discussion

Many, if not most, register-save requirements in simple assembly language programs can be met using this approach to stacks.

A more complete treatment of run-time stacks requires a discussion of:

In the utoa program, you must assume that the stack is big enough to cope with the maximum depth of recursion, because nothing checks this. In practice, the assumption is safe. The biggest 32-bit unsigned integer is about four billion, which has ten decimal digits, so utoa recurses to a depth of at most 10. Each call stacks 3 registers of 4 bytes each, so at most 10 x 3 x 4 = 120 bytes have to be stacked. Because the ARM Procedure Call Standard (APCS) guarantees that there are at least 256 bytes of stack available when a function is called, and because we can guess (or know) that udiv10 uses no stack space, we can be confident that utoa is quite safe if called by an APCS-conforming caller such as a compiled-C test harness.
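
The bound can be checked with a few lines of C. This is only a sketch of the arithmetic: the function names decimal_digits and utoa_max_stack_bytes are hypothetical, and the figure of 256 bytes is the APCS guarantee quoted above.

```c
#include <assert.h>

/* Number of decimal digits in n. This equals the depth of utoa's
   recursion, because each call emits exactly one digit. */
unsigned decimal_digits(unsigned long n)
{
    unsigned d = 1;

    while (n >= 10) {
        n /= 10;
        d++;
    }
    return d;
}

/* Worst-case stack use in bytes for a 32-bit argument: one frame of
   three 4-byte words (v1, v2, lr) per digit. */
unsigned utoa_max_stack_bytes(void)
{
    const unsigned words_per_frame = 3;
    const unsigned bytes_per_word = 4;
    unsigned bytes = decimal_digits(0xFFFFFFFFul) * words_per_frame * bytes_per_word;

    assert(bytes <= 256);   /* fits within the stack the APCS guarantees */
    return bytes;
}
```

The largest 32-bit value, 4294967295, has ten digits, so the function returns 10 x 3 x 4 = 120.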

This discussion raises another delicacy. The stacking technique illustrated here conforms to the ARM Procedure Call Standard only if the function using it makes no function calls. utoa calls both udiv10 and itself; it really ought to establish a proper stack frame (see ARM Procedure Call Standard). If you really want to be safe and write functions that can 'plug and play together' you have to follow the APCS exactly.

However, when writing a whole program in assembly language you often know much more than when writing a program fragment for general, robust service. This allows you to gently break the APCS in the following way:

So the utoa example is compatible with the APCS even though it doesn't conform to the APCS.

Note however: if you call any function whose stack use is unknown (but which is believed to be APCS conforming), you court disaster unless you establish a proper APCS call frame and perform APCS stack limit checking on function entry. Please refer to ARM Procedure Call Standard for further details.

Related topics

For more information about stacks, and conforming to the ARM Procedure Call Standard see: