Portability


The C programming language has gained a reputation for being portable across machines, while still providing machine-specific capabilities. However, the fact that a program is written in C gives little indication of the effort required to port it from one machine to another or, indeed, from one C system to another.

The most demanding case is porting between two entirely different hardware environments that run different operating systems and use different compilers. Because many users of the ARM C compiler will face just this situation, this section deals with the issues to be aware of when porting software to or from the ARM C system environment.

In addition, the tool topcc is supplied as part of the ARM Software Development Toolkit. topcc translates ANSI C to PCC-style C.

If code is to be used on a variety of different systems, there are certain issues that should be borne in mind to make porting an easy and relatively error-free process. It is essential to identify practices which may make software system-specific, and to avoid them. In the remainder of this section, we document the general portability issues for C programs.

Fundamental data types

The size of fundamental data types such as char, int, long int, short int and float depends mainly on the underlying architecture of the machine on which the C program is to run. Compiler writers usually implement these types in a way which is natural for the target. For example, Release 5 of the Microsoft C Compiler for DOS has int, short int and long int occupying 2, 2 and 4 bytes respectively, while the ARM C Compiler uses 4, 2 and 4 bytes respectively. Certain relationships are guaranteed by the ANSI C standard (such as sizeof(long int) >= sizeof(short int)), but code which makes any assumption about whether int and long int have the same size will not be portable.

A common non-portable assumption is embedded in the use of hexadecimal constant values. For example:

int i = 0xffff;    /* -1 if sizeof(int) == 2;
                      65535 if sizeof(int) == 4... */
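
One way to avoid this kind of assumption is to let the compiler supply the width-dependent values. The following is a minimal sketch, not a definitive recipe; the function names all_ones, low16 and check_guaranteed_ranges are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>

/* All-ones value of unsigned int, without assuming how wide int is. */
unsigned all_ones(void)
{
    return ~0u;
}

/* Low 16 bits of v; the UL suffix keeps the mask unsigned and long. */
unsigned long low16(unsigned long v)
{
    return v & 0xffffUL;
}

/* Checks only the minimum ranges that the ANSI standard guarantees. */
void check_guaranteed_ranges(void)
{
    assert(INT_MAX >= 32767);
    assert(LONG_MAX >= 2147483647L);
    assert(LONG_MAX >= INT_MAX);
}
```

Code that needs a particular range can test the <limits.h> macros in the same way instead of relying on the size of int on one compiler.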

In non-ANSI dialects of C there are pitfalls with argument passing. Consider, for example:

int f(x)
long int x;
{...}

and the (careless) invocation of f():

f(1);    /*    f(1L) was intended/required */

If sizeof(int) == sizeof(long int), all will be well; otherwise there may be catastrophe.
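
In ANSI C, a prototype that is in scope at the point of call removes this pitfall, because the argument is converted to the declared parameter type. A minimal sketch follows; the names twice and call_with_int_constant are hypothetical.

```c
/* ANSI prototype: the compiler now knows the parameter is a long int. */
long int twice(long int x);

long int twice(long int x)
{
    return 2 * x;
}

long int call_with_int_constant(void)
{
    /* The int constant 1 is converted to long int before the call,
       so there is no need to write 1L explicitly. */
    return twice(1);
}
```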

A dual problem afflicts the format string of the printf() family, even in ANSI C. For example:

long int l1, l2, l3;
...
printf("L1 = %d, L2 = %d, L3 = %d\n", l1, l2, l3);
    /* "...%ld...%ld...%ld..." is intended/required */

Again, if sizeof(int) != sizeof(long int), the arguments no longer match the conversion specifications and the result is dangerous nonsense.
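
The corrected call can be sketched as below. The wrapper format_longs is a hypothetical helper introduced only so that the output can be inspected; it assumes the caller supplies a buffer large enough for the formatted text.

```c
#include <stdio.h>

/* Formats three long ints with %ld, which matches the argument type
   whatever the relative sizes of int and long int.
   Returns the number of characters written, as sprintf does. */
int format_longs(char *buf, long int l1, long int l2, long int l3)
{
    return sprintf(buf, "L1 = %ld, L2 = %ld, L3 = %ld\n", l1, l2, l3);
}
```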

Another common assumption is about the signedness of characters, especially if chars are expected to be 7-bit quantities rather than 8-bit ones. For example, consider:

static char tr_tab[256] = {...};
...
int i, ch;
...
    i = fgetc(f);   /* should be i = (unsigned char) fgetc(f) */
    ch = tr_tab[i]; /* WRONG if chars are signed... */

Note that declaring i to be unsigned int doesn't help (it merely causes ch = tr_tab[i] to index a very long way off the other end of the array!).

In non-ANSI dialects of C there is no way to explicitly declare a signed char, so plain chars tend to be signed by default (as with the ARM C compiler in -pcc mode). In ANSI C, a char may be plain, signed or unsigned, so a plain char tends to be whatever is most natural for the target (unsigned char on the ARM).
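
A table index derived from a plain char can be made independent of signedness by converting through unsigned char first. The sketch below assumes the character is held in a plain char object; the names init_table, table_index and translate are hypothetical, and the table is filled with an identity mapping purely for illustration.

```c
#include <limits.h>

/* One entry for every value an unsigned char can take. */
static unsigned char tr_tab[UCHAR_MAX + 1];

/* Fill the table with an identity mapping (illustrative content only). */
void init_table(void)
{
    int i;
    for (i = 0; i <= UCHAR_MAX; i++)
        tr_tab[i] = (unsigned char)i;
}

/* The cast yields 0..UCHAR_MAX whether plain char is signed or not. */
int table_index(char c)
{
    return (unsigned char)c;
}

int translate(char c)
{
    return tr_tab[table_index(c)];
}
```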

Byte ordering

A highly non-portable feature of many C programs is the implicit or explicit exploitation of byte ordering within a word of store. Such assumptions tend to arise when copying objects word by word (rather than byte by byte), when inputting and outputting binary values, and when extracting bytes from, or inserting bytes into, words using a mixture of shift-and-mask and byte addressing. A contrived example which illustrates the essential pitfalls is:

unsigned a;
char *p = (char *)&a;
unsigned w = AN_ARBITRARY_VALUE;
while (w != 0)          /* put w in a */
{
    *p++ = w;           /* or, maybe, w byte-reversed... */
    w >>= 8;
}

This code will only work on a machine with 'little-endian' byte order.

The best solution to this class of problems is either to write code which does not rely on byte order, or to have separate code to deal appropriately with the different byte orders.
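
As an illustration of the first approach, the sketch below stores and retrieves a 32-bit quantity using only shifts and masks, so the byte order of the host never enters into it. The names put_le32 and get_le32 are hypothetical, and the choice of least-significant-byte-first for the stored form is an assumption made for the example.

```c
/* Store the low 32 bits of v in buf, least significant byte first. */
void put_le32(unsigned char *buf, unsigned long v)
{
    buf[0] = (unsigned char)(v & 0xffUL);
    buf[1] = (unsigned char)((v >> 8) & 0xffUL);
    buf[2] = (unsigned char)((v >> 16) & 0xffUL);
    buf[3] = (unsigned char)((v >> 24) & 0xffUL);
}

/* Rebuild the value from four bytes stored by put_le32. */
unsigned long get_le32(const unsigned char *buf)
{
    return (unsigned long)buf[0]
         | ((unsigned long)buf[1] << 8)
         | ((unsigned long)buf[2] << 16)
         | ((unsigned long)buf[3] << 24);
}
```

Because the bytes are addressed by explicit index rather than through a pointer cast onto the word, the same code gives the same stored form on little-endian and big-endian machines.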

Store alignment

The only guarantee given in the ANSI C Standard regarding the alignment of members of a struct is that a hole (caused by padding) cannot occur at the beginning of the struct.

The values of holes created by alignment restrictions are undefined, and you should not make assumptions about these values. Strictly, two structures with identical members, each having identical values, will only be found to be equal if field-by-field comparison is used; a byte-by-byte, or word-by-word, comparison need not indicate equality.

In practice, this can be a real problem for both auto structs and structs allocated dynamically using malloc. If byte-by-byte comparability of such structures is required, they must be zeroed using memset() before assigning field values.
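
Both techniques can be sketched as follows. The struct rec and the functions rec_init and rec_equal are hypothetical names used for illustration. Field-by-field comparison is the form that does not depend on the contents of any padding; zeroing with memset() is shown for cases where a byte-by-byte comparison is wanted as well.

```c
#include <string.h>

struct rec
{
    char tag;       /* padding is likely to follow this member */
    long int value;
};

/* Zero the whole object, holes included, before assigning the fields. */
void rec_init(struct rec *r, char tag, long int value)
{
    memset(r, 0, sizeof *r);
    r->tag = tag;
    r->value = value;
}

/* Field-by-field comparison: unaffected by whatever the holes contain. */
int rec_equal(const struct rec *a, const struct rec *b)
{
    return a->tag == b->tag && a->value == b->value;
}
```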

Padding may also have implications for the space required by a large array of structs. For example:

#define ARRSIZE 10000
typedef struct
{    int i;
    short s;
} ELEM;
ELEM arr[ARRSIZE];

may require 40KB, 60KB or 80KB depending on the size and alignment of ints and shorts (assume a short occupies 2 bytes, 2-byte aligned; then consider a 2-byte int, a 4-byte int 2-byte aligned, and a 4-byte int 4-byte aligned).
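
Rather than estimating these figures by hand, a program can ask the compiler for them with sizeof. The sketch below repeats the declarations above so that it stands alone; the function names elem_size, arr_bytes and elem_padding are hypothetical.

```c
#include <stddef.h>

#define ARRSIZE 10000
typedef struct
{
    int i;
    short s;
} ELEM;
static ELEM arr[ARRSIZE];

/* Size of one element, including any padding the compiler adds. */
size_t elem_size(void)
{
    return sizeof(ELEM);
}

/* Total space occupied by the array on this implementation. */
size_t arr_bytes(void)
{
    return sizeof arr;
}

/* Bytes of padding per element (zero if the members pack exactly). */
size_t elem_padding(void)
{
    return sizeof(ELEM) - (sizeof(int) + sizeof(short));
}
```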

Pointers and pointer arithmetic

A deficiency of the original definition of C, and of its subsequent use, has been the relatively unrestrained conversion between pointers to different data types and integers or longs. Much existing code makes the assumption that a pointer can safely be held in either a long int or an int variable. While such an assumption may indeed be true in many implementations on many machines, it is a highly non-portable feature on which to rely. Furthermore, there is no single arithmetic type which is guaranteed to hold a pointer (long or unsigned long is probably a generally safer guess than int or unsigned int).

The problem is further compounded when taking the difference of two pointers by subtraction. When the difference is large, the result may not fit in the integer type chosen to hold it. ANSI C defines a type ptrdiff_t, which is capable of reliably storing the result of subtracting two pointer values of the same type; a typical use of this mechanism would be to apply it to pointers into the same array.

Although the difference between any two pointers of similar type may be meaningful in a flat address space, only the difference between two pointers into the same object need be meaningful in a segmented address space.
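
A minimal sketch of the typical use, with both pointers referring to elements of the same array, is given below. The function name element_index is hypothetical.

```c
#include <stddef.h>

/* Number of elements between base and elem.
   Both pointers must point into (or one past the end of) the same array. */
ptrdiff_t element_index(const int *base, const int *elem)
{
    return elem - base;
}
```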

Finally, there are problems of evaluation order with address arithmetic. Consider:

long int base, offset;
char *p1, *p2;
...
offset = base + (p2 - p1);    /* intended effect */

Now suppose this latter expression were:

offset = (base + p2) - p1;

In a flat address space without holes the expressions are equivalent. In a segmented address space, (p2 - p1) may well be a valid offset within a segment, whereas (base + p2) may be an invalid address. If, in the second case, the validity is checked before subtracting p1, then the expression will fault. This latter class of problem will be familiar to MS-DOS programmers, but alien to those whose main experience is of Unix.

Function-argument evaluation

Whilst the evaluation of operands to operators such as ',' and || is defined to be strictly left-to-right (including all side-effects), the same does not apply to function-argument evaluation. For example, in the function call:

i = 3;
f(i, i++);

the call may be f(3, 3) or f(4, 3), depending on the order in which the compiler evaluates the arguments.

Of course, it is in general unwise for argument expressions to have side effects, for many reasons.
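
Where side effects cannot be avoided, performing them in separate statements before the call fixes the order. The following sketch uses hypothetical functions next_id, encode and encode_two_ids to show the pattern.

```c
/* Returns the current counter value and then advances the counter. */
int next_id(int *counter)
{
    return (*counter)++;
}

/* Combines two small values so that their order is visible in the result. */
int encode(int a, int b)
{
    return a * 10 + b;
}

int encode_two_ids(int *counter)
{
    /* Each side effect completes in its own statement, so the order
       no longer depends on how the compiler evaluates arguments. */
    int first = next_id(counter);
    int second = next_id(counter);
    return encode(first, second);
}
```

Writing encode(next_id(counter), next_id(counter)) instead would leave the order of the two calls to the compiler.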

System-specific code

The direct use of operating system calls is obviously non-portable, though often necessary. Isolating such code in target-specific modules, behind target-independent interfaces, helps.

File names and file-name processing are common sources of non-portability which are often surprisingly painful to deal with. Again, the best approach is to localise all such processing.
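
As a sketch of such localisation, the fragment below confines the directory separator to a single place. The function join_path and the macro DIR_SEP are hypothetical, and the preprocessor symbols tested (_WIN32 and __MSDOS__) are assumptions about the host compilers; a real port would test whatever symbols its own compilers define.

```c
#include <stddef.h>
#include <string.h>

/* The only place that knows which separator the host uses. */
#if defined(_WIN32) || defined(__MSDOS__)
#define DIR_SEP '\\'
#else
#define DIR_SEP '/'
#endif

/* Joins dir and name into out, which holds outsize bytes.
   Returns 0 on success, or -1 if the result would not fit. */
int join_path(char *out, size_t outsize, const char *dir, const char *name)
{
    size_t dlen = strlen(dir);
    size_t nlen = strlen(name);

    if (dlen + 1 + nlen + 1 > outsize)
        return -1;
    memcpy(out, dir, dlen);
    out[dlen] = DIR_SEP;
    memcpy(out + dlen + 1, name, nlen + 1);   /* copies the '\0' too */
    return 0;
}
```

The rest of the program calls join_path() and never mentions a separator character, so a port to a new file system changes only this module.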

Binary data files are inherently non-portable. Often the only solution to this problem may be the use of some portable external representation.
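
One simple portable external representation is decimal text. The sketch below, using the hypothetical functions encode_long and decode_long, writes a long int as text and reads it back; it assumes the caller's buffer is large enough for the digits, sign, newline and terminator.

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes v as decimal text followed by a newline.
   Returns the number of characters written, as sprintf does. */
int encode_long(char *buf, long int v)
{
    return sprintf(buf, "%ld\n", v);
}

/* Reads back a value written by encode_long. */
long int decode_long(const char *buf)
{
    return strtol(buf, NULL, 10);
}
```

The text form is independent of word size, byte order and padding, at the cost of extra space and conversion time.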