memmove madness

My previous blog post spoke about not second-guessing the compiler and the other facilities provided by the system, such as the C standard library. It was prompted, in part, by a real-world example: see if you can spot the bug in the following code…

void *
memmove(const void *src, void *dst, uint32_t len)
{
    const uint8_t *s = (const uint8_t *)src;
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *e = s + len;
    /* (s == d) could go either way */
    if (s < d && d < e)
    {
        d += len;
        s += len;
        while (len--)
            *--d = *--s;
    }
    else
    {
        while (len--)
            *d++ = *s++;
    }
    return dst;
}

Actually, it was a trick question: there are at least 3 bugs in the above code.

1: The first bug, and the most egregious, is in the first and second arguments: they are reversed relative to the standard's memmove(dst, src, len).  Anyone linking against this land-mine is going to get their foot blown off, real soon now.

Even if you disagree with the standard about the order of the arguments, and it is easy to see how someone could, do not mess with functions that are used internally by numerous libraries, including whatever little of the compiler’s libc you are actually using.

Coders, not unreasonably, expect standard conforming behavior from functions defined by the C Standard.
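For reference, the standard declares void *memmove(void *dst, const void *src, size_t n): destination first, the same order as memcpy and strcpy. A minimal sketch of a correct call, including the overlapping case that memmove exists to handle (the shift_right_one helper name is mine, not anything standard):

```c
#include <string.h>

/* Move n bytes starting at buf up by one position within the same
 * buffer.  The source and destination regions overlap, which memmove
 * handles correctly and memcpy does not guarantee. */
static void shift_right_one(char *buf, size_t n)
{
    memmove(buf + 1, buf, n);   /* dst first, src second, then length */
}
```

Note that the regions overlap by n - 1 bytes; a naive forward byte copy here would smear the first byte across the buffer.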

2: The second bug is the type of the third argument: it should be size_t, because size_t is not always the same as uint32_t.  Think of 64-bit machines, or of 32-bit machines with address spaces larger than 32 bits, like some 40-bit SoC chips.

Fortunately, modern versions of gcc are going to catch the mismatch between the prototype and the definition.  Assuming, of course, that the compiler’s standard-conforming <string.h> is being used, and not some home-grown brain-damage.

Coders, not unreasonably, expect standard conforming behavior from functions defined by the C Standard.

3: The third bug is the use of uint8_t.  The sizeof operator returns sizes in multiples of char, and the standard guarantees only that sizeof(char) == 1.  Look in <limits.h> for the CHAR_BIT symbol definition; it exists for a reason.  When the C standard was first written, 36-bit machines were still in active use; those machines had a 9-bit char type.  No matter how insane or outdated you think the concept is, the function above should not be using uint8_t, it should be using one of the char types.
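Putting the three fixes together, a portable version might look like the following sketch. The name my_memmove is a placeholder to avoid colliding with the real thing, and the open-coded byte loop makes no claim to the performance of a real libc implementation; strictly speaking the relational pointer comparison is only defined when both pointers address the same object, which is exactly the overlapping case where it matters:

```c
#include <stddef.h>

/* Sketch of a portable memmove: standard argument order (dst first),
 * size_t for the length, and unsigned char instead of uint8_t so it
 * works whatever CHAR_BIT happens to be. */
void *
my_memmove(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (s < d && d < s + len)
    {
        /* Destination overlaps the tail of the source: copy backwards
         * so bytes are not clobbered before they are read. */
        d += len;
        s += len;
        while (len--)
            *--d = *--s;
    }
    else
    {
        while (len--)
            *d++ = *s++;
    }
    return dst;
}
```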

In addition, if the machine has no integer type of exactly 8 bits, the standard requires that the uint8_t type not be declared by <stdint.h> at all. So the above code will simply fail to compile on such a machine.  Of course, such a machine could be seen as an anachronism, surely no modern machine would have a non-power-of-2 register size?  Well then, consider the case of a machine with 256-bit long long, 128-bit long, 64-bit int, 32-bit short and 16-bit char.  Again, uint8_t would not be declared by the <stdint.h> header.

The correct way to determine the availability of uint8_t at compile time is the presence or absence of the UINT8_MAX preprocessor symbol.

#include <stdint.h>
#ifndef UINT8_MAX
#error "this code is not that portable"
#endif

See also uint_least8_t for a type with at least 8 bits, but not guaranteed to be exactly 8 bits; it would be defined for both the 36-bit example and the 256-bit example.
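Because uint_least8_t may be wider than 8 bits, code that depends on modulo-256 wraparound has to mask explicitly rather than rely on the type to truncate. A small illustration (wrap8 is a hypothetical helper, not a standard function):

```c
#include <stdint.h>

/* Force modulo-256 behavior even on an implementation where
 * uint_least8_t is wider than 8 bits (e.g. a 9-bit or 16-bit char).
 * On a CHAR_BIT == 8 machine the mask is a no-op. */
static unsigned wrap8(unsigned x)
{
    uint_least8_t v = (uint_least8_t)(x & 0xFF);
    return (unsigned)v;
}
```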