hackergotchi
Tales from a phasmatodean man

Peter Miller is a senior software developer with over 30 years on embedded systems. He has experience on a variety of platforms and is renowned for his focus on effective software testing practises. He was one of the the original authors of the now universally used gettext internationalization infrastructure. Lately he has been writing a library to better “explain” error messages coming from the Linux kernel's various system calls.

Contact...

Google Plus +Peter Miller

Email 0x70405E10

RSS Feed /pmiller

memmove madness

My previous blog post spoke about not second-guessing the compiler and other facilities provided by the system, such as the C Standard library. Prompted, in part, by a real-world example: see if you can spot the bug in the following code…

void *
memmove(const void *src, void *dst, uint32_t len)
{
    const uint8_t *s = (const uint8_t *)src;
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *e = s + len;
    /* (s == d) could go either way */
    if (s < d && d < e)
    {
        d += len;
        s += len;
        while (len--)
            *--d = *--s;
    }
    else
    {
        while (len--)
            *d++ = *s++;
    }
    return dst;
}

Actually, it was a trick question: there are at least 3 bugs in the above code.

1: The first bug, and most egregious, is the first and second arguments: they are reversed.  Anyone linking against this land-mine is going to get their foot blown off, real soon now.

Even if you disagree with the standard about the order of the arguments, and it is easy to see how some one could, do not mess with stuff that will be used internally by numerous libraries, including the uses made of it by what little of the compiler’s libc you are actually using.

Coders, not unreasonably, expect standard conforming behavior from functions defined by the C Standard.

2: The second bug is that the third argument should be size_t, because size_t is not always the same as uint32_t, just think of 64-bit machines (or, 32-bit machines with address spaces of more than 32 bits, like some 40-bit SoC chips).

Fortunately, modern versions of gcc are going to catch the mismatch between the prototype and the definition.  Assuming, of course, the the compiler’s standard conforming <string.h> is being used, and not some home-grown brain-damage.

Coders, not unreasonably, expect standard conforming behavior from functions defined by the C Standard.

3: The third bug is that the sizeof operator returns multiples of char, with the standard guaranteeing only that sizeof(char) == 1.  Look in <limits.h> for the CHAR_BIT symbol definition, it exists for a reason.  When the C standard was first written, 36-bit machines were still in active use; those machines had a 9-bit char type.  No matter how insane or outdated you think the concept is, the function above should not be using uint8_t, is should be using one of the char types.

In addition, if the machine has no integer type of exactly 8 bits, the standard requires that the uint8_t type not be declared by <stdint.h> at all. So the above code will simply fail to compile on such a machine.  Of course, such a machine could be seen as an anachronism, surely no modern machine would have a non-power-of-2 register size?  Well then, consider the case of a machine with 256-bit long long, 128-bit long, 64-bit int, 32-bit short and 16-bit char.  Again, uint8_t would not be declared by the <stdint.h> header.

The correct way to determine the availability of uint8_t at compile time is the presence or absence of the UINT8_MAX preprocessor symbol.

#include <stdint.h>
#ifndef UINT8_MAX
#error "this code is not that portable"
#endif

See also uint8_least_t for a type with at least 8 bits, but not guaranteed exactly 8 bits; it would be defined for both the 36-bit example and also the 256-bit example.

Portability Schmortability

Many years ago, when my pet dinosaur was still alive, I recall the painful transition from C on 16-bit machines, to C on 32-bit machines.  Many lessons from that era were incorporated into the the C89 standard, which happened in almost the same time frame.

Years later, and some folks are once again struggling, this time with the 32-bit to 64-bit transition.  This time, the C standard got there ahead of you, and has a nifty <stdint.h> include file, and a whole bunch of useful standard types.

The embedded code base I am currently working with is riddled with 32-bit assumptions.  First, they decided to define their own <wrong_types.h> (names changed to protect the guilty) with, you guessed it, typedefs and/or #defines for a whole bunch of symbols very nearly identical to the standard types in <stddef.h>, <stdint.h> and <stdbool.h>.  And, in some cases, with incorrect implementations of standard types.  Of course, compiling this code with a native 64-bit compiler, rather than the 32-bit cross compiler, has excrement exploding in all directions.

This article contains several portability lessons that folks seem to have forgotten (or, probably, are too young to have ever learned).

Lesson 0:

Why am I compiling it native when it is embedded code?  Test Driven Development (TDD).  It will save your sanity.

With a native build of (most of) the embedded code, I can put automated tests into the build system, for anything that I can run without having to be on the target, such as (I kid you not) the BCD arithmetic library.  The code download takes ~8 minutes, so running all those tests as part of the build is way faster at spotting regressions.  Also, my desktop build host is an order of magnitude faster than the target, and quad core.

Lesson 1:

Do not create a file called <wrong_types.h> and try to second-guess the C compiler.  You will always and forever be wrong, plus your code is not reusable if you do this.  This is a non-portable highway straight to hell.  Always have a second system for building embedded code, each with different numbers of bits (e.g. 32-bit embedded target and 64-bit build host).  That way assumptions will explode at compile time or automated test time, not in front of a customer.  If you can arrange things so that the two systems you are testing on also have different endian-ness, that’s also good for exposing assumptions.

The idea is for the code to compile correctly on both platforms, with no code changes required. Zero edits.  Build both, every day, with your Continuous Integration (CI) build server.  It is possible, but you have to use the types the compiler provides to tell you how big everything is.

Use <stddef.h>, <stdint.h> and <stdbool.h> instead of your own types header file.  They are standards compliant, portable, and the standard’s authors have thought more deeply about the semantics than you have.  Never heard of them?  Find a copy of the standard and read it, now.  Google for “ansi-c 2011 n1570 pdf”, it should be on the first page.

Oh, and don’t replace libc.a with your own implementation, either.  The stuff in the libc.a provided by the compiler vendor is probably way better than yours, and will also be standards compliant.  (Did you know strcmp is required by the standard to use unsigned chars for the comparison?  Didn’t think so.)  Don’t replace libgcc.a with your own implementation, either.  When (not if) the compiler is upgraded, you don’t want all the bug fixes and security fixes in their libc.a and libgcc.a to pass you by.

Lesson 2:

Always, always, always compile with at least -Werror -Wall -Wextra, and then fix all the problems revealed.  If they aren’t bugs now, they will be when the clueless maintenance guys get done with it.  (Marooned on a proprietary compiler?  Just use GCC in parallel, it’s free.)

Don’t just add more casts to make the warnings go away.  Actually think about what the compiler is telling you, and then fix the underlying flaw in how you are doing things. The third and fourth lessons are some specific instances of this.

“But they’re just warnings.”  Bzzt, wrong.  Spot the defect in the following code:

(volatile long)x = bigness;

Gcc has been warning about this for almost 20 years.  As of Gcc 4.4, this is now a fatal error, and about time, too.  Still can’t figure it out?  The C standard (since 1989, 22+ years ago) says that the output of a cast is never an lvalue (“What’s an lvalue?”  An expression that may appear on the left of an assignment, a subset of all the possible expressions that can appear on the right hand side).  The code should read

*(volatile long *)&x = bigness;

Even this re-written version probably still contains a design flaw. Unless you are talking to a device register, and even then it is questionable.

Lesson 3:

You can’t fit a pointer into a long.

void *p = blah;
long fake_ptr = (long)p;

Use intptr_t or uintptr_t for this job.  Better yet, ask yourself why you are pushing an integer type around, not a pointer type.  This often reveals design flaws, dinosaur sized ones.  BTW, using casts to make the compiler shut up, like this…

void *p = blah;
long fake_ptr = (long)(intptr_t)p;

simply obfuscates the bug.  The compiler is trying to tell you that your conversion will, one day, if not already, lose bits of representation.  When you

p = (void *)fake_ptr;

the pointer that went in isn’t guaranteed to be the pointer that comes out.  “But this will only ever be used on a 32-bit machine.”  Bzzzt, wrong.  I’m still scraping excrement off the walls.

Lesson 4:

You can’t fit what sizeof returns into an int

int len = sizeof(something);

use size_t for this job.  It isn’t guaranteed to fit into a long, either.  And, just to be clear, size_t is guaranteed to be unsigned. Use ssize_t (POSIX) or ptrdiff_t (ANSI C) for the signed variety, although any modern version of GCC is going to give zillions of signed-vs-unsigned warnings, and for good reason: the type promotion rules around these cases are almost certainly not what you think they are.

“What do I care?  Anyone declaring a 2GB object is an idiot.”  Bzzzt, wrong.  Try writing a malloc implementation one day.  Or, worse, reading one that assumes int32_t is big enough to hold the size of the malloc arena without losing address bits and without going negative.

Lesson 5:

(I can’t believe this one is still with us.)  Do not use sprintf or vsprintf, ever, for anything.  Always, always, always use snprintf or vsnprintf.  This usually means fixing all of your APIs so that whenever a string/buffer pointer argument is passed, it is immediately followed by a size_t argument that is the size of the aforementioned string/buffer.  (If your compiler vendor is 22 years late in providing a standards compliant snprintf function, upgrade; to Linux if necessary, it’s free.)

Also, if you are tempted to write your own strncpy that doesn’t have the ugly strncpy semantics (look it up), don’t.  Use snprintf instead (take your time, think it through, it wasn’t obvious to me, either).

Lesson 6:

Use “man gcc” and read what -Wshadow does.  Then turn it on.  And then, for any significantly old’n‘large code base, be horrified at what it tells you.  The -Werror option is your friend.

Here endeth the lesson.  (For today, anyway.)

Transparent is a pane

My latest adventure into Gtkmm-3, http://canola.sourceforge.net/, left me wanting to animate a series of images, and I wanted to use a transparent borderless transient window, and move the images around within it.

Strangely, this proved to be in the “far too hard” category.  First, you need a compositing window manager.  Second, you have to ignore all of the Gnome-2 answers on the Internet.

First: A compositing window manager.  I’ve been using Ubuntu Oneiric since November, but with Gnome Classic installed.  I thought I’d opted for the special effects, but no.  The instructions for turning on Compiz are mostly OK, except for the annoying default (set by the Ubuntu folks) of Compiz having that useless Unity icon box thing on the left hand side of the screen.  Edit
.config/compiz-1/compizconfig/config to say

[gnome_session]
profile =

Second: Gnome-3 isn’t Gnome-2.

The Internet has many places that describe hacks to goose Gtk+ into having a transparent window background, with opaque widgets, but none of them work on Gmome-3, usually because the named functiosn no longer exist.

{{ I should mention, the code that follows is in a class derived from Gtk::Window, which is why there are calls like get_screen() that wraps the gtk_widget_get_screen but appear to be missing an argument, the implicit C++ this pointer. }}

The first thing I tried was Gtk::Window::set_opacity(0.).  The entire window disappears, widgets and all.  This method makes the window and its contents transparent, rather than just the window background.

A bunch of places say to use

set_rgb_colormap(get_rgba_colormap());

except that these don’t exist in Gtkmm-3.  What is needed is to set the visual, which actually makes more sense:

Glib::RefPtr<Gdk::Visual> vp = get_screen()->get_rgba_visual();
if (vp)
{
gtk_widget_set_visual(Gtk::Widget::gobj(), vp->gobj());
}

Hmmm, a naked gtk_ call, but necessary because Gtkmm has not wrapped the necessary Gtk+ function.  How exactly this helps is that you can now set the background to transparent:

Gdk::RGBA sep;
sep.set_rgba(1., 1., 1., 0.);
get_window()->set_background(sep);

This fragment of Gtkmm code sets the Gdk window’s background.  It isn’t the same as Gtk::Winow’s background (an abstraction I have only begun to start  to fathom).  But you can’t do this until the window has been realised (the first show() call), or get_window() will return the NULL pointer.

But wait, there’s more!  The above is not sufficient, and you still get an opaque white window background.  The magic call is this one:

set_app_paintable(true);

and you do this before setting the background color.  This tells Gtk+ not to paint the window background itself, meaning that it will leave alone our setting of the transparent window background.

Success: a window with a transparent background and opaque widgets moving around inside it.

Note that there is a race: depending on where the code executes during the screen refresh cycle, there is a risk that the user will see a flicker as the window is shown opaque, and then becomes transparent.  I minimsed this by not showing any of the child widgets, then set the background, then show the child widgets.

to inline or not to inline

I was recently asked why so few inline methods appear in my open source C++ code, such as SRecord or Tardy. There are two main reasons.

The first reason is that by using the inline keyword, you are second-guessing the compiler, a form of premature optimization. See the Quotes section of Program optimization on Wikipedia for some on-point and very pithy summaries of why this is bad. If I haven’t profiled it, I wont optimize it. Humans are bad at guessing where optimization is beneficial, including me.

The second reason is that methods declared both inline and virtual are almost always pointless, because the compiler must “outline” the method so that it can place a pointer to it in the class vtable. There are very, very rare cases where the compiler can legally inline a virtual method, and you usually have to do it on purpose. I recall one embedded project where a single declaration was preceded by a 50 line comment, explaining why it was necessary, how it resulted in particular virtual inline methods being inlined, and why the performance benefit was justified.

Tagged |

Material on this site copyright © 2002-2012 Operational Dynamics Consulting, Pty Ltd unless otherwise noted. All rights reserved. Not for redistribution or attribution without permission in writing.

We make this service available to our staff in order to promote the discourse of ideas especially as relates to the development of Open Source worldwide. Blog entries on this site, however, are the musings of the authors as individuals and do not represent the views of Operational Dynamics. All times UTC.