Multi-threaded GTK applications – Part 1: Misconceptions

I think my favourite message of the last year has to be something Tristan wrote:

“it is really plain and simple thread programming, you just have to really know what you’re doing.”

One of the most frequently asked questions in the GTK mailing lists is “How do I use threads with GTK”. Actually, the question is more often “I’ve started writing a GTK program, and it doesn’t work. What the hell?” and it quickly emerges that they’re trying to do a multi-threaded GTK app, and doing it wrong.

The conventional answer has long been that “you must only make GTK library calls from the main thread” (that is, the thread that executed gtk_main() and is therefore the one running the GTK main loop). What usually follows is advice to create a producer-consumer relationship to dispatch requests from the worker threads to the main [loop] thread and have the code to actually do the UI updates in separate code there. Frankly, this is rather cumbersome when all you want to do is a quick worker thread so you don’t freeze the UI by blocking the main loop, and then a couple of quick (but very case specific) UI updates to report the result. Non-trivial validation and then a database commit upon pressing “OK” is a classic example.

So, I’ve been studying this carefully for the last several months. Searching the GNOME archives turns up any number of fascinating threads on the topic (ooof, bad pun, sorry), and there is frequent reference across GTK documentation and tutorials to the necessity of guarding all GTK calls.

And yet: despite all the traffic and opinion that you must only make GTK calls from the main thread, this is not actually correct. At least I don’t think it is. Perhaps you can tell me if my conclusions are correct.

As you dig further, you start to realize that the requirement is not that you make calls from the main thread only, but that all GTK calls must be made while holding the main lock (the GDK lock, actually). This is stated quite clearly on the canonical GDK Threads page, which says:

GTK+ is “thread aware” but not thread safe — it provides a global lock controlled by gdk_threads_enter() and gdk_threads_leave() which protects all use of GTK+. That is, only one thread can use GTK+ at any given time.

You’ll note that it doesn’t say anything about the “main loop thread”. And this is where the confusion sets in.

Convenience APIs shield you from the truth

People get the message that they have to put gdk_threads_enter()/leave() around their GTK calls from other threads pretty quickly. But two things tend to be overlooked:

It turns out that you’re supposed to put gdk_threads_enter()/leave() around the call you make to gtk_main(). I suppose that this is implicit in the injunction to protect all GTK calls, but I know I’m not the only one to have had the impression that the main loop was special somehow. It certainly isn’t explicit on the GDK threads API page, and although it turns up (sort of) in the code examples there, the number of people that get this wrong is staggering. So I think we’ll need to improve that documentation a bit. (I’ll submit a patch depending on the feedback I get after this series of posts)

So now with the lightest of refactoring, C side we have this:

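#include <gtk/gtk.h>

int main(int argc, char** argv) {
    g_thread_init(NULL);        /* set up GLib threading first */
    gdk_threads_init();         /* then create the GDK lock */
    gtk_init(&argc, &argv);
    ...
    gdk_threads_enter();        /* take the GDK lock around the main loop */
    gtk_main();
    gdk_threads_leave();
    return 0;
}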

The second problem is more subtle. Because signal callbacks happen from within the main loop (blocking it), and since the thread running the main loop has traditionally been thought of as “special”, people quickly forget that they need to protect the GTK calls in their callbacks, especially if the bulk of their GTK programming has been single threaded C code. This is dramatically exacerbated by the fact that if you properly did gdk_threads_enter() before calling gtk_main(), then when you get a callback you are already in the GDK lock and you don’t even have to think about it. So people don’t.

This is a massive convenience, of course. GTK programming in C is already ridiculously verbose, and if people had to manually surround every individual GTK call with the enter()/leave() functions like this:

void clicked_cb(GtkButton* button, gpointer data) {
    gdk_threads_enter();
    gtk_widget_set_sensitive(GTK_WIDGET(button), TRUE);
    gdk_threads_leave();

    gdk_threads_enter();
    gtk_button_set_label(GTK_BUTTON(button), "Blah");
    gdk_threads_leave();
    ...
}

they’d go slightly crazy. Even having to do it at the beginning and end of a callback function:

void clicked_cb(GtkButton* button, gpointer data) {
    gdk_threads_enter();
    gtk_widget_set_sensitive(GTK_WIDGET(button), TRUE);
    gtk_button_set_label(GTK_BUTTON(button), "Blah");
    ...
    gdk_threads_leave();
}

would still be way too cumbersome. But because GTK callbacks are within the main loop and from the main thread, no one ever has to think much about it, because we all just do this:

void clicked_cb(GtkButton* button, gpointer data) {
    gtk_widget_set_sensitive(GTK_WIDGET(button), TRUE);
    gtk_button_set_label(GTK_BUTTON(button), "Blah");
    ...
}

In fact, the first two examples were wrong because the default implementation of the GDK lock functions is a mutex that is not reentrant, and that would have deadlocked! So not only do we not have to think about the thread safety issue, C side we can’t think about it. Bad.
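If you want to see that failure without hanging a real application, here is a toy model in plain POSIX threads (not GTK code). The function name relock_result() is made up for illustration, and I am assuming an error-checking mutex purely so that the second lock attempt reports the problem instead of blocking forever, which is what an ordinary non-reentrant mutex would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/*
 * Toy stand-in for the GDK lock; this is NOT GTK code.
 * Returns the result of a second lock attempt made by the thread
 * that already holds the lock.
 */
int relock_result(void) {
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int second;

    /* An error-checking mutex reports a self-deadlock instead of hanging. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&lock);            /* the enter() done around the main loop */
    second = pthread_mutex_lock(&lock);   /* an enter() inside a callback */

    pthread_mutex_unlock(&lock);          /* only the first lock succeeded */
    pthread_mutex_destroy(&lock);
    return second;
}
```

Calling relock_result() should give EDEADLK: the thread already owns the lock, so taking it again can never succeed. With a plain mutex there is no error code, just a frozen program.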

But wait a minute. If the main loop is in the lock, how can any other thread run?

This is the question that finally got me on the right track. How can any other thread ever execute a GTK call?

It took me overriding the GDK lock grab and release functions with gdk_threads_set_lock_functions() and sticking in some g_debug()s to find this out, but there’s a little detail that no one tells you: the GTK main loop releases its lock as it cycles, and it turns out that GTK itself is properly written to call the enter() and leave() functions (via the macros GDK_THREADS_ENTER() and GDK_THREADS_LEAVE()).

So GTK is indeed thread-aware, if not quite thread-safe without a little help from the programmer.

Thus you don’t need to go through the nightmare of setting up a dispatcher mechanism just to make changes to the UI. From any other thread, you just surround your GTK calls with the grab and release functions, and you’re on your way.
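The shape of that is easier to see in a toy model. What follows is plain POSIX threads, not GTK: an ordinary mutex plays the part of the GDK lock, a counter plays the part of widget state, and every name in it (toy_gdk_lock, toy_worker(), run_toy_main_loop()) is invented for the illustration. It is a simplification of what the real main loop does, assuming only what was described above: the loop lets go of the lock as it cycles, and that gap is where a worker gets in.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Toy stand-ins, NOT GTK: one plain mutex plays the GDK lock. */
static pthread_mutex_t toy_gdk_lock = PTHREAD_MUTEX_INITIALIZER;
static int toy_label_updates;   /* plays the part of widget state */
static int toy_loop_running;    /* cleared to make the toy loop exit */

/* Worker thread: take the lock, touch the "UI", let go. */
static void *toy_worker(void *unused) {
    (void) unused;
    pthread_mutex_lock(&toy_gdk_lock);     /* like gdk_threads_enter() */
    toy_label_updates++;                   /* like a GTK call */
    toy_loop_running = 0;                  /* like gtk_main_quit() */
    pthread_mutex_unlock(&toy_gdk_lock);   /* like gdk_threads_leave() */
    return NULL;
}

/* Runs the toy loop; returns the number of "UI" updates, or -1 on error. */
int run_toy_main_loop(void) {
    const struct timespec idle = { 0, 1000000L };   /* 1 ms */
    pthread_t tid;
    int running;

    toy_label_updates = 0;
    toy_loop_running = 1;

    pthread_mutex_lock(&toy_gdk_lock);     /* enter before the "main loop" */
    if (pthread_create(&tid, NULL, toy_worker, NULL) != 0) {
        pthread_mutex_unlock(&toy_gdk_lock);
        return -1;
    }
    do {
        /* callbacks would be dispatched here, with the lock held */
        pthread_mutex_unlock(&toy_gdk_lock);   /* the loop lets go... */
        nanosleep(&idle, NULL);                /* ...while waiting for events */
        pthread_mutex_lock(&toy_gdk_lock);     /* and takes the lock back */
        running = toy_loop_running;
    } while (running);
    pthread_mutex_unlock(&toy_gdk_lock);   /* leave after the loop returns */

    pthread_join(tid, NULL);
    return toy_label_updates;
}
```

The worker blocks until the loop drops the lock, makes its one update, and asks the loop to stop, so run_toy_main_loop() returns 1. If the loop never released the lock, the worker would simply wait forever.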

In hindsight, all this information was out there; I just came across this which, now that I understand it, clearly says what the main loop is up to. Still, though, there’s a lot of context. As Tristan said, you have to know what you’re doing.

Implications

There’s a shining possibility lurking within all this: what happens if you replace the simplistic GDK lock with a reentrant (aka recursive) one? You’re not going to believe this, but it works, and it works well. And the implication for a language binding like java-gnome is extraordinary: by combining a reentrant monitor with something that automatically locks all the GTK calls, it looks like our Java bindings might be able to be thread safe! That would really be something.
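As a rough idea of what “reentrant” buys you, here is another toy in plain POSIX threads (again not GTK, and nested_lock_result() is a made-up name). It assumes a recursive pthread mutex as the replacement lock. In a real program something like it would presumably be installed through gdk_threads_set_lock_functions(), which this sketch does not attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/*
 * Toy stand-in for a reentrant GDK lock; this is NOT GTK code.
 * Returns 0 if the same thread can take the lock twice and release
 * it twice without any call failing.
 */
int nested_lock_result(void) {
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int rc = 0;

    /* A recursive mutex counts how many times its owner has locked it. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);

    rc |= pthread_mutex_lock(&lock);     /* outer: around the main loop */
    rc |= pthread_mutex_lock(&lock);     /* inner: inside a callback */
    rc |= pthread_mutex_unlock(&lock);   /* inner leave */
    rc |= pthread_mutex_unlock(&lock);   /* outer leave; lock is now free */

    pthread_mutex_destroy(&lock);
    return rc;
}
```

Here nested_lock_result() returns 0: the second lock just bumps a count rather than deadlocking, which is exactly what would make the first two callback examples above harmless.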

I’ll show you what I’ve found tomorrow.

AfC