The code generator cometh

Calling C libraries from Java is somewhat arcane. It’s not so much that it’s “hard” as that it’s laborious. Once you’ve got your mind around it, however, and put the build infrastructure in place in your project (admittedly nasty learning curves), it’s not such a big deal to add another call or three on the margin. But for a library like java-gnome, which aspires to present vast native libraries like GTK, each incremental function means an enormous amount of mundane and repetitive work.

Old way

In the old java-gnome, this Java and C code was hand written for every single method. Needless to say, half the bugs in there were the kind of errors that arise from madly doing so much cutting and pasting that your brain shuts off. Adding new coverage was insanely painful. No kidding it was abandoned.

New way

The primary motivation to re-engineer the Java bindings for GTK and GNOME was to switch from having to hand write all this glue code to an architecture whereby we could generate these complicated middle layers.

I released the original prototype in November after 6 months of design work, and have been solidifying it in the java-gnome 4.0 releases so far. Obviously we had to mock up the parts that will be generated to make sure that we had the architecture right, which was no fun, because that’s right back to hand writing all the hideous boring shit in the middle. I’ve been pretty happy with it, though. The overall design has done a good job of handling unexpected corner cases as they’ve come up, and that’s always a good sign. Best of all has been the positive feedback from people who have looked at the code and complimented the design (we even had one guy come into #java-gnome the other day and say “java-gnome is really sexy”. Nice!).

But hand writing the glue code necessary to expand coverage sucks, especially when the whole point was to be generating all the boring bits. As you can see from this overview slide of the java-gnome 4.0 architecture,

[Image: java-gnome 4.0 architecture slide]

it’s up to the code generator to do the bulk of the work. We needed to figure out the detailed design first, obviously, but clearly the next job is to get the machinery to output the necessary Java (code to translate from our public API into types we can safely ship across the Java/C boundary, and the native method declarations) and the corresponding C (the JNI code necessary to cast parameters and then make the actual call to the underlying library) … and of course the reverse direction for return values and callbacks like signals.
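To make those two halves concrete, here is a toy sketch of a generator emitting both for a single C function. It is not the java-gnome generator: the class GlueSketch, the Param record, and the org.example.GtkWindow class name are all made up for illustration, and it only copes with void functions whose parameters are pointers or plain integers. The one thing taken from a specification is the JNI symbol naming rule (dots become underscores, and underscores in names become "_1").

```java
import java.util.List;
import java.util.stream.Collectors;

/** Toy illustration only: emits the Java and C halves of the glue for one C function. */
final class GlueSketch {

    /** One parameter, with hypothetical Java, JNI and C type names supplied by the caller. */
    record Param(String name, String javaType, String jniType, String cType) {}

    /** JNI symbol name: escape literal underscores as "_1", then turn dots into underscores. */
    static String jniSymbol(String className, String method) {
        String mangledClass = className.replace("_", "_1").replace('.', '_');
        String mangledMethod = method.replace("_", "_1");
        return "Java_" + mangledClass + "_" + mangledMethod;
    }

    /** The Java side: a static native method declaration taking boundary-safe types. */
    static String javaDeclaration(String function, List<Param> params) {
        String args = params.stream()
                .map(p -> p.javaType() + " " + p.name())
                .collect(Collectors.joining(", "));
        return "static native void " + function + "(" + args + ");";
    }

    /** The C side: cast each JNI argument to its C type, then call the library function. */
    static String cStub(String className, String function, List<Param> params) {
        String args = params.stream()
                .map(p -> p.jniType() + " " + p.name())
                .collect(Collectors.joining(", "));
        String casts = params.stream()
                .map(p -> "(" + p.cType() + ") " + p.name())
                .collect(Collectors.joining(", "));
        return "JNIEXPORT void JNICALL\n"
                + jniSymbol(className, function) + "(JNIEnv* env, jclass cls, " + args + ")\n"
                + "{\n"
                + "    " + function + "(" + casts + ");\n"
                + "}\n";
    }

    public static void main(String[] args) {
        // gtk_window_set_default_size is a real GTK function; the type mapping is an assumption.
        List<Param> params = List.of(
                new Param("self", "long", "jlong", "GtkWindow*"),
                new Param("width", "int", "jint", "gint"),
                new Param("height", "int", "jint", "gint"));
        System.out.println(javaDeclaration("gtk_window_set_default_size", params));
        System.out.print(cStub("org.example.GtkWindow", "gtk_window_set_default_size", params));
    }
}
```

For that sample the C function comes out named Java_org_example_GtkWindow_gtk_1window_1set_1default_1size, which is exactly the sort of thing nobody wants to type by hand. Strings, return values and callbacks all need more care than this sketch gives them.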

About 3 months ago we got started on it.

The codegen branch

The code generator has been evolving slowly but steadily. It started life as a little spike I wrote last September in Perl. That wasn’t going to scale, so Srichand Pendyala ported it to Python. He and I worked out some design issues there, but then we realized that that wasn’t going to scale either, so I ported it to Java. It’s been going gangbusters since then, and we’ve since had lots of help from Vreixo Formoso and new contributor Sebastian Mancke. It’s getting to be quite the beasty.

The full stream of .defs data is massive; obviously the goal we’re pushing towards is to reach the point where the code generator can parse all the (define...) blocks therein and then output the appropriate Java and C code to implement each block.
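As a rough idea of what the first step involves, here is a minimal sketch that splits .defs-style text into its top-level blocks and pulls out the block kind, name and C name. The sample data is an assumption written in the PyGTK-style S-expression layout, and DefsSketch and its methods are hypothetical names, not the parser in java-gnome. It ignores comments and escaped quotes, which a real parser could not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy reader for .defs-style data; not the java-gnome parser. */
final class DefsSketch {

    /** Splits the input into top-level (...) blocks, ignoring parentheses inside "strings". */
    static List<String> topLevelBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        int depth = 0;
        int start = 0;
        boolean inString = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inString = !inString;
            } else if (inString) {
                // characters inside a quoted string never change the nesting depth
            } else if (c == '(') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
                if (depth == 0) {
                    blocks.add(text.substring(start, i + 1));
                }
            }
        }
        return blocks;
    }

    /** Returns the block kind and name, e.g. {"define-method", "set_label"}. */
    static String[] kindAndName(String block) {
        String[] tokens = block.substring(1).trim().split("\\s+");
        return new String[] { tokens[0], tokens.length > 1 ? tokens[1] : "" };
    }

    /** Looks up a single-string field such as (c-name "..."), or returns null if absent. */
    static String field(String block, String key) {
        Pattern p = Pattern.compile("\\(" + Pattern.quote(key) + "\\s+\"([^\"]*)\"\\)");
        Matcher m = p.matcher(block);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Illustrative sample only; real .defs files carry many more fields.
        String defs =
                "(define-object Button\n"
              + "  (in-module \"Gtk\")\n"
              + "  (c-name \"GtkButton\")\n"
              + ")\n"
              + "(define-method set_label\n"
              + "  (of-object \"GtkButton\")\n"
              + "  (c-name \"gtk_button_set_label\")\n"
              + "  (return-type \"none\")\n"
              + "  (parameters\n"
              + "    '(\"const-gchar*\" \"label\")\n"
              + "  )\n"
              + ")\n";
        for (String block : topLevelBlocks(defs)) {
            String[] head = kindAndName(block);
            System.out.println(head[0] + " " + head[1] + " -> " + field(block, "c-name"));
        }
    }
}
```

Once each block is in hand like this, the generator's real job begins: deciding what Java and C to emit for it.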

So many corner cases. First time I tried it on the full data set it was not pretty :|. But we’ve been working away at it, and making painful but steady progress. Eventually I adopted the expediency of just blacklisting types that we didn’t know what to do with (an out-parameter that returns an array of arrays of GObject pointers? Fuck off!) — and finally last week I got a run that completed generation over the set of .defs files that represent GTK. That was an outstanding moment.
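The blacklisting idea is simple enough to sketch. This is an assumption about the general shape, not the actual java-gnome code: BlacklistSketch, CFunction, the function names and the blacklisted type names are all hypothetical. A function is skipped if its return type or any parameter type is one we don't yet know how to handle, and everything else goes on to generation.

```java
import java.util.List;
import java.util.Set;

/** Toy illustration of skipping functions that use types the generator can't handle yet. */
final class BlacklistSketch {

    /** A C function reduced to the type names that matter for this check. */
    record CFunction(String cName, String returnType, List<String> paramTypes) {}

    /** Hypothetical examples of C types with no Java mapping worked out yet. */
    static final Set<String> BLACKLIST = Set.of("GObject***", "va_list");

    /** True if the return type or any parameter type is on the blacklist. */
    static boolean isBlacklisted(CFunction f) {
        return BLACKLIST.contains(f.returnType())
                || f.paramTypes().stream().anyMatch(BLACKLIST::contains);
    }

    public static void main(String[] args) {
        // Made-up functions standing in for entries parsed out of the .defs data.
        List<CFunction> functions = List.of(
                new CFunction("example_widget_show", "void", List.of("ExampleWidget*")),
                new CFunction("example_grid_get_cells", "void",
                        List.of("ExampleGrid*", "GObject***")),
                new CFunction("example_log_valist", "void", List.of("const-gchar*", "va_list")));

        long skipped = functions.stream().filter(BlacklistSketch::isBlacklisted).count();
        for (CFunction f : functions) {
            System.out.println((isBlacklisted(f) ? "skip     " : "generate ") + f.cName());
        }
        System.out.println(skipped + " of " + functions.size() + " blacklisted");
    }
}
```

The appeal of doing it this way is that the run completes and reports exactly what was skipped, so the blacklist doubles as a to-do list.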

As of this writing, the java-gnome code generator outputs a translation layer of 475 classes containing 5454 methods (with 405 blacklisted due to requiring types that we haven’t figured out what to do with yet) and 733 callbacks (signal handler prototypes, mostly). There are still 162-odd compiler errors to go (some trivial, some really nasty), but we’re getting there!

Public API

This, of course, is just the glue in the middle. But this success means we are getting closer to the point where humans can get down to the serious work of writing the wrapper code and JavaDoc necessary to present those methods and signals we wish to expose in our public API.

One of the less appealing things about GTK in C is that numerous internals are unavoidably visible. In a language binding like java-gnome we don’t have to expose any of that — indeed, one of our primary design criteria has been zero such leakage. Thus there are a not insignificant number of entities in the .defs data that we don’t need to expose. GTK is huge, however, and it will take a long time to get our coverage up to a respectable level.

Where are we now? 58 public methods, about 1%. That’d be radically embarrassing to admit, except that we gotta start somewhere. But it’s onwards and upwards from here, and although there are still quite a number of engineering design issues that have yet to be tackled, I don’t have to hand write JNI code any more, and neither do you.

The code generator cometh.

AfC