Getting a core dump
Sometimes things crash. This is the normal order of things, even if we like to pretend that Linux is so much better than its proprietary competitors. When a native library crashes underneath a java-gnome program, however, this isn’t so much fun, because the actual process which crashed is a Java Virtual Machine.
Usually I see crashes because of something I’ve done wrong in binding an underlying GNOME library from java-gnome. So I bisect & printf() my way down until I can find the thing that causes it, read the docs, and hopefully figure it out.
Recently, however, I’ve been getting crashes somewhere deep in libpangoft when my app is first loading or, worse, when it is just sitting there and I’m not doing anything more onerous than moving the cursor around a TextView. This is still likely due to something I’ve done wrong in my code or in the bindings layer, but when it’s not happening deterministically or on demand it’s hard to even begin to analyze the problem.
The OpenJDK HotSpot VM (formerly the Sun HotSpot VM) has a pretty good SIGSEGV handler; it does its best to show you what the C library call was that died, and what the Java and C call stacks were leading up to the crash. You may have seen them around as hs_err_pid10733.log and such [I wish it would just spew that out to stderr instead of troubling to write a file, but anyway].
Strangely, the only line I’m getting when reading the crash report is:
C [libpangoft2-1.0.so.0+0x17687]
no other stack frames. Which is a bit strange, and decidedly unhelpful. So I’m going to need to try a bit harder to get a backtrace, it seems.
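When a report is longer than that, it can help to pull out just the native frames. What follows is a rough sketch, assuming the log marks each native frame with a leading C followed by a bracketed library+offset pair, as in the line above. The function name native_frames is made up for illustration.

```shell
# Hypothetical helper: print "library offset" for each native (C) frame
# found in one or more HotSpot error logs. Assumes frames look like
#   C [libpangoft2-1.0.so.0+0x17687]
# with one or more spaces after the leading "C".
native_frames() {
    sed -n 's/^C  *\[\([^]]*\)+\(0x[0-9a-fA-F]*\)\].*/\1 \2/p' "$@"
}
```

Calling native_frames hs_err_pid10733.log on a report containing the frame above would print the library name and the offset on one line. If debug symbols are installed, that pair can then be handed to a symbol lookup tool.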
Getting a native C stack trace of a Java program with GDB
Much as I otherwise feel GDB is the most horrendous user interface in the history of civilization (ok, maybe second only to GPG’s command line interface), it does do one thing extraordinarily well and that’s stack backtraces of crashed programs. These days one normally runs one’s program in gdb, induces it to crash, and then runs bt:
$ gdb ./program
(gdb) run
SIGSEGV caught
(gdb) bt
And you get your stack trace.
That’s a bit of a pain with a Java “program” because as mentioned the process running is a Java Virtual Machine (and because invoking java is … almost as bad as GPG’s command line interface):
$ java ... gobbledygook ... package.Class arguments
Usually people put their invocation line in a shell script along with various environment setup and so on:¹
$ ./script arguments
and off they go. The complication is that this isn’t really easy to use with GDB, since you need to invoke gdb on the binary executable (the JVM) and then, once GDB is finally up, tell it to “run” that executable with a bunch of arguments. Which means you’re back to the gory mess:
$ gdb java
(gdb) run ... gobbledygook ... package.Class arguments
SIGSEGV caught
(gdb) thread apply all bt
which is incredibly tedious for casual use.
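One way to take some of the tedium out is GDB’s --args option, which lets GDB pick up the program’s arguments from its own command line so that a bare “run” is enough at the prompt. Here is a minimal sketch of how a launcher script might use it. The function launch and the variables USE_GDB and DRY_RUN are names invented for this example, not anything java-gnome or GDB defines.

```shell
# Hypothetical launcher helper. With USE_GDB set, the command is started
# under "gdb --args" so its arguments are already in place when you type
# "run". With DRY_RUN set, the command line is only printed, not executed.
launch() {
    if [ -n "${USE_GDB:-}" ]; then
        set -- gdb --args "$@"
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

A script would call launch in place of invoking java directly, passing the same gobbledygook as before. Running it as USE_GDB=1 ./script arguments would then drop you at the (gdb) prompt with everything set up.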
But the old fashioned 1980s way of using a debugger is to get a “core dump” of memory into a file called core and to run GDB on that. Just set your shell to allow core dumps, then go back to running your program as normal:
$ ulimit -c unlimited
$ ./script arguments
Aborted (core dumped)
$ gdb java core
(gdb) thread apply all bt
and our stack traces will spill out in great gory detail. Hooray!
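If you would rather have the backtraces in a file than at an interactive prompt, GDB can be run non-interactively with -batch and -ex. This is only a sketch: it assumes the kernel wrote the dump to a file named core in the current directory, as in the session above, which is not how every system is configured. The function name core_backtrace is made up.

```shell
# Hypothetical helper: dump every thread's backtrace from a core file.
# Usage: core_backtrace java [corefile] > backtrace.txt
core_backtrace() {
    exe=$1
    core=${2:-core}
    if [ ! -f "$core" ]; then
        echo "no core file at $core (is 'ulimit -c' still 0?)" >&2
        return 1
    fi
    # -batch: exit once the commands have run; -ex: run one GDB command
    gdb -batch -ex 'thread apply all bt' "$exe" "$core"
}
```

The check for the core file is there because the most common failure is simply that no dump was written, and GDB’s own complaint about that is easy to miss.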
Incidentally, if you want to play with GDB and see what a HotSpot JVM is up to, then you need to induce it to crash; one way to do this is to send it a signal, say SIGSEGV or SIGBUS:²
$ kill -11 10733
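To see what such a signal does without sacrificing a real session, you can try it on a throwaway process first. In this sketch sleep stands in for the JVM, and the function name signal_exit_status is invented. A process killed by a signal is reported by the shell with an exit status of 128 plus the signal number.

```shell
# Stand-in experiment: 'sleep' plays the part of the JVM here.
# Start it in the background, send it the given signal number, and
# print the exit status the shell reports for it.
signal_exit_status() {
    sleep 30 &
    pid=$!
    kill "-$1" "$pid"
    status=0
    wait "$pid" || status=$?
    echo "$status"
}

signal_exit_status 11    # prints 139 (128 + 11) on Linux
```

If core dumps are enabled in that shell, this may also leave a core file from sleep behind, which makes for a small and harmless test subject for GDB.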
Yeay, Open
The real point here is that with Sun having open sourced Java and its HotSpot VM implementation, we can now build Java ourselves and include debugging symbols [on Debian Linux, for example, install package openjdk-6-dbg along with the symbols for the various libraries in the GNOME stack, libgtk2.0-0-dbg and so on]. This means, at long last, we can actually run Java under GDB (something we weren’t able to do when Java was proprietary) and get lovely backtraces when it thunders in.
Yeay for crashes.
AfC
¹ Which is why people put this invocation into a shell script, which makes it even harder to debug because you’ve got to run ps axww or whatever to try and get the full command line used to run the program. {sigh}, but fixing this will have to wait for another day.
² Bernd Eckenfels suggests avoiding SIGSEGV as apparently he believes this is caught in some places and rethrown as NullPointerException. I’ve never observed that, but I thought I’d mention his advice to use SIGBUS instead.
Comments
Josh Triplett wrote mentioning that he likes to use SIGQUIT to get his C applications to core dump, since you can trivially generate that signal from the console with Ctrl+\. What he didn’t know was that Sun’s Java VM has always had a handler for SIGQUIT which prints a stack trace for each currently running Java thread (which is useful when trying to debug deadlock issues, but it’s the Java-side call stack only, not the native frames).
Mark Wielaard mentioned a nice trick to attach to a crashing hotspot JVM to work around any core file limitations:
java -XX:OnError="gdb - %p" <arguments>
He and James Henstridge also note that having the “live, but almost dead program” around sometimes makes things a little easier on the debugger (as opposed to relying on a core dump). James suggests that computers are fast, so you can just run all your [problem child] programs in gdb all the time. Fair enough, although in my case with such a hard to reproduce crash, I think I’ll wait on a core dump.
Mark let me know that on Fedora you can get debug symbols for any package you are trying to inspect. If you do
debuginfo-install java-1.6.0-openjdk
(in this case) it will pull in every dependent debuginfo package also! He also notes that this crash in question might actually be a Pango problem, and cites this Fedora bug.
