Dammit, Jim, I’m an Engineer, not an English major!
API documentation is incredibly important. But unlike tutorials or other reference documents, which can be written in conventional text editors or word processors with nice built-in spell checkers, the public API of a Java project is conveyed through JavaDoc comments that live in the source code, right next to each class and method as they are declared. We necessarily find ourselves with tons of generated web pages covered in prose… and full of spelling mistakes.
So I’ve had it in mind for a while to run aspell over the java-gnome sources. Raving insanity, that’s for sure. Spell check source code? Are you mad?
Well, being suitably euphoric on New Year’s Day, I decided “what the hell, why not” and gave it a try. One thing that was obvious was that I was going to end up with a ridiculous number of unknown words that would need adding. Rather than filling my personal dictionary (the one in $HOME) with tons of project-specific crud, I used aspell’s -p option to specify a new word list in the project’s top-level directory. Easy enough with bzr: they’ve got a bzr root command that tells you the path of the project root. Nice.
Here’s the command line I used:
aspell -x -c -p `bzr root`/.aspell.en.pws -H Button.java
It worked pretty well. The tokens sure did add up in a hurry, though. Java language keywords? OK, no problem. Class names? Sure, makes sense to add them; many had already turned up when spell-checking other documentation in the project. But uh oh: it wants to know about x and every other bit of source code. Yikes. Quite the pain to add all that stuff while working through the files just to get to the JavaDoc and normal comments in order to fix the spelling in the text there.
But worth it… we now have spell checked API documentation!
What would really be neat is to write a little module for aspell that adds a mode that knows to only spell check the stuff between /* and */ characters. The -H flag above tells aspell to ignore HTML markup, and there are modes for LaTeX and others. So hopefully a “source code” mode would be feasible, and I could start again and have a slightly better signal-to-noise ratio :)
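In the meantime, a rough stand-in for that mode is to blank out everything that isn't a comment before the text ever reaches the spell checker. The following is only a minimal sketch in Python, not an aspell module: the function name comments_only is made up for illustration, and it assumes comment markers never appear inside string literals, which a real source code mode would have to handle.

```python
import re
import sys

# Matches /* ... */ block comments (JavaDoc included) and // line comments.
# Assumption: comment markers inside string literals are NOT handled.
COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def blank(text):
    """Replace every character except newlines with a space."""
    return re.sub(r"[^\n]", " ", text)


def comments_only(source):
    """Return source with all non-comment text blanked out.

    Newlines and column offsets are preserved, so any position in the
    output still corresponds to the same position in the original file.
    """
    out = []
    pos = 0
    for match in COMMENT.finditer(source):
        out.append(blank(source[pos:match.start()]))  # code before the comment
        out.append(match.group(0))                    # the comment itself
        pos = match.end()
    out.append(blank(source[pos:]))                   # trailing code
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the comment-only view of the file named on the command line.
    with open(sys.argv[1], encoding="utf-8") as handle:
        sys.stdout.write(comments_only(handle.read()))
```

Saved as, say, comments_only.py (a hypothetical name), the output can be piped into aspell's non-interactive list command, which prints the words it doesn't recognise: python comments_only.py Button.java | aspell -H -p `bzr root`/.aspell.en.pws list. That only reports the misspellings rather than fixing them in place, but it does keep x and friends out of the word list.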
Happy New Year!