Inaugurating the Haskell Sessions

There’s an interesting spectrum of events in the technical space. Conferences are the mainstay obviously; usually very intense and high-calibre thanks to the hard work of papers committees and of course the presenters themselves. You become invigorated hearing the experiences and results of other people, sharing ideas in the hallway, and of course the opportunity to vehemently explain why vi is better than emacs over drinks in the bar later is essential to the progress of science.

For a given community, though, conferences are relatively infrequent; often only once a year in a given country (Australia’s annual Linux conference, say) and sometimes only once a year globally (ICFP, the international functional programming conference, being a notable example: its numerous co-located symposiums and workshops take advantage of the fact that it’s the one event everyone turns up at).

More locally, those in major cities are able to organize monthly meetups, networking events, user groups, and the like. These are fun; it’s lovely to see friends and continue to build relationships with people you’d otherwise only see once a year.

Finally there are hackfests, often on the order of a weekend in duration. They tend to draw people in from a wide area, and are sometimes held in an unusual milieu; Peter Miller’s CodeCon camping and hacking weekends are infamous in the Sydney Linux community: rent a small quiet generator, find a state forest, set up tents, and hack on whatever code you want to for a few days. Blissful.

The local monthly events are the most common, though. Typically two or three people offer to give presentations to an audience of 30-50 people. And while hearing talks on a range of topics is invaluable, the fact that so many smart people are sitting passively in the room seems a lost opportunity.

For a while now I’ve been musing whether perhaps there is something between meetups and hackfests. Wouldn’t it be cool to get a bunch of people together, put a problem on the board, and then spend an hour going around the room, debating whether the problem is even the right question to be asking and discussing different approaches to tackling it? Something short, relatively focused, and pragmatic; rather than a presentation of results, a consideration of approaches. If we do it in a bit of a rotation, with one person each time tasked with framing the question, then over time each participant gets the benefit of bringing the collective firepower of the group to bear on one of the problems they’re working on.

Needs a name. Seminar? No, that’s what university departments do. Symposium? Too grand. Something more informal, impromptu, but organized. You know, like a jazz jam session. Ah, there we go: gonna call these sessions.

It might be nice to complement the monthly functional programming meetup (fp-syd) that happens in Sydney with something a little more interactive and Haskell focused. And as I’m looking to improve the depth of experience with Haskell in the Engineering group at Anchor, this seemed like a nice fit. So we’re holding the first of the Haskell Sessions tomorrow at 2pm, at Anchor’s office in Sydney.

Here’s one to start us off:

Industrial code has to interact with the outside world, often relying on external dependencies such as databases, network services, or even filesystem operations. We’re used to being able to separate pure code from that with side-effects, but what abstractions can we use to isolate the dependent code from the rest of the program logic?
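To give a concrete flavour of the kind of answer I have in mind (this is just a sketch of one candidate approach; the UserStore class and everything in it is invented for illustration), you can hide the dependency behind a typeclass so the program logic never mentions IO directly:

```haskell
-- Sketch: hide an external dependency behind a small interface.
-- "UserStore" and all the names here are hypothetical.
class Monad m => UserStore m where
    lookupUser :: String -> m (Maybe Int)

-- The production instance would really query a database in IO;
-- here a putStrLn stands in for the side effect.
instance UserStore IO where
    lookupUser name = do
        putStrLn ("SELECT id FROM users WHERE name = '" ++ name ++ "'")
        return (Just 42)

-- Program logic is polymorphic in the monad, so it can be run
-- against a pure stub in tests and against IO in production.
greeting :: UserStore m => String -> m String
greeting name = do
    mu <- lookupUser name
    return (maybe "who are you?" (\uid -> "hello, user #" ++ show uid) mu)

main :: IO ()
main = greeting "kermit" >>= putStrLn
```

In a test suite you’d write a second instance over a pure monad and run the same greeting logic against canned data; whether this sort of thing scales to real industrial code is exactly what’s worth debating.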

I know we’re not the first ones to blunder into this; I’ve heard plenty of people discussing it. So I’m going to hand around a single page with the type signatures of the functions involved at various points, freshly sharpen some whiteboard markers, and we’ll see what ideas come to light!

If you’re interested in coming, drop me a line.


http-streams 0.5.0 released

I’ve done some internal work on my http-streams package. Quite a number of bug fixes, which I’m pleased about, but two significant qualitative improvements as well.

First, I’ve rewritten the “chunked” transfer encoding logic. The existing code would accept chunks from the server and feed them, as received, up to the user. The problem with this is that the server is the one deciding the chunk size, and that means you can end up being handed multi-megabyte ByteStrings. Not exactly streaming I/O. So I’ve hacked that logic so that it yields pieces of at most 32 kB until it has iterated through the supplied chunk, then moves on to the next. A slight increase in code complexity internally, but much smoother streaming behaviour for people using the library.
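Stripped of the io-streams plumbing, the idea is just repeated splitting; something like this pure sketch (not the actual code in the package):

```haskell
import qualified Data.ByteString as S

-- Break a (possibly huge) chunk from the server into pieces of at
-- most 32 kB, so consumers downstream see smooth streaming rather
-- than multi-megabyte lumps.
rechunk :: S.ByteString -> [S.ByteString]
rechunk b
    | S.null b  = []
    | otherwise = piece : rechunk rest
  where
    (piece, rest) = S.splitAt 32768 b

main :: IO ()
main = print (map S.length (rechunk (S.replicate 100000 97)))
    -- [32768,32768,32768,1696]
```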

Secondly I’ve brought in the highly tuned HTTP header parsing code from Gregory Collins’s new snap-server. Our parser was already pretty fast, but this gave us a 13% performance improvement. Nice.

We changed the types used in the openConnection functions; Hostname and Port are now ByteString and Word16, so there’s an API version bump to 0.5.0. Literals will continue to work, so most people shouldn’t be affected.


http-streams 0.4.0 released

Quick update to http-streams, making a requested API change to the signature of the buildRequest function as well as pushing out some bug fixes and performance improvements.

You no longer need to pass the Connection object when composing a Request, meaning you can prepare it before opening the connection to the target web server. The required HTTP/1.1 Host: header is added when sendRequest is called, i.e. when the request is actually written to the server. If you need to see the value of the Host: field that will be sent (when debugging, say) you can call the getHostname function.

I’ve added an “API Change Log” to the README file on GitHub, and the blog post introducing http-streams has been updated to reflect the signature change.

Thanks to Jeffrey Chu for his contributions and to Gregory Collins for his advice on performance improvement; this second release is about 9% faster than the original.


An HTTP client in Haskell using io-streams

An HTTP client

I’m pleased to announce http-streams, an HTTP client library for Haskell, using the Snap Framework’s new io-streams library to handle the streaming I/O.

Back and there again

I’ve been doing a lot of work lately using Haskell to reprocess data from various back-end web services and then present fragments of that information in the specific form needed to drive client-side visualizations. Nothing unusual about that; on one edge of your program you have a web server and on the other you’re making onward calls to further servers. Another project involves meshes of agents talking to other agents; again, nothing extreme; just a server daemon responding to requests and in turn making its own requests of others. Fairly common in any environment built on (and in turn offering) RESTful APIs.

I’m doing my HTTP server work with the fantastic Snap Framework; it’s a lightweight and decently performing web server library with a nice API. To go with that I needed a web client library, but there the choices are less inspiring.

Those working in Yesod have the powerful http-conduit package, but I didn’t find it all that easy to work with. I soon found myself writing a wrapper around it just so I could use types and an API that made more sense to me.

Because I was happily writing web apps using Snap, I thought it would be cool to write a client library that would use the same types. After much discussion with Gregory Collins and others, it became clear that trying to reuse the Request and Response types from snap-core wasn’t going to be possible. But there was a significant amount of code in Snap’s test suite, notably almost an entire HTTP client implementation. Having used Snap.Test to build test code for some of my own web service APIs, I knew there was some useful material there, and that gave me a useful starting point.

Streaming I/O

One of the exciting things about Haskell is the collaborative way that boundaries are pushed. From the beginnings in iteratee and enumerator, the development of streaming I/O libraries such as conduit and pipes has been phenomenal.

The Snap web server made heavy use of the original iteratee/enumerator paradigm; when I talked to some of the contributors in #snapframework about whether they were planning to upgrade to one of the newer streaming I/O libraries, I discovered from Greg that he and Gabriel were quietly working on a re-write of the internals of the server, based on their experiences doing heavy I/O in production.

This new library is io-streams, aimed at being a pragmatic implementation of some of the impressive theoretical work from the other streaming libraries. io-streams’s design makes the assumption that you’re working in … IO, which seems to have allowed them to make some significant optimizations. The API is really clean, and my early benchmarks were promising indeed.

That was when I realized that being compatible with Snap was less about the Request and Response types and far more about being able to smoothly pass through request and response bodies — in other words, tightly integrating with the streaming I/O library used to power the web server.

http-streams, then, is an HTTP client library built to leverage and in turn expose an API based on the capabilities of io-streams.

A simple example

We’ll make a GET request of a tiny web app which just returns the current UTC time. The basic http-streams API is pretty straightforward:

{-# LANGUAGE OverloadedStrings #-}

import System.IO.Streams (InputStream, OutputStream, stdout)
import qualified System.IO.Streams as Streams
import Network.Http.Client

main :: IO ()
main = do
    c <- openConnection "" 58080

    q <- buildRequest $ do
        http GET "/time"
        setAccept "text/plain"

    sendRequest c q emptyBody

    receiveResponse c (\p i -> do
        Streams.connect i stdout)

    closeConnection c

which results in

Sun 24 Feb 13, 11:57:10.765Z

Open connection

Going through that in a bit more detail, given that single import and some code running in IO, we start by opening a connection to the appropriate host and port:

    c <- openConnection "" 58080

Create Request object

Then you can build up the request you need:

    q <- buildRequest $ do
        http GET "/time"
        setAccept "text/plain"

that happens in a nice little state monad called RequestBuilder with a number of simple functions to set various headers.

Having built the Request object we can have a look at what the outbound request would look like over the wire, if you’re interested. Doing:

    putStr $ show q

would have printed out:

GET /time HTTP/1.1
User-Agent: http-streams/0.3.0
Accept-Encoding: gzip
Accept: text/plain

Send request

Making the request is a simple call to sendRequest. It takes the Connection, a Request object, and a function of type

   (OutputStream Builder -> IO α)

which is where we start seeing the System.IO.Streams types from io-streams. If you’re doing a PUT or POST you write a function where you are handed the OutputStream and can write whatever content you want to it. Here, however, we’re just doing a normal GET request which by definition has no request body so we can use emptyBody, a predefined function of that type which simply returns without sending any body content. So:

    sendRequest c q emptyBody

gets us what we want. If we were doing a PUT or POST with a request body, we’d write to the OutputStream in our body function. It’s an OutputStream of Builders as a fairly significant optimization: the library will end up chunking and sending over an underlying OutputStream ByteString wrapped around the socket, but building up the ByteString(s) first in a Builder reduces allocation overhead when smacking together all the small strings that the request headers are composed of; taken together, it often means a request will go out in a single sendto(2) system call.

Read response

To read the reply from the server you make a call to receiveResponse. Like sendRequest, you pass the Connection and a function to handle the entity body, this time one which will read the response bytes. Its type is

   (Response -> InputStream ByteString -> IO β)

This is where things get interesting. We can use the Response object to find out the status code of the response, read various headers, and deal with the reply accordingly. Perhaps all we care about is the status code:

statusHandler :: Response -> InputStream ByteString -> IO ()
statusHandler p i = do
    case getStatusCode p of
        200 -> return ()
        _   -> error "Bad server!"

The response body is available through the InputStream, which is where we take advantage of the streaming I/O coming down from the server. For instance, if you didn’t trust the server’s Content-Length header and wanted to count the length of the response yourself:

countHandler :: Response -> InputStream ByteString -> IO Int
countHandler p i1 = do
    go 0 i1
  where
    go !acc i = do
        xm <- Streams.read i
        case xm of
            Just x  -> go (acc + S.length x) i
            Nothing -> return acc

Ok, that’s pretty contrived, but it shows the basic idea: when you read from an InputStream a you get a sequence of Maybe a values; when you get Nothing, the input is finished. Realistic usage of io-streams is a bit more idiomatic; the library offers a large range of functions for manipulating streams, many of which are wrappers to build up more refined streams from lower-level raw ones. In this case, we could do the counting trick using countInput, which gives you an action to tell you how many bytes it saw:

countHandler2 p i1 = do
    (i2, getCount) <- Streams.countInput i1
    Streams.skipToEof i2
    len <- getCount
    return len
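If you want a feel for the Maybe protocol itself without pulling in the library, a toy stream is nothing more than an IO action wrapped around an IORef (a stand-in for illustration only; io-streams has proper constructors such as Streams.fromList for this):

```haskell
import Data.IORef

-- A toy "input stream": an IO action that yields Just chunk until
-- the source is exhausted, then Nothing forever after.
toyStream :: [a] -> IO (IO (Maybe a))
toyStream xs = do
    ref <- newIORef xs
    return $ do
        remaining <- readIORef ref
        case remaining of
            []     -> return Nothing
            (y:ys) -> writeIORef ref ys >> return (Just y)

-- Drain it the same way countHandler walks an InputStream.
drain :: IO (Maybe a) -> IO [a]
drain next = do
    m <- next
    case m of
        Nothing -> return []
        Just x  -> fmap (x :) (drain next)

main :: IO ()
main = do
    s  <- toyStream ["ab", "cd", "e"]
    xs <- drain s
    putStrLn (concat xs)  -- abcde
```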

For our example, however, we don’t need anything nearly so fancy; you can of course just use the in-line lambda function we showed originally. And if you also wanted to spit the response headers out to stdout, Response has a useful Show instance.

    receiveResponse c (\p i -> do
        putStr $ show p
        Streams.connect i stdout)

which is, incidentally, exactly what the predefined debugHandler function does:

    receiveResponse c debugHandler

Either way, when we run this code, it will print out:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Date: Sun, 24 Feb 2013 11:57:10 GMT
Server: Snap/
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Type: text/plain

Sun 24 Feb 13, 11:57:10.765Z

Obviously you don’t normally need to print the headers like that, but they can certainly be useful for testing.

Close connection

Finally we close the connection to the server:

    closeConnection c

And that’s it!

More advanced modes of operation are supported. You can reuse the same connection, of course, and you can also pipeline requests [sending a series of requests followed by reading the corresponding responses in order]. And meanwhile the library goes to some trouble to make sure you don’t violate the invariants of HTTP; you can’t read more bytes than the response contains, but if you read less than the length of the response, the remainder of the response will be consumed for you.

Don’t forget to use the conveniences before you go

The above is simple, and if you need to refine anything about the request then you’re encouraged to use the underlying API directly. However, as often as not you just need to make a request of a URL and grab the response. Ok:

main :: IO ()
main = do
    x <- get "" concatHandler
    S.putStrLn x

The get function is just a wrapper around composing a GET request using the basic API, and concatHandler is a utility handler that takes the entire response body and returns it as a single ByteString — which somewhat defeats the purpose of “streaming” I/O, but often that’s all you want.

There are put and post convenience functions as well. They take a function for specifying the request body and a handler function for the response. For example:

    put "" (fileBody "fozzie.jpg") handler

this time using fileBody, another of the pre-defined entity body functions.

Finally, for the ever-present not-going-to-die-anytime-soon application/x-www-form-urlencoded POST request — everyone’s favourite — we have postForm:

    postForm "" [("name","Kermit"),("role","Stagehand")] handler

Secure connections

Until now I’ve completely neglected to mention SSL support and error handling. Secure connections are supported using openssl; if you’re working in the convenience API you can just request an https:// URL as shown above; in the underlying API you call openConnectionSSL instead of openConnection. As for error handling, a major feature of io-streams is that you leverage the existing Control.Exception mechanisms from base; the short version is that you can just wrap bracket around the whole thing for any exception handling you might need — that’s what the convenience functions do, and there’s a withConnection function which automates this for you if you want.
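The property bracket guarantees, namely that the release action runs even when the body throws, is easy to demonstrate with nothing but base (a toy example, nothing to do with connections):

```haskell
import Control.Exception (ErrorCall (..), bracket, throwIO, try)
import Data.IORef

main :: IO ()
main = do
    released <- newIORef False
    -- acquire: nothing; release: record that cleanup ran;
    -- body: blow up part-way through
    r <- try (bracket (return ())
                      (\_ -> writeIORef released True)
                      (\_ -> throwIO (ErrorCall "boom")))
             :: IO (Either ErrorCall ())
    ok <- readIORef released
    print (ok, either (const "caught") (const "no exception") r)
    -- (True,"caught"): the release action ran despite the exception
```

Substitute openConnection for the acquire action and closeConnection for the release action and you have, in essence, what withConnection does for you.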


I’m pretty happy with the http-streams API at this point, and it’s essentially feature complete. A fair bit of profiling has been done, and the code is quite sound. Benchmarks against other HTTP clients are favourable.

After a few years working in Haskell this is my first go at implementing a library as opposed to just writing applications. There’s a lot I’ve had to learn about writing good library code, and I’ve really appreciated working with Gregory Collins as we’ve fleshed out this API together. Thanks also to Erik de Castro Lopo, Joey Hess, Johan Tibell, and Herbert Valerio Riedel for their review and comments.

You can find the API documentation for Network.Http.Client here (until Hackage generates the docs) and the source code at GitHub.



  1. Code snippets updated to reflect API change made to buildRequest as of v0.4.0. You no longer need to pass the Connection object when building a Request.

Defining OS specific code in Haskell

I’ve got an ugly piece of Haskell code:

baselineContextSSL :: IO SSLContext
baselineContextSSL = do
    ctx <- SSL.context    -- a completely blank context
    contextSetDefaultCiphers ctx
#if defined __MACOSX__
    contextSetVerificationMode ctx VerifyNone
#elif defined __WIN32__
    contextSetVerificationMode ctx VerifyNone
#else
    contextSetCADirectory ctx "/etc/ssl/certs"
    contextSetVerificationMode ctx $
        VerifyPeer True True Nothing
#endif
    return ctx

this being necessary because the non-free operating systems don’t store their X.509 certificates in a place that openssl can reliably discover them. This sounds eminently solvable at lower levels, but that’s not really my immediate problem; after all, this sort of thing is what #ifdefs are for. The problem is needing to get an appropriate symbol based on what OS you’re using defined.

I naively assumed there would be __LINUX__ and __MACOSX__ and __WIN32__ macros already defined by GHC because, well, that’s just the sort of wishful thinking that powers the universe.

When I asked the haskell-cafe mailing list for suggestions, Krzysztof Skrzętnicki said that I could set conditional flags in my project’s .cabal file. Nice, but problematic because you’re not always building using Cabal; you might be working in ghci, you might be using a proper Makefile to build your code, etc. Then Henk-Jan van Tuyl pointed out that you can get at the Cabal logic care of Distribution.System. Hey, that’s cool! But that would imply depending on and linking the Cabal library into your production binary. That’s bad enough, but the even bigger objection is that binaries aren’t portable, so what’s the point of having a binary that — at runtime! — asks what operating system it’s on? No; I’d rather find that out at build time and then let the C pre-processor include only the relevant code.

This feels simple and an appropriate use of CPP; even the symbol names look just about like what I would have expected (stackoverflow said so, must be true). Just need to get the right symbol defined at build time. But how?

Build Types

Running cabal install one sees all kinds of packages building and I’d definitely noticed some interesting things happen; some packages fire off what is obviously an autoconf generated ./configure script; others seem to use ghci or runghc to dynamically interpret a small Haskell program. So it’s obviously do-able, but as is often the case with Haskell it’s not immediately apparent where to get started.

Lots of libraries available on Hackage come with a top-level Setup.hs. Whenever I’d looked in one all I’d seen is:

import Distribution.Simple
main = defaultMain

which rather rapidly gave me the impression that this was a legacy of older modules, since running:

$ cabal configure
$ cabal build
$ cabal install

on a project without a Setup.hs apparently just Does The Right Thing™.

It turns out there’s a reason for this. In a project’s .cabal file, there’s a field build-type that everyone seems to define, and of course we’re told to just set this to “Simple”:

build-type:          Simple

what else would it be? Well, the answer to that is that “Simple” is not the default; “Custom” is (really? weird). And a custom build is one where Cabal will compile and invoke Setup.hs when cabal configure is called.


When you look in the documentation of the Cabal library (note, this is different from the cabal-install package which provides the cabal executable we end up running), Distribution.Simple indeed has defaultMain, but it has friends. The interesting one is defaultMainWithHooks, which takes a monster of a UserHooks record as its argument; sure enough, there are pre-conf, post-conf, pre-build, post-build, and so on; each one is a function which you can easily override.

main :: IO ()
main = defaultMainWithHooks $ simpleUserHooks {
       postConf = configure
    }

configure :: Args -> ConfigFlags -> PackageDescription -> LocalBuildInfo -> IO ()
configure _ _ _ _ = do
    ...

Yay for functions as first class objects. From there it was a simple matter to write some code in my configure function to call Distribution.System’s buildOS and write out a config.h file with the necessary #define I wanted:

#define __LINUX__
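Spelled out, the body of that configure function can be as small as the following sketch (the OS constructors come from Distribution.System; the helper names and the standalone main are mine, for illustration):

```haskell
import Distribution.System (OS (..), buildOS)

-- Map Cabal's notion of the build platform onto the CPP symbol we
-- want defined, then write it out as config.h at configure time.
symbolFor :: OS -> String
symbolFor Linux   = "__LINUX__"
symbolFor OSX     = "__MACOSX__"
symbolFor Windows = "__WIN32__"
symbolFor _       = "__UNKNOWN_OS__"

writeConfigHeader :: IO ()
writeConfigHeader =
    writeFile "config.h" ("#define " ++ symbolFor buildOS ++ "\n")

main :: IO ()
main = writeConfigHeader
```

In the real Setup.hs this would be the body of the postConf hook rather than main, but the logic is the same.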

Include Paths

We’re not quite done yet. As soon as you want to #include something, you have to start caring about include paths. It would appear the compiler, by default, looks in the same directory as the file it is compiling. Fair enough, but I don’t really want to put config.h somewhere deep in the src/Network/Http/ tree; I want to put it in the project’s top level directory, commonly known as ., also known as “where I’m editing and running everything from”. So you have to add a -I"." option to ghc invocations in your Makefiles, and your .cabal file needs to be told in its own way:

library
  include-dirs:      .

and as for ghci, it turns out you can put a .ghci in your sources:

:set -XOverloadedStrings
:set +m
:set -isrc:tests
:set -I.

and if you put that in your project root directory, running ghci there will work without having to specify all that tedious nonsense on the command line.

The final catch is that you have to be very specific about where you put the #include directive in your source file. Put it at the top? Won’t work. After the pragmas? You’d think. Following the module statement? Nope. It would appear that it strictly has to go after the imports and before any real code. Line 65:

 47 import Data.Monoid (Monoid (..), (<>))
 48 import qualified Data.Text as T
 49 import qualified Data.Text.Encoding as T
 50 import Data.Typeable (Typeable)
 51 import GHC.Exts
 52 import GHC.Word (Word8 (..))
 53 import Network.URI (URI (..), URIAuth (..), parseURI)
 65 #include "config.h"
 67 type URL = ByteString
 69 --
 70 -- | Given a URL, work out whether it is normal or secure, and then
 71 -- open the connection to the webserver including setting the
 72 -- appropriate default port if one was not specified in the URL. This
 73 -- is what powers the convenience API, but you may find it useful in
 74 -- composing your own similar functions.
 75 --
 76 establishConnection :: URL -> IO (Connection)
 77 establishConnection r' = do
 78     ...

You get the idea.


Several people wrote to discourage this practice, arguing that conditional code is the wrong approach to portability. I disagree, but you may well have a simple piece of code being run dynamically that would do well enough just making the choice at runtime; I’d be more comfortable with that if the OS algebraic data type was in base somewhere; linking Cabal in seems rather heavy. Others tried to say that needing to do this at all is openssl’s fault and that I should be using something else. Perhaps, and I don’t doubt that we’ll give tls a try at some point. But for now, openssl is battle-tested crypto and the HsOpenSSL package is a nice language binding that is heavily used in production.

Meanwhile I think I’ve come up with a nice technique for defining things to drive conditional compilation. You can see the complete Setup.hs I wrote here; it figures out which platform you’re on and writes the .h file accordingly. If you have need to do simple portability conditionals, you might give it a try.


Using inotify to trigger builds

Having switched from Eclipse (which nicely takes care of building your project for you) to working in gVim (which does nothing of the sort), it’s a bit tedious to have to keep switching from the editor’s window to a terminal in the right directory to whack Up then Enter to run make again.

I know about inotify, a capability in the Linux kernel to watch files for changes, but I hadn’t realized there was a way to use it from the command line. Turns out there is! Erik de Castro Lopo pointed me to a program called inotifywatch that he was using in a little shell script to build his Haskell code. Erik had it set up to run make if one of the files he’d listed on the script’s command line changed.

Saving isn’t what you think it is

I wanted to see if I could expand the scope in a few ways. For one thing, if you had inotifywatch running on a defined list of files and you created a new source file, it wouldn’t trigger a build because it wasn’t being watched by inotify. So I had a poke at Erik’s script.

Testing showed that the script was working, but not quite for the reason we thought. It was watching for the ‘modify’ event, but actually catching a non-zero exit code. That’s strange; I was expecting a normal 0, not error 1. It turns out 1 is the exit code in situations when the file you were watching is deleted. Huh? All I did was save a file!

Of course, that’s not what many programs do when saving. To avoid the risk of destroying your original file in the event of having an I/O error when overwriting it, most editors do NOT modify your file in place; they write a new copy and then atomically rename it over the original. Other programs move your original to a backup name and write a new file. Either way, you’ve usually got a new inode. Most of the time.

And that makes the exact event(s) to listen for tricky to pin down. At first glance the ‘modify’ event seemed a reasonable choice, but as we’ve just seen it turns out to not be much use, and meanwhile you end up triggering on changes made to transient garbage like Vim’s swap files — certainly not something you want kicking off a build. Given that a new file is being made, I then tried watching for the ‘create’ event, but it’s overblown with noise too. And finally, you want touching a file to result in a rebuild, and that doesn’t involve a ‘create’ event at all.

It turns out that saving (however done) and touching a file have one thing in common: at some point in the sequence your file (or its backup) will be opened and then closed for writing. inotify has a ‘close_write’ event (complementing ‘close_nowrite’), so that’s the one to watch for.

If you want to experiment with figuring all this yourself, try doing:

$ inotifywait -m .

and then use your editor and build tools as usual. It’s pretty interesting. The inotifywait(1) program is part of the 'inotify-tools' package on Debian-based Linux systems.

Resurrection, Tron style

Triggering a build automatically is brilliant, but only half the equation; inevitably you want to run the program after building it. It gets harder; if the thing you’re hacking on is a service then, having run it, you’ve got to kill it off and restart it to find out if your code change fixed the problem. How many times have you been frustrated that your bugfix hasn’t taken only to realize you’ve forgotten to restart the thing you’re testing? Running the server manually in yet another terminal window and then killing it and restarting it — over and over — is quite a pain. So why not have that triggered as a result of the inotify driven build as well?

Managing concurrent tasks is harder than it should be. Bash has “job control”, of course, and we’re well used to using it in interactive terminals:

$ ./program
^Z
[1]+  Stopped    ./program
$ bg
[1] 13796
$ jobs
[1] Running    ./program

$ kill %1
[1] Terminated ./program

It’s one thing to run something and then abort it, but it’s another thing entirely to have a script that runs it and then kills it off in reaction to a subsequent event. Job control is lovely when you’re interactive but for various reasons is problematic to use in a shell script (though, if you really want to, see set -m). You can keep it simple, however: assuming for a moment you have just one program that needs running to test whatever-it-is-you’re-working-on, you can simply capture the process id and use that:


    ./program &
    PID=$!

Then you can later, in response to whatever stimuli, do:

    kill $PID

Said stimulus is, of course, our blocking call to inotifywait, returning because a file has been saved.


Do a build. If it succeeds, run the specified program in the background, then block waiting for an inotify ‘close_write’ event. When that happens, kill the program and loop back to the beginning. Easy, right? Sure. That’s why it took me all day.

I called it inotifymake. Usage is simple; throw it in ~/bin then:

$ cd ~/src/project/branch/
$ inotifymake -- ./program

make does its thing, building program as its result
program runs

change detected!

kill program
make does its thing, build failed :(

change detected!

make does its thing, rebuilding program as a result
program runs

The nice part is that when the build is borked, the server (or whatever it is) isn’t left running in the middle; http://localhost:8000/ simply isn’t answering. Yeah.

[The “--” back there separates make targets from the name of the executable you’re running, so you can do inotifymake test or inotifymake test -- ./program if you like]

So yes, that was a lot of effort for not a lot of script, but this is something I’ve wanted for a long time, and it seems pretty good so far. I’m sure it could be improved; frankly, if it needed to be any more rigorous I’d rewrite it as a proper C program, but in the meantime this will do. Feedback welcome if you’ve got any ideas; the branch is there if you want it.


Learning Haskell

In the land of computer programming, newer has almost always meant better. Java was newer than C, and better, right? Python was better than Perl. Duh, Ruby is better than everything, so they’d tell you. But wait, Twitter is written in Scala. Guess that must be the new hotness, eh?

Haskell has been around for quite a while; somehow I had it in my head that it was outdated and only for computer science work. After all, there are always crazy weirdos out there in academia working on obscure research languages — at least, that’s the perspective from industry. After all, we’re the ones getting real work done. All you’re doing is sequencing the human genome. We invented Java 2 Enterprise Edition. Take that, ivory tower.

The newness bias is strong, which is why I was poleaxed to find people I respect like Erik de Castro Lopo and Conrad Parker working hard in, of all things, Haskell. And now they’re encouraging me to program in it, too (surely, cats and dogs are sleeping together this night). On their recommendation I’ve been learning a bit, and much to my surprise, it turns out Haskell is vibrant, improving, and really cutting edge.

The next thing

I get the impression that people are tired of being told that some cool new thing makes everything else they’ve been doing irrelevant. Yet many professional programmers (and worse, fanboy script kiddies) are always looking to the next big thing, the next cool language. Often the very people you respect on a topic have already moved on to something else (there’s a book deal in it for them if they can write it fast enough).

But still; technology is constantly changing and there’s always pressure to be doing the latest and greatest. I try my best to resist this sort of thing, just in the interest of actually getting anything done. Not always easy, and the opposite trap is to adopt a bunker mentality whereby you defend what you’re doing against all comers. Not much learning going on there either.

There is, however, a difference between the latest new thing and learning something new.

One of the best things about being active in open source is the opportunity to meet people who you can look up to and learn from. I may know a thing or two about operations and crises and such, but my techie friends and colleagues are my mentors when it comes to software development and computer engineering. One thing they have taught me over the years is the value of setting out deliberately to “stretch” your mind: specifically, experimenting with a new programming language that is not your day-to-day working environment, but something that will force you to learn new ways of looking at problems. These guys are professionals; they recognize that whatever your working language(s) are, you’re going to keep using them because you get things done there. It’s not about being seduced by the latest cool project that some popular blogger would have you believe is the be-all-and-end-all. Rather, in stretching, you might be able to bring ideas back to your main work and just might improve thereby. I think there is wisdom there.

Should I attempt to learn Haskell?

I’ve had an eye on functional programming for a while now; who hasn’t? Not being from a formal computer science or mathematics background — (“damnit Jim, I’m an engineer, not an English major” when called upon to defend my atrocious spelling) — the whole “omigod, like, everything is a function and that’s like, totally cool” mantra isn’t quite as compelling by itself as it might be. But lots of people I respect have been going on about functional programming for a while now, and it seemed a good direction to stretch. So I asked: which language should I learn?

My colleagues suggested Haskell for a number of reasons. That cutting-edge research was happening there, and that increasingly powerful things were being implemented in the compiler and runtime as a result, sounded interesting. That Haskell is a pure functional language (I didn’t yet know what that meant, but that’s beside the point) would really force me to learn a functional way of looking at things (as opposed to some other languages where you can do functional things but can easily escape those constraints; pragmatic, perhaps, but since the idea was to learn something new, that made Haskell sound good rather than making it seem a limitation). Finally, they claimed that you can express problems concisely (brevity good, though not if it’s so dense that it’s write-only).

Considering a new language (or, within a language, considering competing frameworks for web applications, graphical user interfaces, production deployment, etc), my sense is that we are all fairly quick to judge such things, based on our own private aesthetic. Does it look clean? Can I do the things I need to do with this easily? How do the authors conceive of the problem space? (In web programming especially, a given framework will make some things easy and other things nigh on impossible; you need to know what world-view you’re buying into.)

I don’t know about you, but elegance by itself and in the abstract is not sufficient. Elegance is probably the most highly valued characteristic of good engineering design, but it must be coupled with practicality. In other words, does the design get the job done? So before I was willing to invest time learning Haskell, I wanted to at least have some idea that I’d be able to use it for something more than just academic curiosity.

Incidentally, I’m not sure the Haskell community does itself many favours by glorying in how clever you can be in the language; the implied corollary is that you can’t do anything without being exceedingly clever about it. If true, that would be tedious. I get the humour of the commentary that as we gain experience we tend to overcomplicate things, as seen in the many different ways there are to express a factorial function. But I saw that article linked from almost every thread about how clever you can be with Haskell; is that the sort of thing you want to use as an introduction for newcomers? Given the syntax is so different from what people are used to in mainstream C-derived programming languages, the code there just looks like mush. The fact that people who studied mathematics are doing theorem proving in the language is fascinating, but the tone is very elevated as a result. A high bar for a newcomer, even a professional with 25 years’ programming experience, to face.
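For a concrete flavour of that factorial business, here are two of the many possible Haskell definitions, from the beginner’s plain recursion up the first rung of the cleverness ladder (a quick sketch, nothing more):

```haskell
-- plain recursion, much as a newcomer would write it
factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * factorial (n - 1)

-- the same function as a point-free one-liner: the product of [1..n]
factorial' :: Integer -> Integer
factorial' = product . enumFromTo 1

main :: IO ()
main = print (factorial 5, factorial' 5)
```

Both compute the same thing; the question for a newcomer is how far up that ladder the community expects you to climb before you feel welcome.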

It became clear pretty fast that I wouldn’t have the faintest idea what I was looking at, but I still tried to get a sense of what using Haskell would be like. Search on phrases like “haskell performance”, “haskell in production”, “commercial use of haskell”, “haskell vs scala”, and so on, and you get more than enough highly partisan discussion. It’s quickly apparent that people love the language. It’s a little harder to find evidence of it being used in anger, but eventually I came across pages like Haskell in Industry and the Haskell Weekly News, which have lots of interesting links. That pretty much convinced me it’d be worth giving it a go.

A brief introduction

So here I am, struggling away learning Haskell. I guess I’d have to say I’m still a bit dubious, but the wonderful beginner tutorial called Learn You A Haskell For Great Good (No Starch Press) has cute illustrations. :) The other major starting point is Real World Haskell (O’Reilly). You can flip through it online as well, but really, once you get the idea, I think you’ll agree it’s worth having both in hard copy.

Somewhere along the way my investigations landed me on discussion of something called “software transactional memory” as an approach to concurrency. Having been a Java person for quite some years, I’m quite comfortable with multi-threading [and exceptionally tired of the rants from people who insist that you should only write single-threaded programs], but I’m also aware that concurrency can be hard to get right and that the resulting bugs can be nasty to solve. The idea of applying the database notion of transactions to memory access is fascinating. Reading about STM led me to this (short, language-agnostic) keynote given at OSCON 2007 by one Simon Peyton Jones, an engaging speaker and one of the original authors of GHC. Watching the video, I heard him mention that he had done an “introduction to Haskell” earlier in the conference. Huh. Sure enough, linked from there are his slides and the video they took.
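To make the STM idea concrete, here is a small sketch using the Control.Concurrent.STM interface that ships with GHC; the account-transfer example and its names are mine, purely for illustration:

```haskell
import Control.Concurrent.STM

-- Move money between two accounts in one atomic transaction. No other
-- thread can ever observe the intermediate state where the money has
-- left one account but not yet arrived in the other.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  modifyTVar' from (subtract amount)
  modifyTVar' to   (+ amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)
  -- read both balances in a single consistent snapshot
  balances <- atomically ((,) <$> readTVar a <*> readTVar b)
  print balances
```

If two transactions conflict, the runtime simply retries one of them; there are no locks to forget and no lock ordering to get wrong.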

Watching the tutorial implies a non-trivial investment of time, and a bit of care to manually track the slides as he presents, but viewing it all the way through was a very rewarding experience. By the time I watched it I’d already read Learn You A Haskell and a goodly chunk of Real World Haskell, but if anything that made it even more fascinating; I suppose I was able to concentrate more on what he was saying and his emphasis on why things in Haskell are the way they are.

I was quite looking forward to seeing how he would introduce I/O to an audience of beginners; like every other neophyte I’m grinding through learning what “monads” are and how they enable pure functional programming to coexist with side effects. Peyton Jones’s discussion of IO turns up towards the end (part 2 at :54:36), when this definition went up on a slide:

type IO a = World -> (a, World)

accompanied by this description:

“You can think of it as a function that takes a World to a pair of a and a new World … a rather egocentric functional programmer’s view of things in which your function is center of the universe, and the entire world sort of goes in one side of your function, gets modified a bit by your function, and emerges, in a purely functional way, in a freshly minted world which comes out the other…”

“Oh, so that’s a metaphor?” asked one of his audience.

“Yes. The world does not actually disappear into your laptop. But you can think of it that way if you like.”

Ha. :)

Isolation and reusability

A moment ago I mentioned practicality. The most practical thing going these days is the web problem, i.e. using a language and its toolchain to do web programming. Ok, so what web frameworks are there for Haskell? Turns out there are a few; two newer ones in particular are Yesod and the Snap Framework. Their raw performance as web servers looks very impressive, but the real question is how does writing web pages & application logic go down? Yesod’s approach, called “Hamlet”, doesn’t do much for me. I can see why type safety across the various pieces making up a web page would be something you’d aspire to, but it ain’t happening (expecting designers to embed their work in a pseudo-but-not-actually-HTML language has been tried before. Frequently. And it’s been a bust every time). Snap, on the other hand, has something called “Heist”. Templates are pure HTML, and when you need to drop in programmatically generated snippets you do so with a custom tag that gets substituted at runtime. That’s alright. As for writing said markup from within code, there’s a different project called “Blaze” which looks easy enough to use.
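For flavour, a Heist-style template is just ordinary HTML with a custom tag marking where generated content goes; the `<userName/>` tag here is a hypothetical example of my own, not from the Snap documentation:

```html
<!-- plain HTML throughout; a designer's tools won't choke on it -->
<html>
  <body>
    <!-- hypothetical custom tag, replaced at runtime by whatever
         the application code has bound to it -->
    <h1>Hello, <userName/></h1>
  </body>
</html>
```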

Reading a thread about Haskell web programming, I saw explicit acknowledgement from framework authors on all sides that it would be possible to mix and match, at least in theory. If you like Yesod’s web server but would rather use Snap’s Heist template engine, you could probably do so. You’d be in for writing all the glue code, and you’d need to know what you’re about, but this still raises an interesting point.

A big deal with Haskell — and one of the core premises of programming in a functional language that emphasizes purity and modularity — is that you can rely on code from other libraries not to interfere with your code. It’s more than just “no global variables”; pure functions are self contained, and when there are side effects (as captured in IO and other monads) they are explicitly marked and segregated from pure code. In IT we’ve talked about reusable code for a long time, and we’ve all struggled with it: the sad reality is that in most languages, when you call something you have few guarantees that nothing else is going to happen over and above what you’ve asked for. The notion of a language and its runtime explicitly going out of its way to inhibit this sort of thing is appealing.
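That segregation is visible right in the types. A minimal sketch (the function names are mine, for illustration): a pure function simply cannot perform I/O, and anything that can is marked as such.

```haskell
-- pure: the type promises no side effects; same input, same output, always
double :: Int -> Int
double x = x * 2

-- effectful: the IO in the type marks and quarantines the side effect.
-- Pure code cannot call this, so a library's pure functions can't
-- sneak anything extra past you.
loggedDouble :: Int -> IO Int
loggedDouble x = do
  putStrLn ("doubling " ++ show x)   -- a side effect, permitted only in IO
  return (double x)

main :: IO ()
main = do
  r <- loggedDouble 21
  print r
```

If loggedDouble tried to pose as `Int -> Int` while still printing, it simply wouldn’t compile; the guarantee is enforced by the type checker, not by convention.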

Hello web, er, world

Grandiose notions aside, I wanted to see if I could write something that felt “clean”, even if I’m not yet proficient in the language. I mentioned above that I liked the look of Snap, so I roughed out some simple exercises of what using the basic API would be like. The fact that I am brand new at Haskell of course meant it took a lot longer than it should have! That’s ok, I learnt a few things along the way. I’ll probably blog separately about it, but after an essay about elegance and pragmatism, I thought I should close with some code. The program is just a little ditty that echoes your HTTP request headers back to you, running there. You can decide for yourself if the source is aesthetically pleasing; ’tis a personal matter. I think it’s ok, though I’m not for a moment saying that it’s “good” style or anything. I will say that with Haskell I’ve already noticed that what looks deceptively simple often takes a lot of futzing to get the types right — but I’ve also noticed that when something does finally compile, it tends to be very close to being done. Huh.

So here I am freely admitting that I was quite wrong about Haskell. It’s been a bit of a struggle getting started, and I’m still a bit sceptical about the syntax, but I think the idea of leveraging Haskell shows promise, especially for server-side work.