An HTTP client in Haskell using io-streams

An HTTP client

I’m pleased to announce http-streams, an HTTP client library for Haskell, using the Snap Framework’s new io-streams library to handle the streaming I/O.

Back and there again

I’ve been doing a lot of work lately using Haskell to do reprocessing of data from various back-end web services and then presenting fragments of that information in the specific form needed to drive client-side visualizations. Nothing unusual about that; on one edge of your program you have a web server and on the other you’re making make onward calls to further servers. Another project includes meshes of agents talking to other agents; again, nothing extreme; just a server daemon responding to requests and in turn making its own requests of others. Fairly common in any environment build on (and in turn offering) RESTful APIs.

I’m doing my HTTP server work with the fantastic Snap Framework; it’s a lightweight and decently performing web server library with a nice API. To go with that I needed an web client library, but there the choices are less inspiring.

Those working in Yesod have the powerful http-conduit package, but I didn’t find it all that easy to work with. I soon found myself writing a wrapper around it just so I could use types and an API that made more sense to me.

Because I was happily writing web apps using Snap, I thought it would be cool to write a client library that would use the same types. After much discussion with Gregory Collins and others, it became clear that trying to reuse the Request and Response types from snap-core wasn’t going to be possible. But there was a significant amount of code in Snap’s test suite, notably almost an entire HTTP client implementation. Having used Snap.Test to build test code for some of my own web service APIs, I knew there was some useful material there, and that gave me a useful starting point.

Streaming I/O

One of the exciting things about Haskell is the collaborative way that boundaries are pushed. From the beginnings in iteratee and enumerator, the development of streaming I/O libraries such as conduit and pipes has been phenomenal.

The Snap web server made heavy use of the original iteratee/enumerator paradigm; when I talked to some of the contributors in #snapframework about whether they were planning to upgrade to one of the newer streaming I/O libraries, I discovered from Greg that he and Gabriel were quietly working on a re-write of the internals of the server, based on their experiences doing heavy I/O in production.

This new library is io-streams, aimed at being a pragmatic implementation of some of the impressive theoretical work from the other streaming libraries. io-streams‘s design makes the assumption that you’re working in … IO, which seems to have allowed them to make some significant optimizations. The API is really clean, and my early benchmarks were promising indeed.

That was when I realized that being compatible with Snap was less about the Request and Response types and far more about being able to smoothly pass through request and response bodies — in other words, tightly integrating with the streaming I/O library used to power the web server.

http-streams, then, is an HTTP client library built to leverage and in turn expose an API based on the capabilities of io-streams.

A simple example

We’ll make a GET request of http://kernel.operationaldynamics.com:58080/time (which is just a tinsy web app which returns the current UTC time). The basic http-streams API is pretty straight forward:

10 
11 {-# LANGUAGE OverloadedStrings #-}
23 
24 import System.IO.Streams (InputStream, OutputStream, stdout)
25 import qualified System.IO.Streams as Streams
26 import Network.Http.Client
27 
28 main :: IO ()
29 main = do
30     c <- openConnection "kernel.operationaldynamics.com" 58080
31 
32     q <- buildRequest $ do
33         http GET "/time"
34         setAccept "text/plain"
35 
36     sendRequest c q emptyBody
37 
38     receiveResponse c (\p i -> do
39         Streams.connect i stdout)
40 
41     closeConnection c
42 

which results in

Sun 24 Feb 13, 11:57:10.765Z

Open connection

Going through that in a bit more detail, given that single import and some code running in IO, we start by opening a connection to the appropriate host and port:

30     c <- openConnection "kernel.operationaldynamics.com" 58080

Create Request object

Then you can build up the request you need:

32     q <- buildRequest $ do
33         http GET "/time"
34         setAccept "text/plain"

that happens in a nice little state monad called RequestBuilder with a number of simple functions to set various headers.

Having built the Request object we can have a look at what the outbound request would look like over the wire, if you’re interested. Doing:

35     putStr $ show q

would have printed out:

GET /time HTTP/1.1
Host: kernel.operationaldynamics.com:58080
User-Agent: http-streams/0.3.0
Accept-Encoding: gzip
Accept: text/plain

Send request

Making the request is a simple call to sendRequest. It takes the Connection, a Request object, and function of type

   (OutputStream Builder -> IO α)

which is where we start seeing the System.IO.Streams types from io-streams. If you’re doing a PUT or POST you write a function where you are handed the OutputStream and can write whatever content you want to it. Here, however, we’re just doing a normal GET request which by definition has no request body so we can use emptyBody, a predefined function of that type which simply returns without sending any body content. So:

36     sendRequest c q emptyBody

gets us what we want. If we were doing a PUT or POST with a request body, we’d write to the OutputStream in our body function. It’s an OutputStream of Builders as a fairly significant optimization; the library will end up chunking and sending over an underlying OutputStream ByteString which is wrapped around the socket, but building up the ByteString(s) first in a Builder reduces allocation overhead when smacking together all the small strings that the request headers are composed of; taken together it often means requests will be done in a single sendto(2) system call.

Read response

To read the reply from the server you make a call to receiveResponse. Like sendRequest, you pass the Connection and a function to handle the entity body, this time one which will read the response bytes. It’s type is

   (Response -> InputStream ByteString -> IO β)

This is where things get interesting. We can use the Response object to find out the status code of the response, read various headers, and deal with the reply accordingly. Perhaps all we care about is the status code:

42 statusHandler :: Response -> InputStream ByteString -> IO ()
43 statusHandler p i = do
44     case getStatusCode p of
45         200 -> return ()
46         _   -> error "Bad server!"

The response body it available through the InputStream, which is where we take advantage of the streaming I/O coming down from the server. For instance, if you didn’t trust the server’s Content-Length header and wanted to count the length of the response yourself:

42 countHandler :: Response -> InputStream ByteString -> IO Int
43 countHandler p i1 = do
44     go 0 i1
45   where
46     go !acc i = do
47         xm <- Streams.read i
48         case xm of
49             Just x  -> go (acc + S.length x) i
50             Nothing -> return acc

Ok, that’s pretty contrived, but it shows the basic idea: when you read from an InputStream a it’s a sequence of Maybe a; when you get Nothing the input is finished. Realistic usage of io-streams is a bit more idiomatic; the library offers a large range of functions for manipulating streams, many of which are wrappers to build up more refined streams from lower-level raw ones. In this case, we could do the counting trick using countInput which gives you an action to tell you how many bytes it saw:

42 countHandler2 p i1 = do
43     (i2, getCount) <- Streams.countInput i1
44 
45     Streams.skipToEof i2
46 
47     len <- getCount
48     return len

For our example, however, we don’t need anything nearly so fancy; you can of course use the lambda function in-line we showed originally. If you also wanted to spit the response headers out to stdout, Response also has a useful Show instance.

38     receiveResponse c (\p i -> do
39         putStr $ show p
40         Streams.connect i stdout)

which is, incidentally, exactly what the predefined debugHandler function does:

38     receiveResponse c debugHandler

either way, when we run this code, it will print out:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Date: Sun, 24 Feb 2013 11:57:10 GMT
Server: Snap/0.9.2.4
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Type: text/plain

Sun 24 Feb 13, 11:57:10.765Z

Obviously you don’t normally need to print the headers like that, but they can certainly be useful for testing.

Close connection

Finally we close the connection to the server:

41     closeConnection c

And that’s it!

More advanced modes of operation are supported. You can reuse the same connection, of course, and you can also pipeline requests [sending a series of requests followed by reading the corresponding responses in order]. And meanwhile the library goes to some trouble to make sure you don’t violate the invariants of HTTP; you can’t read more bytes than the response contains, but if you read less than the length of the response, the remainder of the response will be consumed for you.

Don’t forget to use the conveniences before you go

The above is simple, and if you need to refine anything about the request then you’re encouraged to use the underlying API directly. However, as often as not you just need to make a request of a URL and grab the response. Ok:

61 main :: IO ()
62 main = do
63     x <- get "http://www.haskell.org/" concatHandler
64     S.putStrLn x

The get function is just a wrapper around composing a GET request using the basic API, and concatHandler is a utility handler that takes the entire response body and returns it as a single ByteString — which somewhat defeats the purpose of “streaming” I/O, but often that’s all you want.

There are put and post convenience functions as well. They take a function for specifying the request body and a handler function for the response. For example:

66     put "http://s3.example.com/" (fileBody "fozzie.jpg") handler

this time using fileBody, another of the pre-defined entity body functions.

Finally, for the ever-present not-going-to-die-anytime-soon application/x-www-form-urlencoded POST request — everyone’s favourite — we have postForm:

67     postForm "https://jobs.example.com/" [("name","Kermit"),("role","Stagehand")] handler

Secure connections

I’ve also completely neglected to mention until now SSL support and error handling. Secure connections are supported using openssl; if you’re working in the convenience API you can just request an https:// URL as shown above; in the underlying API you call openConnectionSSL instead of openConnection. As for error handling, a major feature of io-streams is that you leverage the existing Control.Exception mechanisms from base; the short version is you can just wrap bracket around the whole thing for any exception handling you might need — that’s what the convenience functions do, and there’s a withConnection function which automates this for you if you want.

Status

I’m pretty happy with the http-streams API at this point and it’s pretty much feature complete. A fair bit of profiling has been done, and the code is pretty sound at this point. Benchmarks against other HTTP clients are favourable.

After a few years working in Haskell this is my first go at implementing a library as opposed to just writing applications. There’s a lot I’ve had learn about writing good library code, and I’ve really appreciated working with Gregory Collins as we’ve fleshed out this API together. Thanks also to Erik de Castro Lopo, Joey Hess, Johan Tibell, and Herbert Valerio Riedel for their review and comments.

You can find the API documentation for Network.Http.Client here (until Hackage generates the docs) and the source code at GitHub.

AfC

Updates

  1. Code snippets updated to reflect API change made to buildRequest as of v0.4.0. You no longer need to pass the Connection object when building a Request.