Operations and other mysteries

Andrew Cowie is a long-time Linux engineer and Open Source advocate, repentant Java developer, Haskell aficionado, and GNOME hacker!

Professionally, Andrew has consulted in IT operations and business leadership, helping people remove the bottlenecks in their processes so they can run their technology more effectively.

He is currently Head of Engineering at Anchor Systems, working to develop the next generation of utility computing infrastructure and tooling.

Contact...

Twitter @afcowie

Google Plus +Andrew Cowie

Email 0x5CB48AEA

RSS Feed /andrew

Learning Haskell

In the land of computer programming, newer has almost always meant better. Java was newer than C, and better, right? Python was better than Perl. Duh, Ruby is better than everything, so they’d tell you. But wait, Twitter is written in Scala. Guess that must be the new hotness, eh?

Haskell has been around for quite a while; somehow I had it in my head that it was outdated and only for computer science work. After all, there are always crazy weirdos out there in academia working on obscure research languages — at least, that’s the perspective from industry. After all, we’re the ones getting real work done. All you’re doing is sequencing the human genome. We invented Java 2 Enterprise Edition. Take that, ivory tower.

The newness bias is strong, which is why I was poleaxed to find people I respect like Erik de Castro Lopo and Conrad Parker working hard in, of all things, Haskell. And now they’re encouraging me to program in it, too (surely, cats and dogs are sleeping together this night). On their recommendation I’ve been learning a bit, and much to my surprise, it turns out Haskell is vibrant, improving, and really cutting edge.

The next thing

I get the impression that people are tired of being told that some cool new thing makes everything else they’ve been doing irrelevant. Yet many professional programmers (and worse, fanboy script kiddies) are always looking to the next big thing, the next cool language. Often the very people whose opinion you respect on a topic have already moved on to something else (there’s a book deal in it for them if they can write it fast enough).

But still, technology is constantly changing and there’s always pressure to be doing the latest and greatest. I try my best to resist this sort of thing, just in the interest of actually getting anything done. That’s not always easy, and the opposite trap is to adopt a bunker mentality whereby you defend what you’re doing against all comers. Not much learning going on there either.

There is, however, a difference between the latest new thing and learning something new.

One of the best things about being active in open source is the opportunity to meet people who you can look up to and learn from. I may know a thing or two about operations and crisis and such, but my techie friends and colleagues are my mentors when it comes to software development and computer engineering. One thing they have taught me over the years is the value of setting out deliberately to “stretch” your mind. Specifically, experimenting with a new programming language that is not your day-to-day working environment, but something that will force you to learn new ways of looking at problems. These guys are professionals; they recognize that whatever your working language(s) are, you’re going to keep using them because you get things done there. It’s not about being seduced by the latest cool project that some popular blogger would have you believe is the be-all-and-end-all. Rather, in stretching, you might be able to bring ideas back to your main work and just might improve thereby. I think there is wisdom there.

Should I attempt to learn Haskell?

I’ve had an eye on functional programming for a while now; who hasn’t? Not being from a formal computer science or mathematics background (“damnit Jim, I’m an engineer, not an English major”, when called upon to defend my atrocious spelling), the whole “omigod, like, everything is a function and that’s like, totally cool” mantra isn’t quite as compelling by itself as it might be. But lots of people I respect have been going on about functional programming for a while now, and it seemed a good direction in which to stretch. So I asked: which language should I learn?

My colleagues suggested Haskell for a number of reasons. That cutting-edge research was happening there, and that increasingly powerful things were being implemented in the compiler and runtime as a result, sounded interesting. That Haskell is a pure functional language (I didn’t yet know what that meant, but that’s beside the point) meant it would really force me to learn a functional way of doing things (as opposed to some other languages where you can do functional things but can easily escape those constraints; pragmatic, perhaps, but since the idea was to learn something new, that made Haskell sound good rather than seeming like a limitation). Finally, they claimed that you could express problems concisely (brevity is good, though not if the result is so dense that it’s write-only).

When considering a new language (or, within a language, various competing frameworks for web applications, graphical user interfaces, production deployment, etc), my sense is that we are all fairly quick to judge, based on our own private aesthetic. Does it look clean? Can I do the things I need to do with it easily? How do the authors conceive of the problem space? (In web programming especially, a given framework will make some things easy and other things nigh on impossible; you need to know what world-view you’re buying into.)

I don’t know about you, but elegance by itself and in the abstract is not sufficient. Elegance is probably the most highly valued characteristic of good engineering design, but it must be coupled with practicality. In other words, does the design get the job done? So before I was willing to invest time learning Haskell, I wanted to at least have some idea that I’d be able to use it for something more than just academic curiosity.

Incidentally, I’m not sure the Haskell community does itself many favours by glorying in how clever you can be in the language; the implied corollary is that you can’t do anything without being exceedingly clever about it. If true, that would be tedious. I get the humour of the commentary that as we gain experience we tend to overcomplicate things, as seen in the many different ways there are to express a factorial function. But I saw that article linked from almost every thread about how clever you can be with Haskell; is that the sort of thing you want to use as an introduction for newcomers? Given that the syntax is so different from what people are used to in mainstream C-derived programming languages, the code there just looks like mush. The fact that people who studied mathematics are doing theorem proving in the language is fascinating, but the tone is very elevated as a result. A high bar for a newcomer — even a professional with 25 years of programming experience — to face.
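For what it’s worth, you don’t have to start at the clever end. Here are two perfectly tame ways to write that same factorial function — my own quick examples, not the article’s — just to show that ordinary Haskell can read quite plainly:

    -- Straightforward recursion, much as you'd write it anywhere else.
    factorial :: Integer -> Integer
    factorial 0 = 1
    factorial n = n * factorial (n - 1)

    -- The same thing, leaning on the standard library: the product of [1 .. n].
    factorial' :: Integer -> Integer
    factorial' n = product [1 .. n]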

It became clear pretty fast that I wouldn’t have the faintest idea what I was looking at, but I still tried to get a sense of what using Haskell would be like. Search on phrases like “haskell performance”, “haskell in production”, “commercial use of haskell”, “haskell vs scala”, and so on. You get more than enough highly partisan discussion. It’s quickly apparent that people love the language. It’s a little harder to find evidence of it being used in anger, but eventually I came across pages like Haskell in Industry and Haskell Weekly News which have lots of interesting links. That pretty much convinced me it’d be worth giving it a go.

A brief introduction

So here I am, struggling away learning Haskell. I guess I’d have to say I’m still a bit dubious, but the wonderful beginner tutorial called Learn You A Haskell For Great Good (No Starch Press) has cute illustrations. :) The other major starting point is Real World Haskell (O’Reilly). You can flip through it online as well, but really, once you get the idea, I think you’ll agree it’s worth having both in hard copy.

Somewhere along the way my investigations landed me on discussion of something called “software transactional memory” as an approach to concurrency. Having been a Java person for quite some years, I’m quite comfortable with multi-threading [and exceptionally tired of the rants from people who insist that you should only write single-threaded programs], but I’m also aware that concurrency can be hard to get right and that solving bugs can be nasty. The idea of applying the database notion of transactions to memory access is fascinating. Reading about STM led me to this (short, language-agnostic) keynote given at OSCON 2007 by one Simon Peyton Jones, an engaging speaker and one of the original authors of GHC. Watching the video, I heard him mention that he had done an “introduction to Haskell” earlier in the conference. Huh. Sure enough, linked from here are his slides and the video they took.
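As an aside, here is roughly what STM looks like in practice. This is a minimal sketch of my own using GHC’s Control.Concurrent.STM (the account names are invented), not anything from the keynote:

    import Control.Concurrent.STM

    -- Move an amount between two shared balances. The whole block runs
    -- atomically: other threads never see the intermediate state, and the
    -- runtime retries the transaction if it conflicts with another one.
    transfer :: TVar Int -> TVar Int -> Int -> STM ()
    transfer from to amount = do
        a <- readTVar from
        b <- readTVar to
        writeTVar from (a - amount)
        writeTVar to   (b + amount)

    main :: IO ()
    main = do
        alice <- newTVarIO 100
        bob   <- newTVarIO 0
        atomically (transfer alice bob 30)
        readTVarIO alice >>= print     -- 70
        readTVarIO bob   >>= print     -- 30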

Watching the tutorial requires a non-trivial investment of time, and a bit of care to manually track the slides with him as he presents, but viewing it all the way through was a very rewarding experience. By the time I watched this I’d already read Learn You A Haskell and a goodly chunk of Real World Haskell, but if anything that made it even more fascinating; I suppose I was able to concentrate more on what he was saying and on why things in Haskell are the way they are.

I was quite looking forward to seeing how he would introduce I/O to an audience of beginners; like every other neophyte I’m grinding through learning what “monads” are and how they enable pure functional programming to coexist with side effects. Peyton Jones’s discussion of IO turns up towards the end (part 2 at :54:36), when this definition went up on a slide:

type IO a = World -> (a, World)

accompanied by this description:

“You can think of it as a function that takes a World to a pair of a and a new World … a rather egocentric functional programmer’s view of things in which your function is center of the universe, and the entire world sort of goes in one side of your function, gets modified a bit by your function, and emerges, in a purely functional way, in a freshly minted world which comes out the other…”

“Oh, so that’s a metaphor?” asked one of his audience.

“Yes. The world does not actually disappear into your laptop. But you can think of it that way if you like.”

Ha. :)
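Just to pin the metaphor down, the slide amounts to modelling an I/O action as a state-passing function. Here’s a toy version of my own — purely a teaching device, since GHC’s real IO type is abstract and rather more primitive than this:

    -- A stand-in for "the entire world".
    data World = World

    -- An "action" takes the world and hands back a result plus a new world.
    newtype Action a = Action (World -> (a, World))

    -- Sequencing two actions threads the world through each step in order,
    -- which is exactly the job do-notation performs for the real IO type.
    andThen :: Action a -> (a -> Action b) -> Action b
    andThen (Action f) k = Action $ \w ->
        let (x, w')  = f w
            Action g = k x
        in  g w'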

Isolation and reusability

A moment ago I mentioned practicality. The most practical thing going these days is the web problem, i.e. using a language and its toolchain to do web programming. Ok, so what web frameworks are there for Haskell? Turns out there are a few. Two newer ones in particular, Yesod and the Snap Framework. Their raw performance as web servers looks very impressive, but the real question is how does writing web pages & application logic go down? Yesod’s approach, called “Hamlet”, doesn’t do much for me. I can see why type safety across the various pieces making up a web page would be something you’d aspire to, but it ain’t happening (expecting designers to embed their work in a pseudo-but-not-actually HTML language has been tried before. Frequently. And it’s been a bust every time). Snap, on the other hand, has something called “Heist”. Templates are pure HTML and when you need to drop in programmatically generated snippets you do so with a custom tag that gets substituted in at runtime. That’s alright. As for writing said markup from within code there’s a different project called “Blaze” which looks easy enough to use.
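To give a flavour of the Blaze approach, here’s a minimal sketch assuming a reasonably recent blaze-html (the page content is made up): elements are ordinary functions and nesting is just function application, so the markup is type-checked along with the rest of your code.

    {-# LANGUAGE OverloadedStrings #-}

    import qualified Text.Blaze.Html5 as H
    import           Text.Blaze.Html.Renderer.Pretty (renderHtml)

    -- Build a small page programmatically.
    page :: String -> H.Html
    page name = H.docTypeHtml $ do
        H.head (H.title "Hello")
        H.body $ do
            H.h1 "Hello, web"
            H.p (H.toHtml ("Rendered for " ++ name))

    main :: IO ()
    main = putStr (renderHtml (page "Andrew"))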

Reading a thread about Haskell web programming, I saw explicit acknowledgement on the part of framework authors from all sides that it would be possible to mix and match, at least in theory. If you like Yesod’s web server but would rather use Snap’s Heist template engine, you could probably do so. You’d be in for writing all the glue code and would need to know what you’re about, but this still raises an interesting point.

A big deal with Haskell — and one of the core premises of programming in a functional language that emphasizes purity and modularity — is that you can rely on code from other libraries not to interfere with your code. It’s more than just “no global variables”; pure functions are self contained, and when there are side effects (as captured in IO and other monads) they are explicitly marked and segregated from pure code. In IT we’ve talked about reusable code for a long time, and we’ve all struggled with it: the sad reality is that in most languages, when you call something you have few guarantees that nothing else is going to happen over and above what you’ve asked for. The notion of a language and its runtime explicitly going out of its way to inhibit this sort of thing is appealing.
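The marking shows up right in the types. A trivial sketch of my own to illustrate the split:

    -- A pure function: same input, same output, and the type guarantees it
    -- cannot touch files, sockets, or global state.
    discount :: Double -> Double -> Double
    discount rate price = price * (1 - rate)

    -- A side-effecting function: the IO in its type is the explicit marker
    -- that it may interact with the outside world, and pure code cannot
    -- call it directly.
    logPrice :: Double -> IO ()
    logPrice price = appendFile "prices.log" (show price ++ "\n")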

Hello web, er world

Grandiose notions aside, I wanted to see if I could write something that felt “clean”, even if I’m not yet proficient in the language. I mentioned above that I liked the look of Snap. So, I roughed out some simple exercises of what using the basic API would be like. The fact that I am brand new at Haskell of course meant it took a lot longer than it should have! That’s ok, I learnt a few things along the way. I’ll probably blog separately about it, but after an essay about elegance and pragmatism, I thought I should close with some code. The program is just a little ditty that echoes your HTTP request headers back to you, running there. You can decide for yourself if the source is aesthetically pleasing; ’tis a personal matter. I think it’s ok, though I’m not for a moment saying that it’s “good” style or anything. I will say that with Haskell I’ve already noticed that what looks deceptively simple often takes a lot of futzing to get the types right — but I’ve also noticed that when something does finally compile, it tends to be very close to being done. Huh.
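The shape of it is roughly like the following — a minimal sketch rather than the actual program linked above, assuming Snap’s quickHttpServe, getRequest, and listHeaders from the snap-core and snap-server packages:

    {-# LANGUAGE OverloadedStrings #-}

    import qualified Data.ByteString.Char8 as B
    import           Data.CaseInsensitive (original)
    import           Snap.Core
    import           Snap.Http.Server (quickHttpServe)

    -- Reply to every request with a plain-text dump of its headers.
    echoHeaders :: Snap ()
    echoHeaders = do
        req <- getRequest
        modifyResponse (setContentType "text/plain")
        mapM_ writeHeader (listHeaders req)
      where
        writeHeader (name, value) =
            writeBS (B.concat [original name, ": ", value, "\n"])

    main :: IO ()
    main = quickHttpServe echoHeaders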

So here I am freely admitting that I was quite wrong about Haskell. It’s been a bit of a struggle getting started, and I’m still a bit sceptical about the syntax, but I think the idea of leveraging Haskell shows promise, especially for server-side work.

AfC

A good GNOME 3 Experience

I’ve been using GNOME 3 full time for over 9 months, and I find it quite usable. I’ve had to learn some new usage patterns, but I don’t see that as a negative. It’s a new piece of software, so I’m doing my best to use it the way it’s designed to be used.

Sure, it’s different from GNOME 2. It’s vastly different. But it is a new UI paradigm. The GNOME 2 experience was over 9 years old, and largely based on the experience inherited from old Windows 95 mixed with a bit of CDE. There were so many things that the GNOME hackers wanted to do — and lots of things all the UI studies said needed changing — that the old pattern simply couldn’t support them.

Still, a lot of people are upset. Surprise. Most recently it’s been people running Debian Testing who found that their distro has migrated its packages from GNOME 2.32 to GNOME 3.x. Distros like Ubuntu have been shipping GNOME 2.32 for ages, but it has been well over 2 years since anyone actually worked on that code. It’s wonderful that nothing has changed for you in all that time [a true Debian Stable experience!] but I think it’s a bit odd not to expect that something that was widely advertised as being such a different user experience is … different.

What I find annoying about these conversations is that if these same people had gone and bought an Apple laptop with Mac OS X on it they would quite reasonably be working through learning how to use a new desktop and not complaining about it at all. But here we are admonishing the GNOME hackers for having the temerity to do something new and different.

Installing

I went to some trouble to run GNOME 3 on Ubuntu Linux during the Natty cycle; that was a bit of work, but I needed to be current; now with Oneiric things are mostly up to date. GNOME 3.0 was indeed a bit of a mess, but then so was GNOME 2.0. The recently released 3.2 is a big improvement. And it looks like the list of things targeted for 3.4 will improve matters further.

I’m now running GNOME 3 on a freshly built Ubuntu Oneiric system; I just did a “command line” install of Ubuntu and then installed gdm, gnome-shell, xserver-xorg and friends. Working great, and not having installed gnome-desktop saved me a huge amount of baggage. Of course a normal Oneiric desktop install and then similarly installing and switching to gnome-shell would work fine too; either way you probably want to enable the ppa:gnome3-team/gnome3 PPA.

Launchers

One thing I do recommend is mapping (say) CapsLock as an additional “Hyper” and then Caps + F1 .. Caps + F12 as launchers. I have the epiphany browser on F1, evolution on F2, my IRC client on F3 and so on. Setting up Caps + A to run gnome-terminal --window means you can pop a term easily from anywhere. You do the mapping in:

    System Settings → Keyboard Layout → Layout tab → Options...

and can set up launchers via:

    System Settings → Keyboard → Shortcuts tab → "Custom Shortcuts" → `[+]` button

(you’d think that’d all just be in one capplet, but anyway)

Not that my choices matter, per se, but to give you an idea:

Accelerator    Launches                   Description
Caps + F1      epiphany                   Web browser (primary)
Caps + F2      evolution                  Email client
Caps + F3      pidgin                     IRC client
Caps + F4      empathy                    Jabber client
Caps + F5      firefox                    Web browser (alternate)
Caps + F6      shotwell                   Photo manager
Caps + F7      slashtime                  Timezone utility
Caps + F8      rhythmbox                  Music player
Caps + F9      eclipse                    Java IDE
Caps + F10     devhelp                    GTK documentation
Caps + F11     gucharmap                  Unicode character picker
Caps + F12     gedit                      Text editor
Caps + Z       gnome-terminal --window    New terminal window

That means I only use the Overview’s lookup mechanism (ie typing Win, T, R, A… in this case looking for the Project Hamster time tracker) for outlying applications. The rest of the time it’s Caps + F12 and bang, I’ve got GEdit in front of me.

Of course you can also set up the things you use the most on the “Dash” (I think that’s what they call it) as favourites. I’ve actually stopped doing that (I gather the original design didn’t have favourites at all); I prefer to have it as an alternative view of things that are actually running.

Extensions

People love plugin architectures, but they’re quite the anti-pattern; over and above the software maintenance headache (evolving upstream constantly breaks APIs used by plugins, for one example; the nightmare of packaging plugins safely being another), before long you get people installing things with contradictory behaviour which completely trash the whole experience that your program was designed to have in the first place.

Case in point is that it didn’t take long after people discovered how to use the extension mechanism built into gnome-shell for people to start using it to implement … GNOME 2. Gawd.

Seeking that is certainly not my recommendation; as I wrote above, the point of GNOME 3 and its new shell is to enable a new mode of interaction. Still, everyone has got their itches and annoyances, and so for my friends who can’t live without their GNOME 2 features, I thought I’d point out a few things.

There is a collection of GNOME Shell Extensions, some of which appear to be packaged, e.g. gnome-shell-extensions-drive-menu for a plugin which gives you some kind of menu when removable devices are inserted. I’m not quite sure what the point of that is; the shell already puts something in the tray when you’ve got removable media. Whatever floats your boat, I guess. Out in the wild are a bunch more. The charmingly named GNOME Shell Frippery extensions by Ron Yorston are a bunch of plugins to recreate GNOME 2 features. Most are things I wouldn’t touch with a ten-foot pole (a bottom panel? Who needs it? Yo, hit the Win key to activate the Overview and you see everything!).

My personal itch was wanting to have 4 fixed workspaces. The “Auto Move Workspaces” plugin from gnome-shell-extensions was close (and would be interesting if its experience and UI were properly integrated into the primary shell experience), but the “Static Workspaces” plugin from gnome-shell-frippery did exactly the trick. Now I have four fixed workspaces and I can get to them with Caps + 1 .. Caps + 4. Hurrah.

You install the plugin by dropping the Static_Workspaces@rmy.pobox.com/ directory into ~/.local/share/gnome-shell/extensions/, then restarting the Shell via Alt + F2, R, and then firing up gnome-tweak-tool and activating the extension:

    Advanced Settings → Shell Extension tab → switch "Static Workspaces Extension" to "On"

Hopefully someone will Debian package gnome-shell-frippery soon.

Not quite properly integrated

Having to create custom launchers and fiddle around with plugins just to get things working? “Properly integrated” this ain’t, and that’s my fault. I respect the team hacking on GNOME 3, and I know they’re working hard to create a solid experience. I feel dirty having to poke and tear their work apart. Hopefully over the next few release cycles things like this will be pulled into the core and given the polish and refined experience that have always been what we’ve tried to achieve in GNOME. What would be really brilliant, though, would be a way to capture and export these customizations. Especially launchers; setting that up on new machines is a pain and it’d be lovely to be able to make it happen via a package. Hm.

AfC

Restoring Server to Server Jabber capability in Openfire 3.7.0

We upgraded our Jabber server from Openfire 3.6.4 to 3.7.0, but suddenly it wasn’t talking to others who had self-signed certificates (despite having been told that was ok; in such circumstances Jabber offers a “dialback” to weakly establish TLS connectivity). It was breaking with messages like:

 Error trying to connect to remote server: net.au (DNS lookup: net.au:5269)

in the logs. Took a while to isolate that (Java stack traces in server logs, how I hate thee), but that led me to this discussion about the problem.

Turns out to be, more or less, this issue.

Back in the original thread, a comment by Gordon Messmer does the trick; he isolated the specific problem and provided a patch which you can reverse apply. I had already downloaded the 3.7.1 nightly build .deb and installed it (which did not fix the problem), but with the patch in hand I was able to follow the advice of “Devil” to rebuild the server and replace the openfire.jar manually.

Obviously I will rebuild the .deb in question shortly, but I can confirm that reverting that specific change does restore server-to-server functionality with people you used to be able to connect to fine.

Meanwhile 3.7.1, alpha though it may be, does seem to have a fair few fixes in it. We’ll keep that for now.

AfC

Mounting a kvm image on a host system

Needed to mount a KVM disk image on a host system without booting the guest. I’ve used kvm-nbd in the past, but that’s more for exporting a filesystem from a remote server (and, in any event, we now use Sheepdog for distributing VM images around).

Found a number of references to using losetup but wanted something simpler; doing the loop part automatically has been a “take for granted” thing for years.

It turns out that if your image is in kvm-img‘s “raw” format already then you can pretty much access it directly. We found this article, dated 2009, which shows you can do it pretty quickly; assuming that the partition is the first (or only one) in the disk image:

# mount -o ro,loop,offset=32256 dwarf.raw /mnt
#

which does work!

I’m a bit unclear about the offset number; where did that come from?

An earlier post mentioned something called kpartx. Given a disk, it will identify partitions and make them available as devices via the device mapper. Neat. Hadn’t run into that one before.

This comment on Linux Questions suggested using kpartx directly as follows:

# kpartx -a dwarf.raw 
# mount -o ro /dev/mapper/loop0p1 /mnt
#

Nice.

Incidentally, later in that thread is mention of how to calculate offsets using sfdisk -l; however, that doesn’t help if you don’t already have the disk available in /dev. But you can use the very same kpartx to get the starting sector:

# kpartx -l dwarf.raw 
loop0p1 : 0 16777089 /dev/loop0 63
#

Ah ha; 63 sectors × 512 byte block size is 32256. So now we know where they got the number from; adjust your own mounts accordingly. :)

AfC

Comments:

  1. Pádraig Brady wrote in, saying “I find partx is overkill. I’ve been using the following script for years; lomount.sh dwarf.raw 1 /mnt is how you might use it.” Interesting use of fdisk, there.

Fix your icons package

Updating today, suddenly a whole bunch of icons were broken; GNOME Shell was presenting everything with a generic application diamond as its icon. Bah.

I noticed gnome-icon-theme was one of the packages that bumped. It turns out that package is now only a limited subset of the GNOME icon set and that you have to install gnome-icon-theme-full to get the icons you actually need. Bah², but now everything is back the way it should be.

AfC

Importing from pyblosxom to WordPress

Importing to WordPress from [py]blosxom is not as easy as it could be. I ended up writing some PHP to loop over my posts and call wp_insert_post() for each one. Nifty, almost, except that preserving a category hierarchy is a bear.

One massive gotcha: the post bodies are sanitized on import so if you’re using something like Markdown for formatting you get a real mess when it “helpfully” converts > characters to HTML entities, swallows URLs properly enclosed in <>, etc. There is a workaround, thankfully:

#
# The WordPress post function sanitizes all input, and this
# includes escaping bare '>' characters to HTML entities.
# This is most unhelpful when importing Markdown data.
#

    kses_remove_filters();

and then you can safely call insert without your content getting munged:

#
# Construct post and add.
#

    $post = array(
            'post_title' => $title,
            'post_name' => $slug,
            'post_content' => $content,
            'post_category' => array($category),
            'post_date' => $date,
            'post_status' => "publish",
            'post_author' => 2,
    );      

    $result = wp_insert_post($post, true);

    if (is_wp_error($result)) {
            echo "ERROR\n";
            print_r($result);
    } else {
            echo "New post ID: $result\n";
    }

etc.

Like I said, it’s a royal pita to figure out the category ID; you need to use the raw ID numbers from the database, not symbolic names or slugs. Ugh.

My blog’s RSS feed is now http://blogs.operationaldynamics.com/andrew/feed. I did my best to preserve post times, guids, etc, but there came a point where it just wasn’t going to get any closer; sorry if you get dups in your reader or planet.

Incidentally, what we all knew as “WordPress MU” is now called a “Network Install”. Go figure, but if you need multi-site aka multi-blog aka multi-user installation you really need to read “Create A Network” for instructions. Do this setup before creating (or importing) content unless you want the joy and bliss of reinstalling several times.

AfC

java-gnome 4.1.1 released

This post is an extract of the release note from the NEWS file which you can read online … or in the sources from Bazaar.


java-gnome 4.1.1 (11 Jul 2011)

To bump or not to bump; that is the question

This is the first release in the 4.1 series. It introduces coverage of the GNOME 3 series of libraries, notably GTK 3.0. There was a fairly significant API change from GTK 2.x to 3.x, and we’ve done our best to accommodate it.

Drawing with Cairo, which you were already doing

The biggest change has to do with drawing; if you have a custom widget (ie, a DrawingArea) then you have to put your Cairo drawing code in a handler for the Widget.Draw signal rather than what used to be Widget.ExposeEvent. Since java-gnome has only ever exposed drawing via Cairo, this change will be transparent to most developers using the library.

Other significant changes include colours: instead of the former Color class there is now RGBA; you use this in calls in the override...() family instead of the modify...() family; for example, see Widget’s overrideColor().

Orientation is allowed now

Widgets that had abstract base classes and then concrete horizontal and vertical subclasses can now all be instantiated directly with an Orientation parameter. The most notable example is Box’s <init>() (the idea is to replace VBox and HBox, which upstream is going to do away with). Others are Paned and various Range subclasses such as Scrollbar; Separator, Toolbar, and ProgressBar now implement Orientable as well.

There’s actually a new layout Container, however. Replacing Box and Table is Grid. Grid is optimized for GTK’s new height-for-width geometry management and should be used in preference to other Containers.

The ComboBox API was rearranged somewhat. The text-only type is now ComboBoxText; the former ComboBoxEntry is gone and replaced by a ComboBox property. This is somewhat counter-intuitive since the behaviour of the Widget is so dramatically different when in this mode (ie, it looks like a ComboBoxEntry; funny, that).

Other improvements

It’s been some months since our last release, and although most of the work has focused on refactoring to present GTK 3.0, there have been numerous other improvements. Cairo in particular has seen some refinement in the area of Pattern and Filter handling thanks to Will Temperley, and there is coverage of additional TextView and TextTag properties, notably relating to paragraph spacing and padding.

Thanks to Kenneth Prugh, Serkan Kaba, and Guillaume Mazoyer for their help porting java-gnome to GNOME 3.


You can download java-gnome’s sources from ftp.gnome.org, or easily check out a branch from mainline:

$ bzr checkout bzr://research.operationaldynamics.com/bzr/java-gnome/mainline java-gnome

though if you’re going to do that you’re best off following the instructions in the HACKING guidelines.

AfC

java-gnome 4.0.20 released

This post is an extract of the release note from the NEWS file which you can read online.


java-gnome 4.0.20 (11 Jul 2011)

This will be the last release in the 4.0 series. It is meant only as an aid to porting over the API bump between 4.0 and 4.1; if your code builds against 4.0.20 without reference to any deprecated classes or methods then you can be fairly certain it will build against 4.1.1 when you finally get a system that has GTK 3.0 and the other GNOME 3 libraries on it.


AfC

Using tinc VPN

We’ve been doing some work where we really needed “direct” machine-to-machine access between a number of staff and their local file servers. The obvious way to approach this sort of thing is to use a Virtual Private Network technology, but which one?

There are a lot of VPN solutions out there. Quite a number of proprietary ones, and of course the usual contingent of “it’s-free-except-that-then-you-have-to-pay-for-it”. In both cases, why anyone would trust the integrity of code they can’t review is quite beyond me.

We’ve used OpenVPN for some of our enterprise clients, and it’s quite robust. Its model excels at giving remote users access to resources on the corporate network. Technically it is implemented by each user getting a point-to-point connection on an internal network (something along the lines of a 10.0.1.0/30) between the user’s remote machine and a gateway server, and then adding routes on the client’s system to the corporate IP range (ie good old 192.168.1.0/24). That’s fine so long as all the servers on the corporate network have the gateway as their default route; then reply packets to 10.0.1.2 or whatever will just follow the default route and be sent back down the rabbit hole. It gets messy with things like Postgres if your remote developers need access to the databases; in the configs you do need to add eg 10.0.1.0/24 to the list of networks that the database will accept connections from.

Anyway, that’s all fairly reasonable, and you can set up the client side from NetworkManager (install Debian package network-manager-openvpn-gnome) which is really important too. Makes a good remote access solution.

Peer to Peer

But for our current work, we needed something less centralized. We’re not trying to grant connectivity to a remote corporate network; we’re trying to set up a private network in the old-fashioned frame-relay sense of the word — actually join several remote networks together.

Traditional VPN solutions route all the traffic through the secure central node. If you’ve got one system in NSW and another in Victoria, but the remote access gateway is in California, then despite the fact that the two edges are likely less than 50 ms apart on the direct path, all your traffic is going across the Pacific and back. That’s stupid.

A major complication for all of us was that everyone is (of course) stuck behind NAT. Lots of developers, all working remotely, really don’t need to send all their screen casts, voice conferences, and file transfer traffic into the central corporate network just to come all the way out again.

The 1990s approach to NAT implies a central point that everyone converges to as a means of getting their packets across the port address translation boundary. Things have come a long way since then; the rise of peer-to-peer file sharing and dealing with the challenges of internet telephony has also helped a great deal. Firewalls are more supportive and protocols have evolved in the ongoing attempt to deal with the problem.

Meet tinc

So the landscape is different today, and tinc takes advantage of this. According to their goals page, tinc is a “secure, scalable, stable and reliable, easy to configure, and flexible” peer-to-peer VPN. Uh huh. Because of its peer-to-peer nature, once two edges become aware of each other and have exchanged credentials, they can start sending traffic directly to each other rather than through the intermediary.

$ ping 172.16.50.2
PING 172.16.50.2 (172.16.50.2) 56(84) bytes of data.
64 bytes from 172.16.50.2: icmp_req=1 ttl=64 time=374 ms
64 bytes from 172.16.50.2: icmp_req=2 ttl=64 time=179 ms
64 bytes from 172.16.50.2: icmp_req=3 ttl=64 time=202 ms
64 bytes from 172.16.50.2: icmp_req=4 ttl=64 time=41.6 ms
64 bytes from 172.16.50.2: icmp_req=5 ttl=64 time=45.4 ms
64 bytes from 172.16.50.2: icmp_req=6 ttl=64 time=51.3 ms
64 bytes from 172.16.50.2: icmp_req=7 ttl=64 time=43.3 ms
64 bytes from 172.16.50.2: icmp_req=8 ttl=64 time=42.3 ms
64 bytes from 172.16.50.2: icmp_req=9 ttl=64 time=44.2 ms
...
$

This is with the tincd daemons freshly restarted on each endpoint. The first packet clearly initiates edge discovery, key exchange, and setup of the tunnels. It, and the next two packets, are passed across the Pacific to the central node. Ok, fine. But after that, the tunnel setup completes, and both edge nodes have been informed of the peer’s network addresses and start communicating directly. Nice.

See for yourself

Watching the logs under the hood confirms this. If you run tincd in the foreground then you can specify a debug level on the command line; I find “3” a good setting for testing:

# tincd -n private -D -d3
tincd 1.0.13 (May 16 2010 21:09:47) starting, debug level 3
/dev/net/tun is a Linux tun/tap device (tun mode)
Executing script tinc-up
Listening on 0.0.0.0 port 655
Ready
Trying to connect to einstein (1.2.3.4 port 655)
Trying to connect to newton (5.6.7.8 port 655)
...

If you give it SIGINT by pressing Ctrl+C then it’ll switch itself up to the exceedingly verbose debug level 5, which is rather cool. SIGQUIT terminates, which you can send with Ctrl+\. If you’re not running in the foreground (which of course you’d only be doing in testing),

# tincd -n private -kINT

does the trick. Quite handy, actually.

Performance is respectable indeed; copying a 2.8 MB file across the Pacific,

$ scp video.mpeg joe@einstein.sfo.example.com:/var/tmp

gave an average of 31.625 seconds over a number of runs. Doing the same copy but sending it over the secure tunnel by addressing the remote machine by its private address,

$ scp video.mpeg joe@172.16.50.1:/var/tmp

came in at an average of 32.525 seconds. Call it 3% overhead; that’s certainly tolerable.

Setup

Despite my talking above about joining remote networks, an important and common subcase is merely joining various remote machines, especially when those machines are all behind NAT boundaries. That’s our in-house use case.

The tinc documentation is fairly comprehensive, and there are a few HOWTOs out there. There are a few gotchas, though, so without a whole lot of elaboration I wanted to post some sample config files to make it easier for you to bootstrap if you’re interested in trying this (install Debian package tinc).

tinc has a notion of network names; you can (and should) organize your files under one such. For this post I’ve labelled it the incredibly original “private”. Note that when you specify host names here they are not DNS hostnames; they are just symbolic names for use in control signalling between the tinc daemons. Flexibility = Complexity. What else is new. Obviously you’d probably use hostnames anyway, but administration of the tinc network doesn’t need to be co-ordinated with people naming their laptop my-fluffy-bunny or some damn thing. Anyway, on the system labelled hawking I have:

hawking:/etc/tinc/private/tinc.conf

    Name = hawking
    AddressFamily = ipv4
    ConnectTo = einstein
    ConnectTo = newton
    Interface = tun0

Note that I’ve got an Interface statement there, not a Device one. That’s a bit at odds with what the documentation said, but it’s what I needed to make it all work. Only one ConnectTo is actually necessary, but I’ve got one server in California that is reliably up and one in Victoria that is not, so I just threw both in there. That’s what your tincd is going to (compulsively) try to establish tunnels to.

hawking:/etc/tinc/private/hosts/hawking

    Subnet = 172.16.50.31/32

Somewhat confusingly, you need a “hosts” entry for yourself. Above is what you start with. Each host also needs a keypair which you can generate with:

# tincd -n private -K4096

with /etc/tinc/private/rsa_key.priv getting the private key and the public key being appended to the hosts/hawking file:

    Subnet = 172.16.50.31/32

    -----BEGIN RSA PUBLIC KEY-----
    MIICCgKCAgEAzSd5V91X6r3NB3Syh2FV8/JC2M7cx6o2OKbVzP6X5SFPI1lEH1AD
    7SfIlQF4TE++X8RcpJaBi4KjMS/Ul36Tuk75eKA18aNTBoVqH/ytY0BipQvJ6TUd
    BEkCjYrOUHFYOQn8MxQzziG6nk9tvhTWS0yKCNbd68e5i9uyKOem3R/pJsd/Kh9V
    wdVB51Wxs1Sv07OYmGYyRmGWh450wBNEmQfPHmM60Yh6uoQNJ0Ef41k1ZcswWcfO
    0jp9EOvbW/ZCdBW6teIYZ3GMuMB/cFj0Dw2fx6dHNHZVZrPcivt0cuOG8L4jNoHj
    HQUGuzMrpDN8N1ymM/eDlx+kBFYreKiEYGoWWqlZPNoY+bCekMrNf6Sr9bBwbj23
    xmY1jf6v1LkxGtOi4wWJfbU4xaMnquIRQe6FtB4LHp29l2SYWcpZnjuLcZ4ZoZLQ
    WK4bb0bUCAI/eYb19JRnfKEwS9MhYaQhZLWAJ3xyOt9u/Kk9KV7vWApxR1f5e2KT
    77A446eQU5aedm8nBDbd+WHqTdklAQ7SdRyYmbD8PoXBd3DGP6dFiURVTy8Wn4gz
    Bn7PMI3zmhfCMtwq/3A/xfyjQY3qesGCmKUwTno3fhv1DScS0rS9TRxZfyxlaOB1
    qjtlU79VhI0UKlha2Fv4XLshQ5dYEutpatpij0NzPYlwiQFphFQKStsCAwEAAQ==
    -----END RSA PUBLIC KEY-----

These are the public identifiers of your system and indeed the remote system in your ConnectTo statement must have a copy of this in its hosts/ directory. For nets of servers we maintain them in Bazaar and share them around using Puppet. Central distribution brings its own vulnerabilities and hassles; for very small team nets we just share around a tarball :).

You don’t need the /32, it turns out, but I left it in here to show you that tincd is effectively trading around network route advertisements, not host addresses.

hawking:/etc/tinc/private/tinc-up

    #!/bin/sh
    ifconfig $INTERFACE 172.16.50.31 netmask 255.255.255.0

This gets run when the net comes up. You can do all kinds of interesting things here, but the really magic part is assigning a broader /24 network mask than the /32 given for this host in the hosts/hawking file. That means this interface is the route to the network as a whole (not just to a single-attached host on the other side of a point-to-point tunnel, which is what OpenVPN does, leaving the default gateway to sort it all out). Lots of other ways to wire it of course, but one /24 in RFC 1918 land is more than enough. I’ve even heard of some people using Avahi link-local networking to do the addressing.

I could have hard coded tun0 there, I suppose, but they supply some environment variables. Much better.

Now for the California node:

einstein:/etc/tinc/private/tinc.conf

    Name = einstein
    AddressFamily = ipv4
    Interface = tun0
    Device = /dev/net/tun

For that one I did need a Device entry. Not sure what’s up there; it’s a server running Stable, so it could just be older kernel interfaces. Doesn’t matter.

Note again though that the tinc.conf file doesn’t have a public IP in it or anything. Bit unexpected, but hey. It turns up in the hosts files:

einstein:/etc/tinc/private/hosts/einstein

    Address = 1.2.3.4
    Subnet = 172.16.50.1/32

    -----BEGIN RSA PUBLIC KEY-----
    MIICCgKCAgEAqh/4Pmxy5fXZh/O7NkvebFK0OP+YD8Ph7JvK8RsUn75FY3DXjCCg
    VNRR+kRhnVoKVJcIAuvW7Tbs4fovWELOJbbUbKea8G+HANCgOY5F0rkJVtIAcTCL
    Jg1OelAfhF6yHV4vVgcawafWiMF2CtprveHomCnOwCbGuTDwTBqaUBZ9IOLzU2bx
    ArVA2No9Ks+xaaeSHejYoii3+WT58HUccntmIYkcdBa0uKZSis1XLUwdT7Evr1Ew
    K54RyMMEPC0MUziYZhAA0Qqpz79EzLXAGgQeuFxLjPoW/NbAD0PEBmsdmI5odprp
    t9Tx11v/UuhK2fszYKjM+DF2pYxxrKlOyus58zx5KKJQjjrzazrru5Ny0DNf/E6Y
    uB2kUtt7TCmoZg2CLAbIkyGJEiK+Wy2x2mabGDgicIs422XVslz2EODSI3qqF+f6
    gu+h/vYvjZxglYrL0SxTRV7wkUc+o9OVXMMYPazgPIkwnBeLrEhGL8GS4wDIYu4G
    E89m9UBE0fhVPJyw4QSfdeJZ4PgpJk6SG/7koVsJqr9EZOLp53K7ipnPylUKaRLD
    mcarvoDO6ybCuHUVUsLuzZZStSG8JEEe/8jb/Ex7UNBzJ14Nglqtu0aUZ/tzkrdS
    nPFFhdIwlUctM7sWKVfBugEkWjs3sR+XRVsCjxMrpZX0lXzcw9vhu60CAwEAAQ==
    -----END RSA PUBLIC KEY-----

This file must be on every system in the net that has a ConnectTo for it — it’s how the edges know where to call. So the same file is copied to hawking:

hawking:/etc/tinc/private/hosts/einstein

    Address = 1.2.3.4
    Subnet = 172.16.50.1/32

    -----BEGIN RSA PUBLIC KEY-----
    MIICCgKCAgEAqh/4Pmxy5fXZh/O7NkvebFK0OP+YD8Ph7JvK8RsUn75FY3DXjCCg
    VNRR+kRhnVoKVJcIAuvW7Tbs4fovWELOJbbUbKea8G+HANCgOY5F0rkJVtIAcTCL
    Jg1OelAfhF6yHV4vVgcawafWiMF2CtprveHomCnOwCbGuTDwTBqaUBZ9IOLzU2bx
    ArVA2No9Ks+xaaeSHejYoii3+WT58HUccntmIYkcdBa0uKZSis1XLUwdT7Evr1Ew
    K54RyMMEPC0MUziYZhAA0Qqpz79EzLXAGgQeuFxLjPoW/NbAD0PEBmsdmI5odprp
    t9Tx11v/UuhK2fszYKjM+DF2pYxxrKlOyus58zx5KKJQjjrzazrru5Ny0DNf/E6Y
    uB2kUtt7TCmoZg2CLAbIkyGJEiK+Wy2x2mabGDgicIs422XVslz2EODSI3qqF+f6
    gu+h/vYvjZxglYrL0SxTRV7wkUc+o9OVXMMYPazgPIkwnBeLrEhGL8GS4wDIYu4G
    E89m9UBE0fhVPJyw4QSfdeJZ4PgpJk6SG/7koVsJqr9EZOLp53K7ipnPylUKaRLD
    mcarvoDO6ybCuHUVUsLuzZZStSG8JEEe/8jb/Ex7UNBzJ14Nglqtu0aUZ/tzkrdS
    nPFFhdIwlUctM7sWKVfBugEkWjs3sR+XRVsCjxMrpZX0lXzcw9vhu60CAwEAAQ==
    -----END RSA PUBLIC KEY-----

Ok, you get the idea with the public keys, but I wanted to emphasize the point that it’s the same file. This is what you need to share around to establish the trust relationship and to tell E.T. where to phone home.

The Address entry in the hosts/einstein files spread around is what tells the edge nodes which have been configured to ConnectTo einstein where its real public IP address is. You can use DNS names here, and could play dynamic DNS games if you have to (sure, further decentralizing, but). If you have a few machines capable of being full-time central supernodes then you’ll have much better resiliency.

You do not, however, need to share a hosts/ file for every other node on the net! If laptop penrose is already connected to einstein and has been assigned 172.16.50.142 say, and hawking joins einstein and tries to ping .142, the central node einstein will facilitate a key exchange even though neither hawking nor penrose have each other’s keys, and then get out of the way. Awesome.

And finally, this all works over further distributed topologies. When new nodes join, the new edges and their subnets are advertised around to the rest of the net. So if central nodes einstein and curie are already talking, and sakharov joins curie, then traffic from our hawking will reach sakharov via einstein and curie, and in fairly short order they will have handled key exchange, stepped out of the way, and hawking will be communicating with sakharov direct peer-to-peer. Brilliant.

Nothing stopping you from sharing around (or centrally managing out-of-band) the hosts/ files with the Subnet declarations and the public keys, of course; it’ll save a few round trips during initial key exchange. Up to you how you manage the trust relationships and initial key distribution.

For completeness,

einstein:/etc/tinc/private/tinc-up

    #!/bin/sh
    ifconfig $INTERFACE 172.16.50.1 netmask 255.255.255.0

No surprises there.

Applications

Using tinc to cross arbitrary NAT boundaries has turned out to be supremely useful. I have successfully used this from within my office, over 3G UMTS mobile broadband, at internet cafes around Australia, in airport lounges in the States, and even from beach-side resorts in Thailand. In all cases I was able to join the private network topology. In fact, I now just leave tincd running as a system daemon on my laptop. When I need to talk to one of the file servers, I ping, and it’s there.

One surprising benefit was in getting voice-over-Jabber running again. We had some horrible regressions with audio quality during the Maverick release series of Ubuntu Linux. At one point in our diagnostics we found that the STUN algorithms for local and remote candidate IP detection were preferentially choosing localhost virtual bridges with lower route metrics than the default gateway resulting in routing loops. We brought up tinc and since both parties were on 172.16.50.x, Empathy and Jingle chose those as the “best” network choice. Packet loss problems vanished and the audio quality really improved (it didn’t finally get resolved until we got a Natty base system, tore out the Unity stuff, and got GNOME 3 and Empathy 3.0 on board via ppa:gnome3-team/gnome3 but that’s a separate issue). And as a side-effect we’ve got some ice on our voice channel. Excellent.

I’ve since read about a number of other interesting applications. A frequent use case is not needing encryption. While most people would interpret the “private” in virtual private network as meaning “secure”, in the old days it just meant a custom routing and network topology layered over whatever the underlying physical transport was. One crew running a large farm of servers on cloud-provided infrastructure struggled to enable their various distributed nodes to find and talk to each other. So they disabled the encryption layer and used tinc simply to facilitate IP-over-IP tunnelling, giving their sysadmins a stable set of (private) addresses with which to talk to the machines. They gave a talk at FOSDEM [their slides here] about it.

Also at FOSDEM was a talk by the “Fair VPN” effort, who are looking at improving efficiency of the network when the number of nodes scales into the thousands. Some nodes are “closer” than others so presumably they should be used preferentially; you don’t really need to discover information about every other node in the network on joining, and so on. The fact that they were able to use tinc as a research platform for this is fascinating and a nice kudo.

Next steps

So I’m pretty pleased with tinc, obviously. We’ve had a very positive experience, and I wanted to put a word in. If you’re involved in network engineering or security hardening, then I’m sure they’d welcome your interest.

It would be outstandingly cool if we could work out a NetworkManager plugin to set this up on demand, but that can wait for tinc version 1.1 or 2.0. I gather they’re working on making the key exchange and configuration easier; what I showed above is obviously well thought out and flexible, but there’s no denying it’s a bit cumbersome; there are a fair number of little knobs that need to be just right. A fire-and-forget daemon cross-product with some form of automatic addressing would be brilliant. But on the other hand, when you put network and security in the same sentence there’s a limit to how much you want to happen without any direct influence over the process. As it stands now tinc strikes a good balance there, and is entirely suitable for an environment managed by properly competent sysadmins.

AfC

Updates

  1. Turns out I was wrong about needing the Interface statement. After Dan’s post I tried it without one and tincd worked fine. Then I remembered why I’d done it that way — without an Interface statement the network interface was named for the tinc net label, private in this case. Preferring tun0, I went back to manually forcing it for my own personal aesthetic benefit.

Force Pidgin online

The Network Manager 0.9 series has made some changes which break current Pidgin. After I installed network-manager 0.8.999, Pidgin wouldn’t connect, stalling with “waiting for network connection”.

Turns out there is a workaround in Pidgin: you can force it to ignore what it thinks the network availability is by running it as:

$ pidgin -f

There’s no GUI way in gnome-shell to edit a launcher at the moment, fine. The old “edit menus” trick didn’t seem to work either. So to do that manually:

$ cp /usr/share/applications/pidgin.desktop .local/share/applications/
$ vi .local/share/applications/pidgin.desktop

And change the Exec line to:

Exec=pidgin -f

It won’t take effect until the desktop recaches things. Reload gnome-shell by typing “r” in the Alt+F2 run dialog and you’ll be on your way.

I’m sure upstream will catch up with the Network Manager changes in due course but I can live without network availability detection for now and this gets me back online.

I’ve been using Empathy for instant messaging for a long time, but I still love Pidgin for IRC. Go figure. So two clients it is.

AfC


Material on this site copyright © 2002-2014 Operational Dynamics Consulting Pty Ltd, unless otherwise noted. All rights reserved. Not for redistribution or attribution without permission in writing. All times UTC

We make this service available to our staff and colleagues in order to promote the discourse of ideas especially as relates to the development of Open Source worldwide. Blog entries on this site, however, are the musings of the authors as individuals and do not represent the views of Operational Dynamics