The choice of a fix?

As any Open Source enthusiast knows, our ecosystem is built in layers: there’s the kernel, the platform, then the application. Each of these serves a clearly different purpose and, usually, parts at the bottom (the kernel) expose required parts of themselves to things further up the stack.

This, of course, provides different levels of tuning and optimisation. Kernels can use /proc or sysfs to allow userspace tuning; the GNOME platform has things like dconf, gconf2 and gsettings to allow programs like gnome-tweak-tool to function for “power” users, as well as the standard control panel for “normal” users; and, of course, individual programs can customise settings via the Edit / Preferences menu.

TCP/IP, as part of this ecosystem, is no exception. There are numerous examples of how to configure the TCP/IP stack, from academia, research departments, distributions, systems integrators and individuals. Most, if not all, of these pages discuss using the sysctl program to control the /proc infrastructure in order to make changes to the TCP/IP stack, and this is the way it should be.

All of this is a long-winded lead-in to a seemingly innocuous issue I discovered while working with one of our teams on some mobile kernel analysis work recently which, following my last post here, made me wonder:

Why?

When comparing kernels across an n x k x t space, one-line changes which occur in a single tree usually stand out like a “deer in the headlights”, and this one typified the sort of question we all eventually had in the end.

Consider:

--- a/net/ipv4/tcp_output.c      2011-10-04 00:00:00.000000000 +0000
+++ b/net/ipv4/tcp_output.c      2011-03-25 00:00:00.000000000 +0000
@@ -243,6 +243,8 @@ void tcp_select_initial_window(int __spa
else if (*rcv_wnd > init_cwnd * mss)
*rcv_wnd = init_cwnd * mss;
}
+        /* Lock the initial TCP window size to 64K*/
+        *rcv_wnd = 64240;

/* Set the clamp no higher than max representable value */
(*window_clamp) = min(65535U << (*rcv_wscale), *window_clamp);

Once again, Why?

Especially when:

/sbin/sysctl -w net.ipv4.tcp_rmem="64240 64240 [MAX]"

and:

/sbin/sysctl -w net.ipv4.tcp_wmem="64240 64240 [MAX]"

(Where tcp_{r/w}mem takes the minimum, default and maximum values respectively.)

Doing this from userspace (for example, in the init.rc file), or by patching the Android sysctl.conf file, would have done the same thing the code above does, but would have allowed tuning by product teams if required.
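To see what a given device is actually running with, the same three values can be read straight back out of /proc. The function below is only a minimal sketch: the name show_tcp_mem and the PROC_SYS_ROOT override are illustrative assumptions (the override exists purely so the function can be pointed at a scratch directory), not part of any Android or kernel tooling.

```shell
# Sketch: print the min/default/max triple from a tcp_rmem-style file.
# PROC_SYS_ROOT is a hypothetical override for testing; it defaults to
# the real /proc/sys.
show_tcp_mem() {
    name="$1"    # e.g. tcp_rmem or tcp_wmem
    file="${PROC_SYS_ROOT:-/proc/sys}/net/ipv4/$name"
    # The kernel exposes three whitespace-separated integers on one line.
    read -r min def max < "$file" || return 1
    printf '%s: min=%s default=%s max=%s\n' "$name" "$min" "$def" "$max"
}
```

Calling show_tcp_mem tcp_rmem and show_tcp_mem tcp_wmem before and after a sysctl -w is a quick way to confirm a userspace change took effect.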

A good quote I saw during the course of my investigations into the Why question was:

There is an argument for not applying any optimisation to the TCP connection between a webserver and a mobile phone. The features of a TCP connection can only be negotiated when the initial connection is established. If you move from an area with 3G reception to an area only offering 2.5G, or vice versa, the optimisations you may have done for the 3G connection may cause terrible performance on the 2.5G connection, due to the differences in the network characteristics. Assuming that a connection will always be on the same type of network technology means that you could fall into the pitfall of premature optimisation.

Could such a fix have been made because the production team once again had an internal testing issue that was resolved by doing such a thing? Was it the easiest way of interoperating with difficult client operating systems such as Windows XP? Or was it done for another reason?

Indeed, it may have been done because often the kernel team in these situations is a completely different entity to those creating the platform — and asking the platform team to ‘tweak‘ these settings may have been more difficult than making a one-liner fix in the kernel.

Also, from viewing this one-line change, we do not know:

  • If Selective Acknowledgment (SACK) was enabled as part of this vendor’s platform code. (Most GPRS optimisation guides available on the web, including RFC 3481, suggest it should be.)
  • If TCP/IP Window Scaling (RFC 1323) was switched on and supported by default.
  • If TCP ECN (RFC 3168) was switched on and supported by default.
  • If the Cell Towers (that actually do the grunt of the work and are worldwide) as well as the intermediary networks that these particular devices exist on (which indeed, the vendor has no control over at all) have TCP/IP Header Compression (RFC 1144) turned OFF.

as well as a number of other things about this device and its networking functionality.
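The device-side items in that list, at least, can be checked on a running kernel, since each has a switch under /proc/sys/net/ipv4 (tcp_sack, tcp_window_scaling and tcp_ecn). A rough sketch follows; show_tcp_features and PROC_SYS_ROOT are names invented for this example, and the output says nothing about what the cell towers or intermediary networks do.

```shell
# Sketch: report a few TCP feature switches by reading /proc/sys directly.
# PROC_SYS_ROOT is a hypothetical override for testing; default is /proc/sys.
show_tcp_features() {
    root="${PROC_SYS_ROOT:-/proc/sys}/net/ipv4"
    for knob in tcp_sack tcp_window_scaling tcp_ecn; do
        if [ -r "$root/$knob" ]; then
            printf '%s=%s\n' "$knob" "$(cat "$root/$knob")"
        else
            # Missing or unreadable on this kernel build.
            printf '%s=unknown\n' "$knob"
        fi
    done
}
```

Reading the files directly, rather than calling sysctl, is a deliberate choice here: it assumes only a shell and /proc, which is about all a stripped-down device image can be relied on to have.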

This is the wrong part of the stack to “fix” this type of issue, for sure. We can tell that not just because we (that’s the Operational Dynamics “we” here) have experience with the way this type of thing should be fixed, but because, in our n x k x t space from before, no other team (even within the same organisation) chose to fix their particular device the same way.

The more I look at it, the more I think the use of the Think-aloud Protocol in “external” kernel development would be an interesting thing to investigate.

A tale of two mobile device TTYs.

Recently, I’ve been tinkering with mobile phone code, specifically Android-based mobile code. It’s had a dual purpose, but it’s also had an enlightening effect on exactly why homebrew mobile modding communities (such as the very well constructed Cyanogen mod) actually exist.

One of the alterations I ran across while attempting to unbrick a friend’s HTC-based mobile phone was very interesting. (The friend is not a technical person, but their warranty had expired and a failed ‘Software Upgrade’ on a Win32-based workstation had caused their phone to cease being a phone.)

Consider the following code from HTC’s own htc_headset_mgr.c file (~ line 775 depending on your Android Kernel Revision):

static DEVICE_ACCESSORY_ATTR(tty, 0666, tty_flag_show, tty_flag_store);

htc_headset_mgr.c, for want of a better description, is the manager code for such devices as a wired headset. The code it controls (specifically, htc_headset.c) does not set permissions for the sysfs files or device nodes it creates, so presumably that’s what this code does.

It should be noted that world-writable sysfs in device code is hardly new, nor is HTC the only vendor with these types of issues. There are ongoing attempts to fix this from members of the Openwall project and others, some more successful than others, but with the gamut of drivers (some using DAC, some using capabilities and some using neither), that’s hardly surprising.

This driver is one that uses neither, presumably because it was written without consideration for ever appearing outside the HTC tree (i.e. upstream) or because the developers didn’t consider capabilities to be worth their time to implement.

Cyanogen fixes this with:

static DEVICE_ACCESSORY_ATTR(tty, 0644, tty_flag_show, tty_flag_store);

Which fixes the obvious incorrectness, but the question remains:

Why?

  • Why would the headset require rw+ access for the allocated TTY (and presumably, any other driver this management module pertains to) in the first place?
  • Why does a development team, potentially shipping millions of devices featuring this device worldwide, *not* implement _basic_ capabilities handling or, failing that, at least some form of DAC in their code?
  • Why does a homebrew modding community take the time to find and “fix” this?
  • Why does the upstream production team not apply these changes, after they’ve been found in the wild by another community?

It may have been an oversight, it may have been a hack to get a testcase to pass internally that was never corrected, it may have just been a production team’s time-crunch deadline (every development team has those, after all) or it could be a lack of suitable training and information.

Still, it did make me wonder exactly which other drivers are being shipped with consumer electronics that have the same issues — and, if all this stuff is being done outside sanctioned upstream trees — the kind of issues those of us looking in from the outside are just “not privileged enough” to see.
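For anyone wanting to look at their own device, a first pass at that question is simply to ask the filesystem which sysfs attributes are world-writable. The following is a small sketch, assuming a POSIX find is available on the device; list_world_writable is a name made up for this example.

```shell
# Sketch: list world-writable regular files under a directory (default /sys).
list_world_writable() {
    # -perm -0002 matches any file whose "other" write bit is set;
    # errors from unreadable directories are discarded.
    find "${1:-/sys}" -type f -perm -0002 2>/dev/null
}
```

An attribute created with mode 0666, like the tty one above, would show up in this listing, while the 0644 version would not.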

Headers, and Lions, and Tigers, and Explorers (oh my.)

It seems a lot of my time of late is spent debugging, testing and fixing web technologies — mostly from a security perspective, or a performance one — but, occasionally, it’s more involved helping our people fix issues at runtime in the wild.

So, when a Redmine installation starts looking miserable because the site renders badly in Internet Explorer, and we debug it to the point where we can point the finger at the “Compatibility Mode” of IE, how do we fix this?

… and moreover, how do we fix this! (without the client needing to edit code to add the oft-used X-UA-Compatible meta string everywhere, in other words.)

It’s not that it’s hard, in reality — it just requires a little thinking from left field ;)

Sites like StackOverflow have a few examples of how to do this, but I’d like to suggest a better one, which uses the setenvif and headers modules and a neat little tweak to make things ‘just work(tm)’ on earlier browsers, and is both .htaccess and vhost compatible.

<IfModule mod_setenvif.c>
<IfModule mod_headers.c>
BrowserMatch \bMSIE\s[89] good-versions
BrowserMatch \bMSIE\s[67] bad-versions
Header set X-UA-Compatible "IE=9,IE=8" env=good-versions
Header set X-UA-Compatible "IE=EmulateIE7,chrome=1" env=bad-versions
</IfModule>
</IfModule>

Which says:

  • The BrowserMatch lines match user-agent strings from IE9 or IE8, then from IE7 or IE6, and set the environment variables good-versions and bad-versions respectively.
  • The X-UA-Compatible header is then set according to those variables, with the versions listed in descending order (i.e. from the highest supported version of IE to the lowest).
  • Finally, the bad-versions line also appends Google Chrome Frame to the end of its value, so browsers like IE6 are asked if they would like to use that, before your code resorts to browser-based CSS hacks and other IE-related workarounds.
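To sanity-check which bucket a given user-agent string would land in, the matching can be approximated outside Apache. This is only a sketch: classify_ie is a hypothetical helper, and its shell glob patterns are a looser stand-in for the regular expressions above (a literal space instead of \s, no word-boundary check, and the first match wins).

```shell
# Sketch: approximate the two BrowserMatch rules with shell patterns.
classify_ie() {
    case "$1" in
        *"MSIE 8"*|*"MSIE 9"*) echo good-versions ;;
        *"MSIE 6"*|*"MSIE 7"*) echo bad-versions ;;
        *) echo none ;;   # not IE 6-9: no header would be set
    esac
}
```

For example, classify_ie "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)" prints good-versions. Against a live server, sending the same string with curl -sI -A and looking for X-UA-Compatible in the response headers is the real test.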

The X-UA-Compatible tags are listed here.

If you need to support IE 10 / Windows 8 Development Editions, feel free to add something like:

BrowserMatch \bMSIE\s10 new-versions
Header set X-UA-Compatible "IE=Edge" env=new-versions

To your Apache configuration / .htaccess file in the relevant spots — that should keep you covered for future versions of the code :)

One Hosed Apache Configuration, One Little Fix.

note: This is mainly here so I have it to hand the next time something happens to my Apache installation :)

The other night, in the very, very, very early morning, one of the Apache servers I look after had a small issue with fsck and the /etc/apache2/mods-* directories became rather hosed.

After pulling up all the backups and applying the most recent web code, chmod’ing the results and firing a service apache2 restart, I received:

access to client_directory failed, reason: require directives present and no Authoritative handler

The solution:

# a2enmod authz_user

If, for some reason, your installation doesn’t have the a2enmod script:

# ln -s /etc/apache2/mods-available/authz_user.load /etc/apache2/mods-enabled/authz_user.load

works equally well.
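Either way, it is worth confirming the module really is enabled before restarting. On a Debian-style layout, enabled modules appear as .load links under mods-enabled, so a check can be sketched as below; module_enabled and the APACHE_ROOT override are names assumed for this example only.

```shell
# Sketch: succeed if a module's .load file is present in mods-enabled.
# APACHE_ROOT is a hypothetical override; it defaults to /etc/apache2.
module_enabled() {
    # -e follows symlinks, so a dangling link counts as "not enabled".
    [ -e "${APACHE_ROOT:-/etc/apache2}/mods-enabled/$1.load" ]
}
```

Something like module_enabled authz_user && echo ok does the job here; on a working installation, apache2ctl -M gives the fuller picture of what is actually loaded.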