The choice of a fix?

As any Open Source enthusiast knows, our ecosystem is built in layers: there’s the kernel, the platform, then the application. Each of these serves a clearly different purpose and, usually, the parts at the bottom (the kernel) expose the required parts of themselves to things further up the stack.

This, of course, provides different levels of tuning and optimisation. Kernels can use /proc or sysfs to allow userspace tuning; the GNOME platform has things like dconf, gconf2 and gsettings, which allow programs like gnome-tweak-tool to serve “power” users alongside the standard control panel for “normal” users; and individual programs can expose their own settings via the Edit / Preferences menu.

TCP/IP, as part of this ecosystem, is no exception. Of course, there are numerous examples of how to configure the TCP/IP stack, from academia, research departments, distributions, systems integrators and individuals. Most, if not all, of these pages discuss using the sysctl program to drive the /proc infrastructure in order to make changes to the TCP/IP stack, and this is the way it should be.
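As a quick illustration (the values shown are only what a stock kernel typically reports; actual defaults vary by kernel version and distribution), the same tunable can be read through /proc or through sysctl(8), and written from userspace without touching a line of kernel code:

$ cat /proc/sys/net/ipv4/tcp_window_scaling
1
$ sysctl net.ipv4.tcp_window_scaling
net.ipv4.tcp_window_scaling = 1
# sysctl -w net.ipv4.tcp_window_scaling=1   (as root, or persistently via /etc/sysctl.conf)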

All of this is a long-winded lead-in to a seemingly innocuous issue I discovered while working with one of our teams on some mobile kernel analysis work recently, which, following my last post here, made me wonder:

Why?

When comparing kernels across an n × k × t space, one-line changes that occur in only a single tree usually stand out like a “deer in the headlights”, and this one typified the sort of question we all eventually had in the end.

Consider:

--- a/net/ipv4/tcp_output.c      2011-10-04 00:00:00.000000000 +0000
+++ b/net/ipv4/tcp_output.c      2011-03-25 00:00:00.000000000 +0000
@@ -243,6 +243,8 @@ void tcp_select_initial_window(int __spa
                 else if (*rcv_wnd > init_cwnd * mss)
                         *rcv_wnd = init_cwnd * mss;
         }
+        /* Lock the initial TCP window size to 64K */
+        *rcv_wnd = 64240;

         /* Set the clamp no higher than max representable value */
         (*window_clamp) = min(65535U << (*rcv_wscale), *window_clamp);

Once again, Why?

Especially when:

/sbin/sysctl -w net.ipv4.tcp_rmem="64240 64240 [MAX]"

and:

/sbin/sysctl -w net.ipv4.tcp_wmem="64240 64240 [MAX]"

(Where tcp_{r/w}mem takes the minimum, default and maximum values respectively.)

Running these from userspace (for example, from the init.rc file), or patching the Android sysctl.conf file, would have done the same thing the code above does, but would have left the values open to tuning by product teams if required.
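As a sketch of what that could look like (the 4194304 maximum here is purely a stand-in for whatever [MAX] a product team actually settles on), the same values could have shipped as configuration data applied at boot rather than as kernel code:

# /etc/sysctl.conf fragment, loaded at boot with "sysctl -p"
# (minimum, default and maximum buffer sizes, in bytes)
net.ipv4.tcp_rmem = 64240 64240 4194304
net.ipv4.tcp_wmem = 64240 64240 4194304

On Android, an init.rc can write the corresponding /proc/sys/net/ipv4/tcp_rmem and tcp_wmem entries directly to the same effect, which keeps the policy where the product team can see and change it.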

A good quote I saw during the course of my investigations into the Why question was:

There is an argument for not applying any optimisation to the TCP connection between a webserver and a mobile phone. The features of a TCP connection can only be negotiated when the initial connection is established. If you move from an area with 3G reception to an area only offering 2.5G, or vice versa, the optimisations you may have done for the 3G connection may cause terrible performance on the 2.5G connection, due to the differences in the network characteristics. Assuming that a connection will always be on the same type of network technology means that you could fall into the pitfall of premature optimisation.

Could such a fix have been made because the production team once again had an internal testing issue that was resolved by doing such a thing? Was it simply the easiest way of interoperating with difficult client operating systems such as Windows XP? Or was it done for another reason?

Indeed, it may have been done because the kernel team in these situations is often a completely different entity from the one creating the platform, and asking the platform team to ‘tweak’ these settings may have been more difficult than making a one-liner fix in the kernel.

Also, from viewing this one-line change, we do not know:

  • If Selective Acknowledgment (SACK) was enabled as part of this vendor’s platform code (reading most GPRS optimisation guides available on the web, including RFC 3481, suggests it should be).
  • If TCP/IP Window Scaling (RFC 1323) was switched on and supported by default.
  • If TCP ECN (RFC 3168) was switched on and supported by default.
  • If the cell towers (which actually do the grunt work, worldwide), as well as the intermediary networks that these particular devices sit on (over which the vendor has no control at all), have TCP/IP Header Compression (RFC 1144) turned OFF.

as well as a number of other things about this device and its networking functionality. (The first three, at least, are quick to check from a shell, as sketched below.)
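By way of illustration only (the values shown are what a stock desktop kernel of the era would typically report, not what this vendor shipped), those first three settings are exposed as ordinary sysctl toggles:

$ sysctl net.ipv4.tcp_sack net.ipv4.tcp_window_scaling net.ipv4.tcp_ecn
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_ecn = 0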

This is the wrong part of the stack in which to “fix” this type of issue, though, for sure. We can tell that not just because we (that’s the Operational Dynamics “we” here) have experience with the way this type of thing should be fixed, but because, in our n × k × t comparison from before, no other team (even within the same organisation) chose to fix their particular device the same way.

The more I look at it, the more I think the use of the Think-aloud Protocol in “external” kernel development would be an interesting thing to investigate.