Of white hats and black hats

Paul Drain is a security professional with considerable experience reviewing, patching and packaging the Linux kernel for deployment, having done so for Red Hat for many years. He specializes in comprehending unfamiliar code and troubleshooting deployment problems.

Contact...

Twitter @onepercentfunk

Google Plus +Paul Drain

Email 0x691A36C8

RSS Feed /paul

DDRescue Survival Mode

note: This post is more for my own future reference than anything else, but I figure it might help others out in a jam, so I'm posting it here. (Paul)

Recently, I was asked to attempt to recover an NTFS-based drive that had developed the “Click Of Death”. In a laptop that moves around a bit, such a thing is not uncommon, but I always forget which ddrescue command lines work ‘most reliably’ for me when I’m on a remote machine, so I’m documenting them for completeness.

First, back up the MBR and partition table (really, really useful on NTFS-based machines that fail):

dd if=/dev/sdX of=/media/working-drive/mbr.code bs=512 count=1

Then, presuming the destination drive is as large as, or larger than, the source one, run:

ddrescue --no-split /dev/sdX /media/working-drive/backup_cdrive.img  /media/working-drive/backup_cdrive.log 

ddrescue --direct --preallocate --max-retries=9 /dev/sdX /media/working-drive/backup_cdrive.img  /media/working-drive/backup_cdrive.log

ddrescue --direct --preallocate --retrim --max-retries=9 /dev/sdX /media/working-drive/backup_cdrive.img  /media/working-drive/backup_cdrive.log

Then, once you’ve checked your images for errors with a tool like ‘testdisk’, ‘sectrecover’ or any commercial one you may have on hand, the recovery process is:

  1. Partition the new drive.
  2. Restore Images
  3. Run: dd if=/media/working-drive/mbr.code of=/dev/sdY bs=446 count=1

“Restore Images” in this case can be:

  1. something physical, like: e2fsck -f /media/working-drive/backup_cdrive.img && dd if=/media/working-drive/backup_cdrive.img of=/dev/sdY[1-100]
  2. or something virtual, like: e2fsck -f /media/working-drive/backup_cdrive.img && VBoxManage convertfromraw backup_cdrive.img backup_cdrive.vdi --format VDI

(e2fsck only understands ext2/3/4 filesystems; for an NTFS image, ntfsfix from the ntfs-3g package is the closest Linux equivalent.)

(And, for those wondering: the 446-byte copy is because the new drive is probably not the same as the old one, so we do the partitioning manually and only restore the MBR boot code, not the whole sector. The 512-byte MBR consists of 446 bytes of boot code, a 64-byte partition table and a 2-byte signature.)
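To make that layout concrete, here is a small C sketch (the helper names are hypothetical, not part of dd, ddrescue or any tool mentioned above). It checks a 512-byte sector for the 0x55 0xAA boot signature at offsets 510 and 511, and copies only the 446-byte boot-code region from a saved MBR into a target sector, leaving the target's partition table and signature alone. That is the in-memory equivalent of what the `dd ... bs=446 count=1` restore does on disk.

```c
#include <stdint.h>
#include <string.h>

/* Classic MBR layout: 446 bytes of boot code, a 64-byte partition
   table (four 16-byte entries), then a 2-byte signature. */
enum {
    MBR_SIZE       = 512,
    MBR_CODE_SIZE  = 446,
    MBR_TABLE_SIZE = 64,
    MBR_SIG_OFFSET = MBR_CODE_SIZE + MBR_TABLE_SIZE /* 510 */
};

/* Returns 1 if the sector ends with the 0x55 0xAA boot signature. */
int mbr_has_signature(const uint8_t sector[MBR_SIZE])
{
    return sector[MBR_SIG_OFFSET] == 0x55 && sector[MBR_SIG_OFFSET + 1] == 0xAA;
}

/* Copies only the boot code (first 446 bytes) from a saved MBR into the
   target sector. Bytes 446..511 (partition table and signature) of the
   target are left untouched, so a freshly partitioned drive keeps its
   own partition table. */
void mbr_restore_code(uint8_t target[MBR_SIZE], const uint8_t saved[MBR_SIZE])
{
    memcpy(target, saved, MBR_CODE_SIZE);
}
```

Restoring all 512 bytes instead would overwrite the new drive's partition table with the old one, which is exactly what the manual partitioning step is meant to avoid.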

HTML Formatting, Blockquotes, Paragraphs & You

Since I’ve been blogging here, one thing has continually frustrated me about the WordPress interface: blockquotes and code tags in the editor always get a br tag inserted automatically, which makes posting code, HTML fragments and other configuration examples rather annoying.

So, I went looking for a solution — as most of my posts here will have code examples :)

As it turns out, wpautop() (as documented in the WordPress Codex) already has a parameter, $br, that turns off this behaviour as part of its design, and because I didn’t want to get rid of the function altogether, it was easier to write a small wrapper of my own.

So, in functions.php, it’s a case of:

  1. Removing the existing filters.
  2. Writing our own function that calls wpautop() with false for the $br argument.
  3. Adding our new function as the filter.

Which looks like:

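// Stop WordPress running wpautop() with its default line-break conversion.
remove_filter( 'the_content', 'wpautop' );
remove_filter( 'the_excerpt', 'wpautop' );

// Same paragraph handling, but with $br = false so remaining newlines are not turned into br tags.
function wpautop_fixed($str) {
 return wpautop($str, false);
}

// Register the wrapper in place of the original filter.
add_filter( 'the_content', 'wpautop_fixed' );
add_filter( 'the_excerpt', 'wpautop_fixed' );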

Problem solved: code lines don’t break any more, and the amount of extra HTML I have to add to get the standard editor (or indeed, the uber-cool Markdown on Save Improved plugin we use here) to behave is minimised.

Evolution, Databases, Grief.

Recently, Evolution on my Ubuntu Oneiric Desktop popped up with a dialogue stating:

Database Disk Image Is Malformed

This caused it to stop indexing everything in all of the folders listed in my IMAP setup. Restarting, using evolution --force-shutdown and various other solutions found on the interschnitzel had no effect; however, a slightly modified version of this page worked a treat.

Slightly modified, as Evolution 3.x and beyond on Ubuntu use ~/.local/share/evolution/mail for their mail storage — so the correct sequence of events to fix this problem became:

sudo apt-get install sqlite3

Then:

cd ~/.local/share/evolution/mail
for i in `find . -name folders.db`; do
echo "Rebuilding Table $i";
sqlite3 "$i" "pragma integrity_check;";
done

Which turned:

Rebuilding Table ./imap/paul@recovered-mail/folders.db
*** in database main ***
On tree page 11 cell 0: 2nd reference to page 173
On tree page 11 cell 1: 2nd reference to page 174
On tree page 11 cell 2: 2nd reference to page 450
On tree page 11 cell 3: 2nd reference to page 711
On tree page 11 cell 4: 2nd reference to page 924
On tree page 1060 cell 0: 2nd reference to page 805
On tree page 1060 cell 1: 2nd reference to page 849
On tree page 1060 cell 2: 2nd reference to page 921
On tree page 1060 cell 3: 2nd reference to page 851
On tree page 1060 cell 4: 2nd reference to page 911
On tree page 1060 cell 5: 2nd reference to page 850
On tree page 1060 cell 6: 2nd reference to page 848
Page 1067: btreeInitPage() returns error code 7
Page 1069: btreeInitPage() returns error code 11
Error: database disk image is malformed

Into:

Rebuilding Table ./imap/paul@recovered-mail/folders.db
ok

Of course, one needs to make sure the databases aren’t in use at the time, and, at least under Oneiric, evolution --force-shutdown tends to be a bit strange, so you might need to manually kill processes such as evolution-alarm-notify before starting this process.

Upstream, Downstream and … What is it exactly?

Talking with Peter the other evening about kernel development teams (if you’ve been following along here throughout October, you’ll see that’s been the bulk of my month), we wondered:

“What is it called when you’re doing your [kernel] development outside of any sanctioned tree, but other developers with the same vein/idea are *also* taking ideas / code from your tree?”

In the Linux world, that’s not mainline — because that’s Linus’s domain — and, in the Google Android case, Android != Mainline, so it’s not Android / Google either.

Indeed, it’s not “upstream” either: as we’ve seen, Google’s tree often lacks changes made in external trees. There’s a reason a Qualcomm tree exists that specific vendors pull chipset changes from, for example.

It’s not “downstream” in that same vein — as individual products commonly have either differing hardware, or differing versions of the same code on a device-by-device basis.

So, the two of us got onto something else — and I suddenly thought:

Sidestream

and Peter & I mulled it over for a few sentences and thought, yes, that’s more like it. After all, Sidestream implies:

Development done in parallel with versions upstream (Android Versions, in this case), but not included in upstream, not cherry-picked by upstream.

but also

Development used by downstream (Mobile Vendors, in this case) to provide updates and fixes to individual products — but changes to those files (by the vendors) are not necessarily sent back to either upstream or sidestream repositories.

Taking the Qualcomm example, changes are taken from there and cherry-picked into the ARM “mainline”, but new developments are used and tested there for a suitable amount of time before this happens.

Depending on when and if new Android kernel releases are frozen, the “upstream” code may not include these changes (for obvious reasons).

Vendors who require fixes for the frozen drivers in their “upstream” code can then cherry-pick or take verbatim changes from the Qualcomm “sidestream” tree when required.

Thoughts? Could it catch on as a new buzzword for external kernel development? ;)

The choice of a fix?

As any Open Source enthusiast knows, our ecosystem is built in layers: there’s the kernel, the platform, then the application. Each of these serves a clearly different purpose, and usually the parts at the bottom (the kernel) expose required parts of themselves to things further up the stack.

This, of course, provides different levels of tuning and optimisation: kernels use /proc or sysfs to allow userspace tuning; the GNOME platform has things like dconf, gconf2 and gsettings to allow programs like gnome-tweak-tool to function for “power” users, as well as the standard control panel for “normal” users; and, of course, individual programs can customise settings via the Edit / Preferences menu.

TCP/IP, as part of this ecosystem, is no exception. There are numerous examples of how to configure the TCP/IP stack from academia, research departments, distributions, systems integrators and individuals. Most, if not all, of these pages discuss using the sysctl program to control the /proc infrastructure in order to make changes to the TCP/IP stack, and this is the way it should be.

All of this is a long-winded lead-in to a seemingly innocuous issue I discovered recently while working with one of our teams on some mobile kernel analysis, which, following my last post here, made me wonder:

Why?

When looking at an n x k x t space when comparing kernels, one-line changes that occur in a single tree usually stand out like a “deer in the headlights”, and this one typified the sort of question we all eventually ended up asking.

Consider:

--- a/net/ipv4/tcp_output.c      2011-10-04 00:00:00.000000000 +0000
+++ b/net/ipv4/tcp_output.c      2011-03-25 00:00:00.000000000 +0000
@@ -243,6 +243,8 @@ void tcp_select_initial_window(int __spa
 		else if (*rcv_wnd > init_cwnd * mss)
 			*rcv_wnd = init_cwnd * mss;
 	}
+        /* Lock the initial TCP window size to 64K*/
+        *rcv_wnd = 64240;
 
 	/* Set the clamp no higher than max representable value */
 	(*window_clamp) = min(65535U << (*rcv_wscale), *window_clamp);
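The patch does not say where 64240 comes from. One plausible reading (an assumption, not something the vendor documents) is that it is 44 × 1460: the largest whole number of standard Ethernet-sized segments (a 1500-byte MTU minus 40 bytes of IPv4 and TCP headers) that fits in an unscaled 16-bit window, so the “64K” in the comment is really 64240 rather than 65535. Note also that the added assignment sits after the init_cwnd logic, so it overrides whatever initial window that logic computed. The C sketch below only checks that arithmetic and mirrors the shape of the clamp line from the diff context; the constant and function names are illustrative.

```c
#include <stdint.h>

/* Illustrative constants (assumptions, not taken from the vendor's tree). */
#define ETH_MTU          1500u
#define IPV4_TCP_HEADERS   40u  /* 20-byte IPv4 + 20-byte TCP header, no options */
#define UNSCALED_WINDOW 65535u  /* largest window a 16-bit field can advertise */

/* MSS for a given MTU, assuming no IP or TCP options. */
uint32_t mss_for_mtu(uint32_t mtu)
{
    return mtu - IPV4_TCP_HEADERS;
}

/* Largest whole number of full segments that fits in an unscaled window. */
uint32_t largest_segment_multiple(uint32_t mss)
{
    return (UNSCALED_WINDOW / mss) * mss;
}

/* Same shape as the clamp line in the diff context:
   min(65535U << rcv_wscale, window_clamp). */
uint32_t clamp_window(uint32_t rcv_wscale, uint32_t window_clamp)
{
    uint32_t max_representable = UNSCALED_WINDOW << rcv_wscale;
    return max_representable < window_clamp ? max_representable : window_clamp;
}
```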

Once again, Why?

Especially when:

/sbin/sysctl -w net.ipv4.tcp_rmem="64240 64240 [MAX]"

and:

/sbin/sysctl -w net.ipv4.tcp_wmem="64240 64240 [MAX]"

(Where tcp_{r/w}mem takes the minimum, default and maximum values respectively.)

Run from userspace (for example, in the init.rc file) or set by patching the Android sysctl.conf file, these would have achieved much the same effect as the code above, while still allowing product teams to tune it if required.
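Because these values live in procfs, anyone on the platform side can inspect them without rebuilding the kernel. A minimal C sketch of reading the triple back, assuming the usual Linux path /proc/sys/net/ipv4/tcp_rmem; the struct and function names are hypothetical, and the ordering check is a sanity check added for this sketch rather than something the kernel is claimed to enforce.

```c
#include <stdio.h>

/* Hypothetical struct for the three values of net.ipv4.tcp_rmem or
   tcp_wmem, in the order the kernel prints them: min, default, max. */
struct tcp_mem_triple {
    long min_bytes;
    long default_bytes;
    long max_bytes;
};

/* Parses text such as "64240 64240 4194304" (whitespace or tabs between
   values). Returns 0 on success, -1 if three integers are not present or
   they are not ordered min <= default <= max. */
int parse_tcp_mem(const char *text, struct tcp_mem_triple *out)
{
    if (sscanf(text, "%ld %ld %ld", &out->min_bytes,
               &out->default_bytes, &out->max_bytes) != 3)
        return -1;
    if (out->min_bytes > out->default_bytes ||
        out->default_bytes > out->max_bytes)
        return -1;
    return 0;
}

/* Reads the current triple from procfs, for example
   read_tcp_mem("/proc/sys/net/ipv4/tcp_rmem", &t). */
int read_tcp_mem(const char *path, struct tcp_mem_triple *out)
{
    char buf[128];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line != NULL ? parse_tcp_mem(buf, out) : -1;
}
```

On a device, a product team could call read_tcp_mem() on both tcp_rmem and tcp_wmem to confirm what the platform actually set, which is the kind of visibility a hard-coded kernel assignment takes away.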

A good quote I saw during the course of my investigations into the Why question was:

There is an argument for not applying any optimisation to the TCP connection between a webserver and a mobile phone. The features of a TCP connection can only be negotiated when the initial connection is established. If you move from an area with 3G reception to an area only offering 2.5G, or vice versa, the optimisations you may have done for the 3G connection may cause terrible performance on the 2.5G connection, due to the differences in the network characteristics. Assuming that a connection will always be on the same type of network technology means that you could fall into the pitfall of premature optimisation.

Could such a fix have been done because the production team once again had an internal testing issue that was resolved by doing such a thing, was it a case of it being the easiest way of doing interoperation with difficult client operating systems such as Windows XP, or was it done for another reason?

Indeed, it may have been done because the kernel team in these situations is often a completely different entity from the one creating the platform, and asking the platform team to ‘tweak’ these settings may have been more difficult than making a one-liner fix in the kernel.

Also, from viewing this one-line change, we do not know:

  • Whether Selective Acknowledgment (SACK) was enabled as part of this vendor’s platform code (most GPRS optimisation guides available on the web, including RFC 3481, suggest it should be).
  • Whether TCP/IP Window Scaling (RFC 1323) was switched on and supported by default.
  • Whether TCP ECN (RFC 3168) was switched on and supported by default.
  • Whether the cell towers (which actually do the grunt of the work, worldwide) as well as the intermediary networks these particular devices exist on (which, indeed, the vendor has no control over at all) have TCP/IP Header Compression (RFC 1144) turned OFF.

as well as a number of other things about this device and its networking functionality.
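Three of those questions can at least be answered for the device itself by reading integer sysctls under /proc/sys/net/ipv4 (tcp_sack, tcp_window_scaling and tcp_ecn). The C sketch below does that; the helper names are hypothetical. Header compression is negotiated on the link (for example by PPP), not through these sysctls, so it does not appear here, and tcp_ecn is not a plain on/off flag, so check the kernel's ip-sysctl documentation for what each value means.

```c
#include <stdio.h>

/* Parses one integer (the format of a simple sysctl file) from a stream.
   Returns 0 on success, -1 on failure. */
int parse_int_stream(FILE *f, int *value)
{
    return fscanf(f, "%d", value) == 1 ? 0 : -1;
}

/* Reads a single integer sysctl such as /proc/sys/net/ipv4/tcp_sack. */
int read_int_sysctl(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int rc = parse_int_stream(f, value);
    fclose(f);
    return rc;
}

/* TCP features from the list above that Linux exposes as integer sysctls. */
static const char *const tcp_feature_paths[] = {
    "/proc/sys/net/ipv4/tcp_sack",           /* SACK (RFC 2018) */
    "/proc/sys/net/ipv4/tcp_window_scaling", /* window scaling (RFC 1323) */
    "/proc/sys/net/ipv4/tcp_ecn",            /* ECN (RFC 3168) */
};

/* Prints each feature's current value, or notes that it could not be read. */
void report_tcp_features(FILE *out)
{
    size_t n = sizeof tcp_feature_paths / sizeof tcp_feature_paths[0];
    for (size_t i = 0; i < n; i++) {
        int v;
        if (read_int_sysctl(tcp_feature_paths[i], &v) == 0)
            fprintf(out, "%s = %d\n", tcp_feature_paths[i], v);
        else
            fprintf(out, "%s: unavailable\n", tcp_feature_paths[i]);
    }
}
```

Calling report_tcp_features(stdout) on the device answers the first three bullets for that build; the fourth depends on networks the vendor does not control.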

The wrong part of the stack to “fix” this type of issue though, for sure — and we can tell that, not just because we’ve (that’s the Operational Dynamics “we” here) got experience with the way this type of thing should be fixed, but because from our n x k x t space equation from before, no other team (even within the same organisation) chose to fix their particular device the same way.

The more I look at it, the more I think the use of the Think-aloud Protocol in “external” kernel development, would be an interesting thing to investigate.

A tale of two mobile device TTYs.

Recently, I’ve been tinkering with mobile phone code, specifically Android-based mobile code. It’s had a dual purpose, but it’s also had an enlightening effect on exactly why homebrew mobile modding communities (such as the very well constructed CyanogenMod) actually exist.

One of the alterations I ran across while attempting to unbrick a friend’s HTC-based mobile phone was very interesting. (My friend is not a technical person; their warranty had expired, and a failed ‘Software Upgrade’ from a Win32-based workstation had caused their phone to cease being a phone.)

Consider the following code from HTC’s own htc_headset_mgr.c file (~ line 775 depending on your Android Kernel Revision):

static DEVICE_ACCESSORY_ATTR(tty, 0666, tty_flag_show, tty_flag_store);

htc_headset_mgr.c, for want of a better description, is the manager code for devices such as a wired headset. The code it controls (specifically, htc_headset.c) does not set permissions for the sysfs files or device nodes it creates, so presumably that’s what this code does.

It should be noted that world-writable sysfs files in device code are hardly new, nor is HTC the only vendor with these types of issues. There are ongoing attempts to fix this from members of the Openwall project and others, some more successful than others, but given the gamut of drivers (some using DAC, some using capabilities and some using neither), that’s hardly surprising.

This driver is one that uses neither, presumably because it was written without consideration for ever appearing outside the HTC tree (i.e. upstream) or because the developers didn’t consider capabilities worth their time to implement.

CyanogenMod fixes this with:

static DEVICE_ACCESSORY_ATTR(tty, 0644, tty_flag_show, tty_flag_store);

Which fixes the obvious incorrectness, but the question remains:

Why?

  • Why would the headset require world-writable (0666) access for the allocated TTY (and presumably any other driver this management module pertains to) in the first place?
  • Why does a development team, potentially shipping millions of devices featuring this code worldwide, *not* implement _basic_ capabilities handling or, failing that, at least some form of DAC in their code?
  • Why does a homebrew modding community take the time to find and “fix” this?
  • Why does the upstream production team not apply these changes after they’ve been found in the wild by another community?

It may have been an oversight, it may have been a hack to get a test case to pass internally that was never corrected, it may have just been a production team’s time-crunch deadline (every development team has those, after all), or it could be a lack of suitable training and information.

Still, it did make me wonder exactly which other drivers are being shipped with consumer electronics that have the same issues — and, if all this stuff is being done outside sanctioned upstream trees — the kind of issues those of us looking in from the outside are just “not privileged enough” to see.

Headers, and Lions, and Tigers, and Explorers (oh my.)

It seems a lot of my time of late is spent debugging, testing and fixing web technologies — mostly from a security perspective, or a performance one — but, occasionally, it’s more involved helping our people fix issues at runtime in the wild.

So, when a Redmine installation starts looking miserable in Internet Explorer, and we debug it to the point where we can point out it’s IE’s “Compatibility Mode”, how do we fix this?

… and, moreover, how do we fix it without the client needing to edit code to add the oft-used X-UA-Compatible meta tag everywhere?

It’s not that it’s hard, in reality — it just requires a little thinking from left field ;)

Sites like StackOverflow have a few examples of how to do this, but I’d like to suggest a better one, which uses the setenvif and headers modules plus a neat little tweak to make things ‘just work(tm)’ on earlier browsers, and which works in both .htaccess files and vhost configuration.

<IfModule mod_setenvif.c>
<IfModule mod_headers.c>
BrowserMatch \bMSIE\s[89] good-versions
BrowserMatch \bMSIE\s[67] bad-versions
Header set X-UA-Compatible "IE=IE9,IE=8" env=good-versions
Header set X-UA-Compatible "IE=EmulateIE7,chrome=1" env=bad-versions
</IfModule>
</IfModule>

Which says:

  • The BrowserMatch lines match user-agent strings from IE9 or IE8, and from IE7 or IE6, and set the environment variable good-versions or bad-versions on those requests respectively.
  • The X-UA-Compatible header is then set for requests carrying those variables, with version numbers in descending order (i.e. you list from the highest supported version of IE to the lowest).
  • Finally, the bad-versions line also adds Google Chrome Frame (chrome=1) to the end of its value, so browsers like IE6 that have Chrome Frame installed will render with it, before your code resorts to browser-based CSS hacks and other IE-related workarounds.
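To sanity-check which user agents land in which bucket, the BrowserMatch patterns can be approximated in C with POSIX extended regular expressions. This is a sketch with hypothetical helper names, not how Apache itself works: Apache evaluates these patterns with PCRE, and POSIX ERE has no \b or \s, so those are emulated with bracket expressions. The new-versions pattern assumes IE 10’s user agent contains “MSIE 10.0”.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `ua` matches the POSIX extended regex `pattern`,
   0 if it does not, and -1 if the pattern fails to compile. */
int ua_matches(const char *ua, const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, ua, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* ERE approximations of \bMSIE\s[89], \bMSIE\s[67] and \bMSIE\s10:
   start of string or a non-word character before "MSIE", then one
   whitespace character, then the version digits. */
#define GOOD_VERSIONS "(^|[^[:alnum:]_])MSIE[[:space:]][89]"
#define BAD_VERSIONS  "(^|[^[:alnum:]_])MSIE[[:space:]][67]"
#define NEW_VERSIONS  "(^|[^[:alnum:]_])MSIE[[:space:]]10"

/* Returns the environment variable name the matching BrowserMatch line
   would set, or NULL. Apache applies every matching directive; for
   ordinary user agents these three patterns do not overlap, so
   returning the first match is equivalent here. */
const char *classify_ua(const char *ua)
{
    if (ua_matches(ua, GOOD_VERSIONS) == 1) return "good-versions";
    if (ua_matches(ua, BAD_VERSIONS) == 1)  return "bad-versions";
    if (ua_matches(ua, NEW_VERSIONS) == 1)  return "new-versions";
    return NULL;
}
```

For instance, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)" classifies as good-versions, while a Firefox user agent matches none of the three.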

The X-UA-Compatible values are listed here.

If you want to support IE 10 / the Windows 8 development editions, feel free to add something like:

BrowserMatch \bMSIE\s10 new-versions
Header set X-UA-Compatible "IE=Edge" env=new-versions

To your Apache configuration / .htaccess file in the relevant spots — that should keep you covered for future versions of the code :)

One Hosed Apache Configuration, One Little Fix.

note: This is mainly here for the next time something happens to my Apache installation :)

The other night, in the very, very, very early morning, one of the Apache servers I look after had a small issue with fsck, and the /etc/apache2/mods-* directories became rather hosed.

After pulling up all the backups, applying the most recent web code, chmod’ing the results and firing off a service apache2 restart, I received:

access to client_directory failed, reason: require directives present and no Authoritative handler

The solution:

# a2enmod authz_user

If, for some reason, your installation doesn’t have the a2enmod script:

# ln -s /etc/apache2/mods-available/authz_user.load /etc/apache2/mods-enabled/authz_user.load

works equally well.

Returning to the Matrix

Making a return to blogging about goings-on for the first time since 2007; we’ll see how effective this is, but given Andrew and Peter are already doing it, I thought it’d be a fine time to join the “in crowd”.

For anyone who doesn’t know, I am mainly interested / involved in Free & Open Software hackery and hitting hardware with hammers, and have a passion for gambling, snowboarding and other foolhardy exploits with my own health. *

(* In addition, it has been noted I may or may not actually be a real person and simply an enigma.)

Material on this site copyright © 2002-2012 Operational Dynamics Consulting, Pty Ltd unless otherwise noted. All rights reserved. Not for redistribution or attribution without permission in writing.

We make this service available to our staff in order to promote the discourse of ideas especially as relates to the development of Open Source worldwide. Blog entries on this site, however, are the musings of the authors as individuals and do not represent the views of Operational Dynamics. All times UTC.