Broken XML Declarations, WordPress 101.

“This will teach me to help a friend in need” — A Crawler Sitemap shouldn’t be this hard.

I thought, as I stared at the broken XML warnings in my browser window: something which, across the nearly 100 WordPress installations I look after, I’d not seen more than once or twice in recent memory, and certainly not on a site this simple.

A WordPress SEO Installation with Broken XML Sitemap In Chrome

What a crawler sitemap in Google Chrome looks like, when you’re not paying attention.

In Firefox, it looked like:

A WordPress SEO Installation with Broken XML Sitemap In Mozilla Firefox

Or alternatively, like this ….

Thirty seconds earlier, the phrase “I did everything you said and now you have broken it.” had come out of said friend’s mouth, as I pulled up the W3C’s validator page on the subject.

Things started off simply enough: helping a small website get more traffic by introducing them to SEO, using a combination of a fresh WordPress installation and a themed version of the page’s existing layout.

We’d gone through all the settings, and I’d explained what each one did, including (but not limited to) how “focus keywords” are NOT a replacement for meta keywords, and how the Linkdex analysis included in the Yoast SEO Plugin doesn’t treat them as the same thing at all, but uses the single word to calculate the scoring ratio of a page.

(Something, it appears, many, many people actually get wrong when using this plugin.)

We’d then gone through the .htaccess file to stop hotlinking and made a humans.txt file to explain to crawlers who my friend was and why they were busy selling stuffed soft toys off their own website, rather than Etsy, or eBay.

WooCommerce was working, the content was up, the meta descriptions were written and now, well, that.

Cue the “tear it down and debug it sequence”.

Plugins removed, reset, re-added. Nope — still an issue. Theme removed, tested with the default themes. Nope, still an issue.

Pulled up the designer’s copy of the WordPress code (the designer had, thankfully, sent me an entire wordpress/ directory rather than just the theme they had worked on) and ran my favourite diff line between it and a vanilla 3.5 install:

$ diff -urpaN -U 1 -EBb vanilla/ modified/ > wtf-is-going-on.diff

… and read the diff, wishing I’d added a -X to the line above to exclude all the Fantastico junk that web providers like to leave in their clients’ directories.
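For what it’s worth, here is a minimal sketch of the -X variant. GNU diff’s -X FILE option skips any file whose base name matches a pattern listed in FILE. The function name, the temporary exclude file and the two patterns (fantastico_* and fantversion.php) are assumptions for illustration, so swap in whatever leftovers your provider actually drops.

```shell
# Sketch: compare two WordPress trees while skipping installer leftovers.
# The patterns written to the exclude file are assumptions; edit to taste.
diff_without_junk() {
  # $1 = vanilla tree, $2 = modified tree
  excludes=$(mktemp)
  # One shell-style pattern per line, matched against file base names.
  printf '%s\n' 'fantastico_*' 'fantversion.php' > "$excludes"
  # Same flags as the diff line above, plus -X to apply the exclude file.
  diff -urpaN -U 1 -EBb -X "$excludes" "$1" "$2"
  rc=$?
  rm -f "$excludes"
  # diff exits 0 for "no differences" and 1 for "differences found".
  return "$rc"
}

# Usage:
#   diff_without_junk vanilla/ modified/ > wtf-is-going-on.diff
```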

There, at the bottom of the wp-config.php file, sat:


(note: for people not used to UNIX operating environments, a Control-M is the way most editors display the “Carriage Return” from Windows environments. It could otherwise have been expressed as 0x0D, or simply as “a blank line”.)
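If you’d rather not rely on how an editor chooses to render it, a stray carriage return can be confirmed from the command line. This is a minimal sketch using only grep, tail and od; the two function names are made up for illustration.

```shell
# Sketch: spot carriage returns (Control-M, 0x0D) in a PHP file.

# Print the number of lines that contain a carriage return.
count_cr_lines() {
  cr=$(printf '\r')
  grep -c "$cr" "$1"
}

# Dump the last 16 bytes with escapes, so \r and \n become visible.
show_tail_bytes() {
  tail -c 16 "$1" | od -c
}

# Usage:
#   count_cr_lines wp-config.php
#   show_tail_bytes wp-config.php
```

Note that grep -c exits non-zero when the count is 0, which matters if you call it from a script running under set -e.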

Cue “curse words(tm)”

Removed that and uploaded the altered copy: the default theme works again!

Turn all the plugins back on, all good.

Changed the theme, BROKEN.

(Enter the “tear it down and check the theme sequence” accompanied by Don Davis & Juno Reactor’s Matrix score, which makes lovely “annoyed hack” music, if you ever find yourself hacking on code at 4am that a friend of many years accuses you of breaking then leaves the conversation for bed. :))

Now, from many years of experience with PHP code damage, I always check the “user-modifiable parts” first, which in WordPress’ case is the functions.php file in a theme.

This one had been built from an older theme (K2, if anyone remembers back that far), because it was “very simple” according to the text file the developer had written to accompany the WordPress installation they’d forwarded along:

1. <?php
2. if ( function_exists('register_nav_menus') )
3. register_nav_menus(array(
4. 'primary'=>__('Left Hand Side Navigation Menu'),
5. 'secondary'=>__('Footer Menu'),
6. ));
7. ?>
8.
9. <?php
10. if ( function_exists('show_admin_bar') )
11. add_filter('show_admin_bar', '__return_false');
12. ?>

Twelve lines.

Twelve lines long and it looks fine when, in reality, it isn’t. You see, the W3C “WPBlankLine” documentation, under Solution, does state:

Check your theme’s functions.php file for blank lines outside of <? and ?> bracketed sections.

So, the solution therefore is to…

Remove line 8.

(or, if you want to be really neat and tidy, amend the code to be in one PHP block, by altering it to read):

<?php
if ( function_exists('register_nav_menus') )
    register_nav_menus(array(
        'primary'   => __('Left Hand Side Navigation Menu'),
        'secondary' => __('Footer Menu'),
    ));

if ( function_exists('show_admin_bar') )
    add_filter('show_admin_bar', '__return_false');
?>

Either one will work, but the tidier way does mean less grief later — and that can only be a good thing, especially as the W3 “WPBlankLine” Documentation, under Explanation, also states:

Unfortunately, with WordPress it seems all too easy for a plugin, a theme, or for your configuration file to contain a blank line. Further compounding this problem, some — but not all — feed readers compensate for this common error, allowing the error to go undetected for quite a while.
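To hunt for this kind of stray output without reading every file by eye, something like the following can help. It is a rough sketch, assuming a POSIX awk: it walks a file line by line, remembers whether the last tag seen was an opening <? or a closing ?>, and reports any line (blank ones included) that sits entirely outside a PHP block. The function name is hypothetical and the check is deliberately naive. It doesn’t understand ?> inside strings or comments, and it will flag legitimate HTML in template files, so it is only really useful on code-only files such as functions.php and wp-config.php.

```shell
# Sketch: report lines that fall outside <? ... ?> sections of a PHP file.
find_stray_lines() {
  awk '
    # Position of the last occurrence of t in s, or 0 if absent.
    function last_index(s, t,    pos, off, i) {
      pos = 0; off = 0
      while ((i = index(substr(s, off + 1), t)) > 0) {
        pos = off + i
        off = pos
      }
      return pos
    }
    {
      opener = last_index($0, "<?")
      closer = last_index($0, "?>")
      # Outside PHP and no opening tag on this line: it will be sent as output.
      if (!in_php && opener == 0)
        printf "%s:%d: output outside PHP tags\n", FILENAME, FNR
      # Whichever tag appears last on the line decides the state going forward.
      if (opener > closer) in_php = 1
      else if (closer > opener) in_php = 0
    }
  ' "$1"
}

# Usage:
#   find_stray_lines wp-content/themes/mytheme/functions.php
```

Run against the twelve-line file above, this reports line 8 and nothing else. The single newline straight after a closing ?> is swallowed by PHP, which is why a file ending in ?> followed by one newline is not flagged.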

Fire up the sitemap now from the admin … and up comes:

A WordPress SEO Installation with Working XML Sitemap

What it really *should* have looked like in the first place.

So, as it turns out, the Yoast SEO plugin is more of a stickler for correctness than most of the other sitemap-generating plugins I’ve seen in the last few years. If you have seen this Yoast SEO bug, or you’re seeing errors in Google Chrome like:

XML Declaration Only Allowed at the Start of the document

or in Mozilla Firefox like:

XML Parsing Error: XML or Text Declaration not at start of entity

Or worse, you’ve got a WordPress installation generating a sitemap and you’re wondering why crawlers haven’t seen it, hopefully this article can help you find out why.
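A quick way to check a sitemap without opening a browser is to save it and look at its very first bytes. The sketch below assumes you have already fetched the file; the function name and the example.com URL in the usage note are placeholders.

```shell
# Sketch: does a saved sitemap begin with the XML declaration?
starts_with_xml_declaration() {
  # Read exactly the first five bytes; a leading blank line or carriage
  # return pushes "<?xml" out of this window and the comparison fails.
  [ "$(head -c 5 "$1")" = '<?xml' ]
}

# Usage (the URL is a placeholder for your own site):
#   curl -s -o sitemap.xml https://example.com/sitemap_index.xml
#   starts_with_xml_declaration sitemap.xml && echo "looks fine" || echo "stray output before <?xml"
```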

PSA: mod_security v2 v. the Flash Uploader in WordPress

Recently, I’ve seen a lot of:

<IfModule mod_security.c>
<Files async-upload.php>
 SecFilterEngine Off
 SecFilterScanPOST Off
</Files>
</IfModule>

This is offered as the solution to “My Flash Upload option no longer works with WordPress”. Of course, if you’re using version 2 of mod_security, the correct way to completely disable it for the Flash uploader, as per the Migration Matrix page, is:

<IfModule mod_security2.c>
<Files async-upload.php>
 SecRuleEngine Off
 SecRequestBodyAccess Off
 SecResponseBodyAccess Off
</Files>
</IfModule>

It would appear there’s not a lot of mod_security v2 information as it relates to WordPress, and given that issues with the handling of async-upload.php have recently started appearing on the interschnitzel, I thought I’d put this here in case it is of assistance to anyone else.

HTML Formatting, Blockquotes, Paragraphs & You

Since I’ve been blogging here, one thing has continually frustrated me about the WordPress interface: the fact that blockquotes and code tags in the editor will always, automatically, have a br tag put in, making code, HTML fragments and other configuration examples rather annoying to post.

So, I went looking for a solution — as most of my posts here will have code examples :)

As it turns out, according to the WordPress Codex entry for wpautop(), the function already has the ability to turn this behaviour off as part of its design (the $br parameter), and because I didn’t want to get rid of the function altogether, it was easier to craft my own wrapper.

So, in functions.php — it’s a case of:

  1. Removing the existing filter.
  2. Defining our own function that passes false for the $br argument.
  3. Adding our new filter.

Which looks like:

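// Stop WordPress applying the stock wpautop() to posts and excerpts.
remove_filter( 'the_content', 'wpautop' );
remove_filter( 'the_excerpt', 'wpautop' );

// Same paragraph handling, but with $br set to false so single
// line breaks are not converted into br tags.
function wpautop_fixed($str) {
 return wpautop($str, false);
}

// Register the wrapper in place of the original.
add_filter( 'the_content', 'wpautop_fixed' );
add_filter( 'the_excerpt', 'wpautop_fixed' );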

Problem solved, code lines don’t break anymore, and the amount of extra HTML I have to add to get the standard editor (or indeed, the uber-cool Markdown on Save Improved plugin we use here) to behave is minimised.

Headers, and Lions, and Tigers, and Explorers (oh my.)

It seems a lot of my time of late is spent debugging, testing and fixing web technologies, mostly from a security perspective or a performance one, but occasionally it’s more involved: helping our people fix issues at runtime, in the wild.

So, when a Redmine installation looks miserable in Internet Explorer, and we debug it to the point where we can point the finger at IE’s “Compatibility Mode”, how do we fix this?

… and moreover, how do we fix this! (without the client needing to edit code to add the oft-used X-UA-Compatible meta string everywhere, in other words.)

It’s not that it’s hard, in reality — it just requires a little thinking from left field ;)

Sites like Stack Overflow have a few examples of how to do this, but I’d like to suggest a better one, which uses the setenvif and headers modules and a neat little tweak to make things ‘just work(tm)’ on earlier browsers, and is both .htaccess and vhost compatible.

<IfModule mod_setenvif.c>
<IfModule mod_headers.c>
BrowserMatch \bMSIE\s[89] good-versions
BrowserMatch \bMSIE\s[67] bad-versions
Header set X-UA-Compatible "IE=IE9,IE=8" env=good-versions
Header set X-UA-Compatible "IE=EmulateIE7,chrome=1" env=bad-versions
</IfModule>
</IfModule>

Which says:

  • The BrowserMatch lines match user-agent strings from IE9 or IE8, and from IE7 or IE6, and set the environment variable good-versions or bad-versions respectively.
  • The X-UA-Compatible header is then set according to those variables, with the versions listed in descending order (i.e. from the highest supported version of IE to the lowest).
  • Finally, the bad-versions line also adds Google Chrome Frame to the end of its value, so browsers like IE6 are asked if they would like to use that, before your code resorts to browser-based CSS hacks and other IE-related workarounds.

The X-UA-Compatible values are listed here.

If you need to support IE 10 / Windows 8 Development Editions, feel free to add something like:

BrowserMatch \bMSIE\s10 new-versions
Header set X-UA-Compatible "IE=Edge" env=new-versions

To your Apache configuration / .htaccess file in the relevant spots — that should keep you covered for future versions of the code :)
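Before reloading Apache, it can be worth checking which group a given user-agent string would land in. The sketch below is an approximation rather than Apache itself: it runs the patterns through GNU grep -E (an assumption, since BrowserMatch uses Apache’s own regex engine), and the function name and sample strings are illustrative. IE 10 announces itself as “MSIE 10.0”, with a space after MSIE, so the sketch uses \bMSIE\s10 for that case.

```shell
# Sketch: mimic the BrowserMatch patterns with GNU grep -E.
# Prints the environment variable name the user agent would be given.
classify_ua() {
  if   printf '%s\n' "$1" | grep -Eq '\bMSIE\s10';   then echo new-versions
  elif printf '%s\n' "$1" | grep -Eq '\bMSIE\s[89]'; then echo good-versions
  elif printf '%s\n' "$1" | grep -Eq '\bMSIE\s[67]'; then echo bad-versions
  else echo none
  fi
}

classify_ua 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)'   # good-versions
classify_ua 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'                # bad-versions
classify_ua 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)'  # new-versions
```

To see what the live server actually sends, curl -sI -A "<user-agent string>" against your own site, piped through grep -i x-ua-compatible, shows the header for that browser.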

One Hosed Apache Configuration, One Little Fix.

note: This is mainly here so I can find it the next time something happens to my Apache installation :)

The other night, in the very, very, very early morning, one of the Apache servers I look after had a small issue with fsck, and the /etc/apache2/mods-* directories became rather hosed.

After pulling up all the backups and applying the most recent web code, chmod’ing the results and firing a service apache2 restart, I received:

access to client_directory failed, reason: require directives present and no Authoritative handler

The solution:

# a2enmod authz_user

If, for some reason your installation doesn’t have the a2enmod software:

# ln -s /etc/apache2/mods-available/authz_user.load /etc/apache2/mods-enabled/authz_user.load

works equally well.
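After a filesystem mishap like this one, it is worth confirming that the link really resolves before restarting. A minimal sketch, assuming the Debian-style layout where enabled modules are symlinks under mods-enabled; the function name is made up, and the config root is passed as an argument rather than hard-coded.

```shell
# Sketch: is a module's .load file present (and not dangling) in mods-enabled?
module_enabled() {
  # $1 = Apache config root (e.g. /etc/apache2), $2 = module name
  # -e follows symlinks, so a link pointing at a missing file counts as absent.
  [ -e "$1/mods-enabled/$2.load" ]
}

# Usage:
#   module_enabled /etc/apache2 authz_user && echo enabled || echo missing
#   apache2ctl -M | grep authz_user   # asks Apache itself which modules are loaded
```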