Broken XML Declarations, WordPress 101.

“This will teach me to help a friend in need” — A Crawler Sitemap shouldn’t be this hard.

I thought, as I stared at the broken XML warnings in my browser window. In the nearly 100 WordPress installations I look after, I’d not seen this more than once or twice in recent memory, and certainly not on a site this simple.

A WordPress SEO Installation with Broken XML Sitemap In Chrome

What a crawler sitemap in Google Chrome looks like, when you’re not paying attention.

In Firefox, it looked like:

A WordPress SEO Installation with Broken XML Sitemap In Mozilla Firefox

Or alternatively, like this ….

Thirty seconds earlier, the phrase “I did everything you said and now you have broken it” had come out of said friend’s mouth, as I pulled up W3’s validator page on the subject.

Things had started off simply enough: helping a small website get more traffic by introducing them to SEO, using a fresh WordPress installation and a themed version of the existing page layout.

We’d gone through all the settings, and I’d explained what each one did, including (but not limited to) how “focus keywords” are NOT a replacement for meta keywords and how the Linkdex analysis included in the Yoast SEO Plugin doesn’t treat them as the same thing at all, but uses the single word to calculate the scoring ratio of a page.

(Something, it appears, many, many people actually get wrong when using this plugin.)

We’d then gone through the .htaccess file to stop hotlinking and made a humans.txt file to explain to crawlers who my friend was and why they were busy selling stuffed soft toys off their own website, rather than Etsy, or eBay.

WooCommerce was working, the content was up, the meta descriptions were written and now, well, that.

Cue the “tear it down and debug it sequence”.

Plugins removed, reset, re-added. Nope — still an issue. Theme removed, tested with the default themes. Nope, still an issue.

Pulled up the designer’s copy of the WordPress code (thankfully, they had sent me an entire wordpress/ directory, rather than just the theme they had worked on) and ran my favourite diff line between it and a vanilla 3.5 install:

$ diff -urpaN -U 1 -EBb vanilla/ modified/ > wtf-is-going-on.diff

… and read the diff, wishing I’d added a -X to the line above to remove all the Fantastico junk that web providers like to leave in their clients’ directories.
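For the curious, here is a rough sketch of what that -X might look like, assuming GNU diff, where -X reads a list of filename patterns to skip from a file. The function name and the patterns themselves are made up for illustration; they are not a definitive list of what any hosting tool leaves behind.

```shell
#!/bin/sh
# Sketch: recursive diff that skips junk files listed in an exclude file.
# Assumes GNU diff. The patterns below are illustrative guesses only.

diff_excluding_junk() {
    # $1 = pristine tree, $2 = modified tree
    exclude_file=$(mktemp) || return 2
    printf '%s\n' 'fantastico_*' 'error_log' '*.bak' > "$exclude_file"
    # -u unified output, -r recurse, -N treat missing files as empty,
    # -X skip any file whose base name matches a pattern in the file.
    diff -urN -X "$exclude_file" "$1" "$2"
    status=$?
    rm -f "$exclude_file"
    return "$status"
}
```

Calling it as diff_excluding_junk vanilla/ modified/ > wtf-is-going-on.diff should give the same kind of output as the original command, minus the noise.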

There, at the bottom of the wp-config.php file, sat:

^M

(Note: for people not used to UNIX operating environments, a Control-M is the way most editors display the “Carriage Return” from Windows environments; it could otherwise be expressed as 0x0D or simply “a blank line”.)
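If you would rather not rely on your editor showing you the ^M, a small sketch like the following can list the PHP files that contain a carriage return byte anywhere. The function name is hypothetical, and it only assumes a POSIX find and grep.

```shell
#!/bin/sh
# Sketch: list every .php file under a directory that contains a CR (0x0D).

list_files_with_cr() {
    # $1 = directory to scan
    cr=$(printf '\r')   # command substitution keeps the CR, it only trims newlines
    # grep -l prints just the names of files with at least one match
    find "$1" -type f -name '*.php' -exec grep -l "$cr" {} +
}
```

Bear in mind that files saved with Windows line endings throughout will show up too, so a hit is a place to look rather than proof of a problem.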

Cue “curse words(tm)”

Removed that and uploaded the altered copy: the default theme works again!

Turn all the plugins back on, all good.

Changed the theme, BROKEN.

(Enter the “tear it down and check the theme sequence” accompanied by Don Davis & Juno Reactor’s Matrix score, which makes lovely “annoyed hack” music, if you ever find yourself hacking on code at 4am that a friend of many years accuses you of breaking then leaves the conversation for bed. :))

Now, from many years of experience with PHP code damage, I always check the “user-modifiable parts” first, which in WordPress’ case is the functions.php file in a theme.

This one had been built from an older theme (K2, if anyone remembers back that far), because it was “very simple” according to the text file the developer had written to accompany the WordPress installation they’d forwarded along:

1. <?php
2. if ( function_exists('register_nav_menus') )
3. register_nav_menus(array(
4. 'primary'=>__('Left Hand Side Navigation Menu'),
5. 'secondary'=>__('Footer Menu'),
6. ));
7. ?>
8. 
9. <?php
10. if ( function_exists('show_admin_bar') )
11. add_filter('show_admin_bar', '__return_false');
12. ?>

Twelve lines.

Twelve lines long and it looks fine, when in reality, it isn’t. You see, the W3 “WPBlankLine” Documentation, under Solution, does state:

Check your theme’s functions.php file for blank lines outside of <? and ?> bracketed sections.

So, the solution therefore is to…

Remove line 8.

(or, if you want to be really neat and tidy, amend the code to be in one PHP block, by altering it to read):

<?php
if ( function_exists('register_nav_menus') )
    register_nav_menus(array(
        'primary'=>__('Left Hand Side Navigation Menu'),
        'secondary'=>__('Footer Menu'),
    ));

if ( function_exists('show_admin_bar') )
    add_filter('show_admin_bar', '__return_false');
?>

Either one will work, but the tidier way does mean less grief later — and that can only be a good thing, especially as the W3 “WPBlankLine” Documentation, under Explanation, also states:

Unfortunately, with WordPress it seems all too easy for a plugin, a theme, or for your configuration file to contain a blank line. Further compounding this problem, some — but not all — feed readers compensate for this common error, allowing the error to go undetected for quite a while.
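Since a stray blank line can hide in any plugin, theme or config file, a rough scan can save some squinting. The sketch below is only a heuristic, with a made-up function name: it tracks <? and ?> line by line with awk and reports blank lines that fall outside them. It knows nothing about strings, comments or heredocs, so treat its output as hints.

```shell
#!/bin/sh
# Sketch: report blank lines that sit outside PHP tags in a file.
# Heuristic only: tags inside strings or comments will confuse it.

find_blank_lines_outside_php() {
    # $1 = PHP file to check
    awk '
        BEGIN { in_php = 0 }
        {
            line = $0
            sub(/\r$/, "", line)          # treat a lone ^M as a blank line
            if (!in_php && line ~ /^[ \t]*$/)
                printf "%s:%d: blank line outside PHP tags\n", FILENAME, NR
            # walk the tags on this line to update the inside/outside state
            while (1) {
                if (!in_php) {
                    i = index(line, "<?")
                    if (i == 0) break
                    in_php = 1
                } else {
                    i = index(line, "?>")
                    if (i == 0) break
                    in_php = 0
                }
                line = substr(line, i + 2)
            }
        }
    ' "$1"
}
```

Run against the twelve-line functions.php above, it should point at line 8 and nothing else; run against the single-block version, it should print nothing.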

Fire up the sitemap now from the admin … and up comes:

A WordPress SEO Installation with Working XML Sitemap

What it really *should* have looked like in the first place.

So, as it turns out, the Yoast SEO plugin is more of a stickler for correctness than most of the other sitemap-generating plugins I’ve seen in the last few years. If you have seen this Yoast SEO bug or you’re seeing errors in Google Chrome like:

XML declaration allowed only at the start of the document

or in Mozilla Firefox like:

XML Parsing Error: XML or Text Declaration not at start of entity

Or worse, you’ve got a WordPress installation generating a sitemap and you’re wondering why crawlers haven’t seen it, hopefully this article can help you find out why.
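If you want a quick check that doesn’t involve a browser, something along these lines can tell you whether a sitemap really begins with its XML declaration. It is a minimal sketch: the function name is invented, and the URL in the usage comment is a placeholder you would swap for your own sitemap address.

```shell
#!/bin/sh
# Sketch: succeed only if the document on stdin starts with "<?xml".
# Any leading blank line, space or other byte makes the check fail.

starts_with_xml_declaration() {
    first=$(head -c 5)          # leading newlines survive; only trailing ones are trimmed
    [ "$first" = '<?xml' ]
}

# Usage (placeholder URL, assumes curl is installed):
#   curl -s https://example.com/sitemap_index.xml | starts_with_xml_declaration \
#       && echo "declaration is at the start" \
#       || echo "something is printed before the declaration"
```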