The Linux Page

Broken RSS feeds

Since December 2008, I have noticed that many RSS feeds include invalid ampersand characters. XML has a very specific way to handle the ampersand character: you have to write &amp; and not just &. The rule is similar in HTML, but in XML it is actually enforced.
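To see the difference, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree, which is a strict XML parser. The made-up title string with a bare ampersand is rejected. The escaped version parses, and its text comes back as a plain "&". The parses() helper is only for illustration.

```python
import xml.etree.ElementTree as ET

def parses(xml_text: str) -> bool:
    """Return True if a strict XML parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(parses("<title>Tom & Jerry</title>"))      # False: a bare "&" is not well-formed
print(parses("<title>Tom &amp; Jerry</title>"))  # True
print(ET.fromstring("<title>Tom &amp; Jerry</title>").text)  # Tom & Jerry
```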

This is neat: it ensures that your files are really well-formed.

Now, when you enforce the syntax, you also lose the ability to read many RSS feeds, even from quite prominent websites such as the National Geographic RSS feed. (Yes, they do not know how to handle the ampersand properly, or at least they have not so far, and today is July 23, 2009.)

Why is that, you ask? Well, most of the best readers, such as Outlook, Firefox, Opera and a few others, read those feeds without complaining. They simply interpret a lone ampersand as a literal ampersand instead of generating an error.
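That lenient behaviour can be approximated with a regular expression that escapes every & which does not start something shaped like an entity or character reference. This is a Python sketch of the idea. It is not the code of any of those readers, and it is not the Drupal patch. The regex and the fix_lone_ampersands() name are hypothetical, and the sketch assumes that any name-like reference such as &nbsp; should be left alone.

```python
import re

# Matches an "&" that does NOT start something shaped like a reference:
# a name such as &amp; or &nbsp; (assumed legitimate), a decimal reference
# such as &#38;, or a hex reference such as &#x26;, each ending with ";".
LONE_AMP = re.compile(r"&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def fix_lone_ampersands(xml_text: str) -> str:
    """Escape every lone '&' as '&amp;', leaving existing references alone."""
    return LONE_AMP.sub("&amp;", xml_text)

print(fix_lone_ampersands("Tom & Jerry &amp; friends"))  # Tom &amp; Jerry &amp; friends
print(fix_lone_ampersands("a&&b"))                       # a&amp;&amp;b
```

Because the check is a lookahead, each & in a run such as && is examined on its own, so both get escaped.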

The problem for me is that the Drupal developers, including the author of the aggregator module, do not want to fix those invalid ampersands (at least not in core), and thus I cannot read my favorite RSS feed... Can't I? Oh! Sorry! I'm a programmer. So I can fix the feed myself and forget about the Drupal core authors not wanting to fix their program. This is so cool. I have the power!!!

But of course, others may want to benefit from the patch I wrote. It is available on the Drupal website, and copies of the D6 and D7 patches are attached below.

First, give this really bad feed a try:

And the funny thing is that Firefox and Thunderbird cannot read that really bad feed, but my code can. The main reason is a double ampersand (as in C/C++ or PHP code such as a&&b).
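One way a reader could cope with such a feed is to repair it only as a fallback. It parses strictly first, and it escapes lone ampersands and retries only when the parser refuses the document. The Python sketch below does this with xml.etree.ElementTree on a made-up feed that contains a double ampersand. It is an illustration under those assumptions, not the PHP code of the Drupal patch. The feed content and the parse_feed() name are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Made-up feed for illustration; note the bare "&" and the "&&" in the titles.
BROKEN_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Code & Coffee</title>
  <item><title>if (a&&b) return;</title></item>
</channel></rss>"""

# An "&" not followed by a named, decimal or hex reference terminated by ";".
LONE_AMP = re.compile(r"&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def parse_feed(text: str) -> ET.Element:
    """Parse strictly first; only if that fails, escape lone '&' and retry once."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return ET.fromstring(LONE_AMP.sub("&amp;", text))

root = parse_feed(BROKEN_FEED)
channel_title = root.find("channel/title").text
item_titles = [t.text for t in root.iterfind("channel/item/title")]
print(channel_title)  # Code & Coffee
print(item_titles)    # ['if (a&&b) return;']
```

Escaping only after a strict parse fails leaves well-formed feeds untouched. One limitation remains: text such as a&b; still looks like an entity reference to the regex, so it is left as-is and the retry would still fail on it.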

Once you have tested my broken feed with your existing Drupal site, apply my patches to your code and try again: you will now see the content of the feed. Cool, hey?!


aggregator-fix_ampersand-6.x.patch (1.64 KB)
aggregator-fix_ampersand-7.x.patch (2.06 KB)