The Linux Page

PCRE or PHP bug with .* and /s combination

See the attachment at the bottom for sample code that fails in different versions of PHP 5.2.x.

Today a customer told me that his pages disappeared on his Drupal site.

I looked into it and after a few hours determined that a module was the cause. Then I looked in the module and the only part I could see that could possibly be wrong was the preg_replace_callback() call.

So I took the input data from one of my customer's pages and ran the regular expression against it in an interactive PHP session. That gave me the same result: nothing.

The expression started with (?:<p.*?>)?\[ and it used the /s modifier (i.e. make the dot (.) accept all characters, including newline characters).

The expression works fine with certain files, but with these few pages my customer created, it would fail each time. Weird!

The complete regular expression included (.+?) in the middle. I replaced it with ((?:.|\n)+?) and removed the /s at the end of the expression.
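To make the difference between the two spellings concrete, here is a minimal sketch. The pattern is a cut-down stand-in for the module's expression and the input string is made up for illustration; only the three ways of writing the middle group matter.

```php
<?php
// Hypothetical sample input: a body that spans two lines.
$subject = "[collapse]first line\nsecond line[/collapse]";

// Without /s the dot stops at "\n", so the lazy group cannot reach the closing tag.
$noDotAll = preg_match('%\[collapse\](.+?)\[/collapse\]%', $subject, $m1);

// With /s the dot also matches "\n".
$dotAll = preg_match('%\[collapse\](.+?)\[/collapse\]%s', $subject, $m2);

// Same effect without /s: an explicit "any character or newline" alternation.
$alternation = preg_match('%\[collapse\]((?:.|\n)+?)\[/collapse\]%', $subject, $m3);

// The first call finds nothing; the other two capture the same two-line body.
var_dump($noDotAll, $dotAll, $alternation, $m2[1] === $m3[1]);
```

Both working forms capture the same text, so swapping one for the other changes how the engine gets there, not what it matches.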

The following is the full expression that fails:

echo preg_replace_callback('%(?:<p.*?>)?\[collapse( collapsed)?(?: title=([^]]*))?\](.+?)(?:<p.*?>)?\[/collapse](?:<\/p\s*>)?%smx', 'func', $s), "\n";

Notice the %smx at the end and the P tag at the start.

Several things I tried:

  • Removing the P tag at the start makes it work properly
  • Removing everything after the P tag makes it work just fine
  • Removing everything except the \[ after the P tag makes it fail the same as with the full expression
  • Parsing the HTML table by table works just fine! (The full HTML string makes it fail; I do not know the threshold.)

The code is a modified version of a Drupal module that I posted to that module's issue queue: Make collapsible_text play nice with WYSIWYG editors.

If you have any questions, let me know. I'm also posting a bug on the PHP website.

Update Jul 16, 2009. Okay! I see what happens (I got a hint from my PHP post: use the preg_last_error() call! What a concept...). The bigger the source, the more space you need to run the regular expression. In this case, the sheer size of the page makes the regular expression barf! That is why trying the code on a smaller subset works perfectly.
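Here is a rough sketch of that check. The post does not record which error code came back, so this assumes the backtrack limit was the one being hit. The pattern is a simplified stand-in for the real one, wrap_collapse() and replace_collapse() are made-up helper names, and pcre.backtrack_limit is lowered on purpose so a modest input is enough to trigger the failure.

```php
<?php
// Assumption: JIT is switched off so the interpreter's backtrack counter applies
// (the pcre.jit setting only exists in PHP 7 and later; older versions ignore it).
ini_set('pcre.jit', '0');

// Hypothetical callback standing in for the module's real one.
function wrap_collapse($matches) {
    return '<div class="collapse">' . $matches[1] . '</div>';
}

// Runs the replacement and returns the result together with the PCRE error code.
function replace_collapse($html) {
    $pattern = '%\[collapse\](.+?)\[/collapse\]%s';
    $result = preg_replace_callback($pattern, 'wrap_collapse', $html);
    return array($result, preg_last_error());
}

// Artificially small limit so that the failure shows up with a modest input.
ini_set('pcre.backtrack_limit', '1000');

$small = '[collapse]short body[/collapse]';
$large = '[collapse]' . str_repeat("a line of text\n", 2000) . '[/collapse]';

list($smallResult, $smallError) = replace_collapse($small);
list($largeResult, $largeError) = replace_collapse($large);

// Small input: replaced normally, no error recorded.
var_dump($smallError === PREG_NO_ERROR);
// Large input: preg_replace_callback() gives back NULL, which echoes as nothing.
var_dump($largeResult === null);
var_dump($largeError === PREG_BACKTRACK_LIMIT_ERROR);
```

On error preg_replace_callback() returns NULL rather than the original string, which would explain pages that come out empty instead of merely unprocessed.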

Now this may also explain the crash I'm getting with my server. A slightly different version of PHP and a long page too, but with many collapse tags in there.

Attachment: pcre-bug.php_.gz (29.51 KB)