Smarty Forum Index Smarty
WARNING: All discussion is moving to https://reddit.com/r/smarty, please go there! This forum will be closing soon.

Regex bug in trimwhitespace

 
This forum is locked: you cannot post, reply to, or edit topics.   This topic is locked: you cannot edit posts or make replies.    Smarty Forum Index -> Bugs
View previous topic :: View next topic  
Author Message
lcoffin
Smarty Rookie


Joined: 05 Mar 2008
Posts: 5

PostPosted: Wed Mar 05, 2008 2:01 am    Post subject: Regex bug in trimwhitespace Reply with quote

There's a bug in the trimwhitespace pattern matching that only appears to cause a problem when the pattern matches a large block of text. And it may also only appear when PHP is compiled with newer versions of the PCRE library. I'm pretty sure it is invalid regular expression syntax in the trimwhitespace code so I think it qualifies as a Smarty bug, and not a PCRE bug.

Details of the error:

test_template.htm
Code:
<textarea rows="30">{$data}</textarea>



guts of test.php
Code:
$smarty->load_filter('output', 'trimwhitespace');


$data = str_repeat('x', 100000);

$smarty->assign('data', $data);

$smarty->display('test_template.htm');



output
Code:
(blank -- zero bytes returned via Apache)


Environment

Code:
PHP: 5.2.0
PCRE: 6.7 04-Jul-2006




I traced the problem down to the line in trimwhitespace:

Code:
    $source = preg_replace("!<textarea[^>]+>.*?</textarea>!is",
                           '@@@SMARTY:TRIM:TEXTAREA@@@', $source);


Before this line $source is not empty, after this line it is.

If $data is much shorter (10,000), everything works fine. The cutoff for me is anything that makes the matched text longer than 100,059 bytes.

Running this on an older version of PHP (with an older version of the PCRE) works fine too (PHP v5.1.6 w/ PCRE 6.6 06-Feb-2006).

The problem is with the '.*?' in the center of the RE. I'm pretty sure this in invalid regex syntax because you are applying two modifiers in a row without sub-pattern grouping -- the '.' says match any character, the '*' says match zero or more occurrences of the '.', and the '?' .... well, if it was '.?' it would mean match zero or one occurrences of the '.'.

I think '.*?' is supposed to semantically mean '(.*)?' which would mean zero or one occurrences of the sub-pattern '.*'. But the '?' in this case is actually redundant since all it is really saying is 'match zero or one occurrence of zero or more characters'. It's really the same as simply saying 'match any occurrence of zero or more characters' -- or simply '.*'.

So the correct code is:

Code:
    $source = preg_replace("!<textarea[^>]+>.*</textarea>!is",
                           '@@@SMARTY:TRIM:TEXTAREA@@@', $source);



With this change, the above example PHP/template files run correctly and output the <textarea> with the long data inside.

There are a few other places this same regex is used. A complete patch follows.

---Lawrence

Code:

*** outputfilter.trimwhitespace.php.orig       Tue Mar  4 20:08:40 2008
--- outputfilter.trimwhitespace.php.good       Tue Mar  4 20:13:10 2008
***************
*** 28,48 ****
  function smarty_outputfilter_trimwhitespace($source, &$smarty)
  {
      // Pull out the script blocks
!     preg_match_all("!<script[^>]+>.*?</script>!is", $source, $match);
      $_script_blocks = $match[0];
!     $source = preg_replace("!<script[^>]+>.*?</script>!is",
                             '@@@SMARTY:TRIM:SCRIPT@@@', $source);
 
      // Pull out the pre blocks
!     preg_match_all("!<pre>.*?</pre>!is", $source, $match);
      $_pre_blocks = $match[0];
!     $source = preg_replace("!<pre>.*?</pre>!is",
                             '@@@SMARTY:TRIM:PRE@@@', $source);
 
      // Pull out the textarea blocks
!     preg_match_all("!<textarea[^>]+>.*?</textarea>!is", $source, $match);
      $_textarea_blocks = $match[0];
!     $source = preg_replace("!<textarea[^>]+>.*?</textarea>!is",
                             '@@@SMARTY:TRIM:TEXTAREA@@@', $source);
 
      // remove all leading spaces, tabs and carriage returns NOT
--- 28,48 ----
  function smarty_outputfilter_trimwhitespace($source, &$smarty)
  {
      // Pull out the script blocks
!     preg_match_all("!<script[^>]+>.*</script>!is", $source, $match);
      $_script_blocks = $match[0];
!     $source = preg_replace("!<script[^>]+>.*</script>!is",
                             '@@@SMARTY:TRIM:SCRIPT@@@', $source);
 
      // Pull out the pre blocks
!     preg_match_all("!<pre>.*</pre>!is", $source, $match);
      $_pre_blocks = $match[0];
!     $source = preg_replace("!<pre>.*</pre>!is",
                             '@@@SMARTY:TRIM:PRE@@@', $source);
 
      // Pull out the textarea blocks
!     preg_match_all("!<textarea[^>]+>.*</textarea>!is", $source, $match);
      $_textarea_blocks = $match[0];
!     $source = preg_replace("!<textarea[^>]+>.*</textarea>!is",
                             '@@@SMARTY:TRIM:TEXTAREA@@@', $source);
 
      // remove all leading spaces, tabs and carriage returns NOT
Back to top
View user's profile Send private message
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7368
Location: Lincoln Nebraska, USA

PostPosted: Wed Mar 05, 2008 3:09 am    Post subject: Reply with quote

fixed in SVN, thanks!
Back to top
View user's profile Send private message Visit poster's website
messju
Administrator


Joined: 16 Apr 2003
Posts: 3336
Location: Oldenburg, Germany

PostPosted: Wed Mar 05, 2008 8:43 am    Post subject: Reply with quote

wtf? .*? is not two modifiers.
It's an .* with greediness inverted (makes the * non-greedy in this case).

*NOW* trimwhitespace is broken. For example "<pre>.*</pre>" will now match the two outermost <pre>-tags and not a set of <pre>-tag-pairs.
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Celeb
Administrator


Joined: 17 Apr 2007
Posts: 1025
Location: Vienna

PostPosted: Wed Mar 05, 2008 9:27 am    Post subject: Reply with quote

messju is right. .*? is nothing but a non-greedy .*
Non-greediness is quite important here though.

However, there's still the problem of obviously preg_replace reaching its limits. In my case the text matched by the .*? part of the pattern may not exceed 99.997 characters. If it does preg_replace returns an empty string without throwing any error message (error_reporting = E_ALL, nothing in the logs).
_________________
Darn computers always do what I tell them to instead of what I want them to do.
Back to top
View user's profile Send private message
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7368
Location: Lincoln Nebraska, USA

PostPosted: Wed Mar 05, 2008 1:47 pm    Post subject: Reply with quote

This change was reverted in SVN.
Back to top
View user's profile Send private message Visit poster's website
lcoffin
Smarty Rookie


Joined: 05 Mar 2008
Posts: 5

PostPosted: Fri Mar 21, 2008 2:18 pm    Post subject: Reply with quote

Ooops! My apologies! Forgot about the non-greedy modifiers... Embarassed

Another simple, although not very elegant solution is to copy the original $source at the start of trimwhitespace, and if $source is empty at the end -- indicating a failed preg_replace -- simply return the unmodified copy. (Or just disabling trimwhitespace for pages you think *might* go over the limit.)

Any other thoughts?

---Lawrence
Back to top
View user's profile Send private message
Lemon Juice
Smarty Pro


Joined: 24 May 2006
Posts: 109

PostPosted: Tue Apr 01, 2008 5:04 pm    Post subject: Reply with quote

There is a default limit imposed on preg functions since php 5.2.0, read this:
http://bugs.php.net/bug.php?id=42649

I have come across this limitation once and I didn't know what the problem was because the php docs don't mention this change.
Back to top
View user's profile Send private message
m0n5t3r
Smarty n00b


Joined: 19 Nov 2008
Posts: 1

PostPosted: Wed Nov 19, 2008 7:19 pm    Post subject: Reply with quote

I ran into this issue today and solved it by replacing regex calls with a simple function that uses stripos; works for me just fine (it may have problems with bad HTML, but then so would the regex approach):

Code:

Index: plugins/outputfilter.trimwhitespace.php
===================================================================
--- plugins/outputfilter.trimwhitespace.php   (revision 8847)
+++ plugins/outputfilter.trimwhitespace.php   (working copy)
@@ -25,25 +25,40 @@
  * @param string
  * @param Smarty
  */
+function smarty_outputfilter_trimwhitespace_findblocks($source, $tag, $repl) {
+    $repl_blocks = array();
+    $t_source = '';
+    $epos = 0;
+   
+    while(($spos = stripos($source, "<$tag", $epos)) !== false) {
+        $old_epos = $epos;
+        $epos = stripos($source, "</$tag>", $spos);
+        if($epos !== false) {
+            $epos = $epos + strlen($tag) + 3;
+            $repl_blocks[] = substr($source, $spos, $epos - $spos);
+            $t_source .= substr($source, $old_epos, $spos - $old_epos)
+                . $repl;
+        }
+    }
+
+    $t_source .= substr($source, $epos);
+
+    return array($repl_blocks, $t_source);
+}
+
 function smarty_outputfilter_trimwhitespace($source, &$smarty)
 {
     // Pull out the script blocks
-    preg_match_all("!<script[^>]+>.*?</script>!is", $source, $match);
-    $_script_blocks = $match[0];
-    $source = preg_replace("!<script[^>]+>.*?</script>!is",
-                           '@@@SMARTY:TRIM:SCRIPT@@@', $source);
+    list($_script_blocks, $source) = smarty_outputfilter_trimwhitespace_findblocks($source,
+        'script', '@@@SMARTY:TRIM:SCRIPT@@@');
 
     // Pull out the pre blocks
-    preg_match_all("!<pre>.*?</pre>!is", $source, $match);
-    $_pre_blocks = $match[0];
-    $source = preg_replace("!<pre>.*?</pre>!is",
-                           '@@@SMARTY:TRIM:PRE@@@', $source);
+    list($_pre_blocks, $source) = smarty_outputfilter_trimwhitespace_findblocks($source,
+        'pre', '@@@SMARTY:TRIM:PRE@@@');
 
     // Pull out the textarea blocks
-    preg_match_all("!<textarea[^>]+>.*?</textarea>!is", $source, $match);
-    $_textarea_blocks = $match[0];
-    $source = preg_replace("!<textarea[^>]+>.*?</textarea>!is",
-                           '@@@SMARTY:TRIM:TEXTAREA@@@', $source);
+    list($_textarea_blocks, $source) = smarty_outputfilter_trimwhitespace_findblocks($source,
+        'textarea', '@@@SMARTY:TRIM:TEXTAREA@@@');
 
     // remove all leading spaces, tabs and carriage returns NOT
     // preceeded by a php close tag.
Back to top
View user's profile Send private message
dabilou
Smarty n00b


Joined: 23 Jul 2011
Posts: 3

PostPosted: Sat Jul 23, 2011 9:49 am    Post subject: Reply with quote

Hi

I have same problem due to php 5.2 limitation and no way to change it.
What about disabling this plugin ? Is there some bad effect ?
Back to top
View user's profile Send private message
U.Tews
Administrator


Joined: 22 Nov 2006
Posts: 5068
Location: Hamburg / Germany

PostPosted: Sun Jul 24, 2011 3:32 am    Post subject: Reply with quote

Have you tried to increase the backtrack limit?


ini_set('pcre.backtrack_limit', 20000000);
Back to top
View user's profile Send private message
dabilou
Smarty n00b


Joined: 23 Jul 2011
Posts: 3

PostPosted: Sun Jul 24, 2011 8:59 am    Post subject: Reply with quote

I cannot change the php.ini due to web hosting service limitation (it is shared hosting).

For the moment I turned off this filter.
Does the only purpose of this plugin is to save bandewith ?
Back to top
View user's profile Send private message
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7368
Location: Lincoln Nebraska, USA

PostPosted: Sun Jul 24, 2011 2:00 pm    Post subject: Reply with quote

dabilou wrote:
I cannot change the php.ini due to web hosting service limitation (it is shared hosting).

For the moment I turned off this filter.
Does the only purpose of this plugin is to save bandewith ?


It is to save bandwidth, pretty much. You can set the limit with set_ini(), you don't have to edit the php.ini file itself.
Back to top
View user's profile Send private message Visit poster's website
dabilou
Smarty n00b


Joined: 23 Jul 2011
Posts: 3

PostPosted: Sun Jul 24, 2011 2:23 pm    Post subject: Reply with quote

Its ok, thanks a lot !
Back to top
View user's profile Send private message
Display posts from previous:   
This forum is locked: you cannot post, reply to, or edit topics.   This topic is locked: you cannot edit posts or make replies.    Smarty Forum Index -> Bugs All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group
Protected by Anti-Spam ACP