
Help w/ regex for smarty plugin

 
toma
Smarty Regular


Joined: 25 Apr 2003
Posts: 62

Posted: Tue May 06, 2003 8:26 pm    Post subject: Help w/ regex for smarty plugin

Hey all. I recently created this page,
http://smarty.incutio.com/?page=modifier.google_highlight
but I'm hoping someone has a better way to parse the search string than I use:

Currently I pull out all the double quoted strings before exploding the search term on spaces, but I think there's probably a way to do this using preg_split() along the lines of
/\".*\"|:space:/
but I know for sure that doesn't work and I'm not regexguru enough to figure it out.

Any help is appreciated!
Tom
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7368
Location: Lincoln Nebraska, USA

Posted: Tue May 06, 2003 8:48 pm

You're probably safest to leave this in two passes, first pull out the quoted blocks, then split by spaces.

(untested)

// get all double quoted strings out
preg_match_all('!".*?"!', $string, $_match);

// split by spaces (the flag is the fourth argument; -1 means no limit)
$_terms = preg_split('!\s+!', $string, -1, PREG_SPLIT_NO_EMPTY);


Monte
toma
Smarty Regular


Joined: 25 Apr 2003
Posts: 62

Posted: Tue May 06, 2003 9:39 pm

preg_match_all works well but doesn't leave me with an easily parseable string for the non-quoted terms.
preg_split('!".*?"!', $search);

This is what I have now:

// get all double quoted strings then terms
preg_match_all('!".*?"!', $search, $_match);
$terms = explode(' ', implode('', preg_split('!".*?"!', $search)));
if (sizeof($_match[0])) {
$terms = array_merge($_match[0], $terms);
}


But it can still contain empty elements.
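
One way to avoid the empty elements, sketched here under a few assumptions: the helper name split_search_terms() is made up for illustration, and it returns the quoted phrases without their quotes (capture group 1) rather than $_match[0]. It removes the quoted phrases first and then lets preg_split() drop empty pieces via PREG_SPLIT_NO_EMPTY.

```php
<?php
// Hypothetical helper name; not taken from the plugin discussed in this thread.
function split_search_terms($search)
{
    // Quoted phrases, without the surrounding quotes (capture group 1).
    preg_match_all('!"(.*?)"!', $search, $match);

    // Blank out the quoted phrases, then split what is left on whitespace.
    // PREG_SPLIT_NO_EMPTY is the fourth argument; -1 means "no limit".
    $rest  = preg_replace('!".*?"!', ' ', $search);
    $words = preg_split('!\s+!', $rest, -1, PREG_SPLIT_NO_EMPTY);

    return array_merge($match[1], $words);
}

print_r(split_search_terms('"blah and"  stuff "dna halb" more'));
// Yields the four terms: blah and, dna halb, stuff, more
```

Note that the quoted phrases come first in the result, as in the code above, so the original word order is not preserved.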
boots
Administrator


Joined: 16 Apr 2003
Posts: 5611
Location: Toronto, Canada

Posted: Tue May 06, 2003 10:16 pm

mohrt's regex does not handle escaped quotes, so be careful!

Quote:
preg_match_all works well but doesn't leave me with an easily parseable string for the non-quoted terms.


Code:
// pre-tokenize strings
$src_c   = preg_match_all( '/"(.*)"/', $src, $token_str );
$src   = preg_replace( '/".*"/', ' STRING! ', $src );
// split on whitespace
$src   = preg_split ( '/\s+/', $src);


$src is now an array (split on whitespace) where each element is either non-string data or the token "STRING!". The quoted strings are stored, in order, in $token_str[0] (including the quotes) and $token_str[1] (contents only). $src_c is the count of strings collected.

As you process the array, whenever you encounter a STRING! token you simply do:

$string_contents = array_shift($token_str[1]);

which pulls the next string from the front of the string stack (note: it is consumed first-in, first-out, so it behaves like a queue).
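
A minimal end-to-end sketch of the placeholder technique described above. The sample input and the $terms variable are assumptions made for illustration, and the patterns use the non-greedy .*? so that each quoted string is collected separately.

```php
<?php
$src = 'alpha "two words" beta "three more words" gamma';

// Collect the quoted contents in order of appearance.
preg_match_all('/"(.*?)"/', $src, $m);
$token_str = $m[1];

// Replace each quoted string with a placeholder, then split on whitespace.
$tokens = preg_split('/\s+/', preg_replace('/".*?"/', ' STRING! ', $src), -1, PREG_SPLIT_NO_EMPTY);

$terms = array();
foreach ($tokens as $token) {
    // Each placeholder is swapped back for the next stored string.
    $terms[] = ($token === 'STRING!') ? array_shift($token_str) : $token;
}

print_r($terms); // alpha, two words, beta, three more words, gamma
```

One limitation of this sketch: an input that already contains the literal word STRING! would be mistaken for a placeholder.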

I used a similar technique here: http://www.phpinsider.com/smarty-forum/viewtopic.php?p=244&highlight=#244

Hope that helps.
toma
Smarty Regular


Joined: 25 Apr 2003
Posts: 62

Posted: Tue May 06, 2003 11:00 pm

Thanks boots. That got me started. I changed how you did it (and added a ? to your regex) and got it down to two lines.

However, I just found a bug w/ the replacing for this. A search term of 'style' will make a hell of a mess because previous 'style's inserted by other terms are replaced. I really don't like the idea of tokenizing the text but I don't see any other solution.

Do you?

Thanks again,
Tom
boots
Administrator


Joined: 16 Apr 2003
Posts: 5611
Location: Toronto, Canada

Posted: Wed May 07, 2003 12:06 am

To be honest, I haven't looked at your page at the wiki yet, so I'm not sure how your processing is occurring :) For example, I don't see why you say that styles will replace previous styles.

There are other ways of doing this without using a string stack. I find it convenient to use a stack, but if you don't like it, I also sometimes use preg_replace_callback which allows for more sophisticated inline handling.

See: http://www.php.net/manual/en/function.preg-replace-callback.php

A few things about callback: only the matched array is passed to the callback function--if you need to keep static data or use data from elsewhere, you will have to co-ordinate that prior to using preg_replace_callback. You can register an arbitrary function in any object by using the array($object, 'function') notation.
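
To illustrate the points about callbacks, here is a small sketch. The Highlighter class and its wrap() method are hypothetical names invented for this example; the counter shows one way to keep state between calls, since the callback itself only receives the match array.

```php
<?php
// Hypothetical class for illustration; not part of the plugin discussed here.
class Highlighter
{
    public $count = 0; // state shared across callback invocations

    // The callback receives only the array of matches for one hit.
    public function wrap($m)
    {
        $this->count++;
        return '<b>' . $m[0] . '</b>';
    }
}

$h   = new Highlighter();
$out = preg_replace_callback('/smarty/i', array($h, 'wrap'), 'Smarty and smarty plugins');

echo $out, "\n";      // <b>Smarty</b> and <b>smarty</b> plugins
echo $h->count, "\n"; // 2
```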

You can also use while (preg_match (...) ) to process the strings without creating an intervening stack. You can use a variation of this to allow you to abort processing mid-way through the scan if you come across a syntax error in the input.
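
A rough sketch of the while (preg_match(...)) approach, assuming a hypothetical scan_terms() helper that consumes one term at a time from the front of the input and gives up when it meets something it cannot parse, such as a quote that is never closed.

```php
<?php
// Hypothetical helper: returns the list of terms, or false on a syntax error.
function scan_terms($input)
{
    $terms = array();
    $rest  = $input;

    // Each pass consumes one quoted phrase (group 1) or one bare word (group 2).
    while (preg_match('/^\s*(?:"([^"]*)"|([^\s"]+))/', $rest, $m)) {
        $terms[] = isset($m[2]) ? $m[2] : $m[1];
        $rest    = (string) substr($rest, strlen($m[0]));
    }

    // Anything other than whitespace left over means the scan was aborted.
    return trim($rest) === '' ? $terms : false;
}

print_r(scan_terms('foo "bar baz" qux'));  // foo, bar baz, qux
var_dump(scan_terms('foo "unclosed'));     // bool(false)
```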

Questions:
Quote:
I changed how you did it (and added a ? to your regex)

Okay, what did you do? Note that: .* and .*? and .+? are equivalent.

In the code you listed, do you really want to use match[0]?

EDIT:

One more thing--even after many years of programming, when I first started writing regexes, I found them a little intimidating. Don't give up and don't be afraid to experiment. Practice IS important! It also helps you learn something about how the regex parser works (at least the PCRE library in PHP). For example, I learned that a poorly formed regex may work extremely well with a syntactically CORRECT input, but take ORDERS OF MAGNITUDE more time with an input stream that it cannot match. CAVEAT EMPTOR! It only takes one badly designed regex to blow the performance of your page! I strongly recommend that you time your code (use microtime) and that you develop test input patterns (both those that you want to work and some that will fail) to make sure that things are in good order.
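
A minimal timing harness along the lines suggested above. The time_regex() helper and the two sample inputs are assumptions for illustration; microtime(true) returns the current time as a float. Actual timings depend on the machine and PCRE version, so none are shown.

```php
<?php
// Hypothetical helper: average seconds per preg_match_all() call.
function time_regex($pattern, $subject, $runs = 1000)
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        preg_match_all($pattern, $subject, $unused);
    }
    return (microtime(true) - $start) / $runs;
}

// One input the pattern is meant to match, and one it cannot match.
$good = str_repeat('"quoted phrase" word ', 200);
$bad  = '"' . str_repeat('word ', 1000); // a single quote that is never closed

printf("matching input:     %.6f s\n", time_regex('!".*?"!', $good));
printf("non-matching input: %.6f s\n", time_regex('!".*?"!', $bad));
```

Comparing the two numbers for each candidate pattern is a quick way to spot a regex that degrades badly on input it fails to match.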


Last edited by boots on Wed May 07, 2003 12:16 am; edited 2 times in total
toma
Smarty Regular


Joined: 25 Apr 2003
Posts: 62

Posted: Wed May 07, 2003 12:15 am

>>To be honest, I haven't looked at your page at the wiki yet, so I'm not sure how your processing is occuring For example, I d

I don't know what to say to this comment except, perhaps, check it out?

I've updated the wiki as I've refined the processing. The search string parsing is to a point I'm happy w/ it but my problem lies in the
$text = preg_replace("/($val)/i", $style . '$1' . '</b>', $text);
for each search term. Previously found search terms will have <b style=... in them and a search for 'style' later on will find that code.

I think I need a regex like
/[^<b style...]($val)[^</b>]/
(think of this notation as phonetic regex :)
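
One possible way around the rescanning problem, offered as a sketch rather than as the plugin's actual code: combine all the terms into a single alternation and call preg_replace() once, so markup inserted for one term is never searched again for another. The highlight_terms() name and the default $style value are assumptions.

```php
<?php
// Hypothetical helper; $style stands in for the opening <b style=...> tag.
function highlight_terms($text, $terms, $style = '<b style="background:yellow">')
{
    $parts = array();
    foreach ($terms as $term) {
        if ($term !== '') {
            $parts[] = preg_quote($term, '/'); // treat each term literally
        }
    }
    if (!$parts) {
        return $text;
    }

    // One alternation, one pass: replacement text is not rescanned.
    return preg_replace('/(' . implode('|', $parts) . ')/i', $style . '$1</b>', $text);
}

echo highlight_terms('Smarty style guide', array('smarty', 'style'), '<b>'), "\n";
// <b>Smarty</b> <b>style</b> guide
```

This only protects against markup added by the highlighter itself. If $text already contains HTML, terms can still match inside existing tags.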

Tom

Edit:
As for .*? .* etc, my final code is
preg_match_all( '/"(.*?)"/', $search, $_quotes);
$_terms = array_merge((array)$_quotes[1], explode(' ', preg_replace( '/".*?"/', ' ', $search )));

Using just '/".*"/' resulted in missing multiple quote-enclosed search terms. "blah and" stuff "dna halb"
was found as
blah and" stuff "dna halb
.*? fixed it.


Last edited by toma on Wed May 07, 2003 12:28 am; edited 2 times in total
boots
Administrator


Joined: 16 Apr 2003
Posts: 5611
Location: Toronto, Canada

Posted: Wed May 07, 2003 12:18 am

Quote:
I don't know what to say to this comment except, perhaps, check it out?


I was hoping you'd post the relevant section of your code, but I guess I will have to look at it if I am to give you any more help on this :)
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7368
Location: Lincoln Nebraska, USA

Posted: Wed May 07, 2003 1:42 am

Quote:
Okay, what did you do? Note that: .* and .*? and .+? are equivalent.


AFAIK, these are not equivalent. .* matches zero or more, and .+ matches one or more. the ? forces ONE match, meaning it takes away greediness of the match.

example, if you have the string:

" foo " bar " blah "

".*" will match the entire string, whereas

".*?" will match only " foo "
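
The difference is easy to check with the same sample string. This is a small sketch; the variable names are arbitrary.

```php
<?php
$s = '" foo " bar " blah "';

preg_match_all('/".*"/',  $s, $greedy); // greedy: runs to the last quote
preg_match_all('/".*?"/', $s, $lazy);   // non-greedy: stops at the next quote

print_r($greedy[0]); // one match: the entire string
print_r($lazy[0]);   // two matches: '" foo "' and '" blah "'
```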
boots
Administrator


Joined: 16 Apr 2003
Posts: 5611
Location: Toronto, Canada

Posted: Wed May 07, 2003 2:04 am

mohrt wrote:
Quote:
Note that: .* and .*? and .+? are equivalent.


AFAIK, these are not equivalent. .* matches zero or more, and .+ matches one or more. the ? forces ONE match, meaning it takes away greediness of the match.


You're right, they are not equivalent--but it is even more subtle than you suggest -- and depending on how the greedy modifier is set, this all changes.

These all produce different results:
Code:
echo preg_match_all('!.*!', 'FOO BAR', $a); print_r($a); echo '<br>';
echo preg_match_all('!.*?!', 'FOO BAR', $a); print_r($a); echo '<br>';
echo preg_match_all('!.+!', 'FOO BAR', $a); print_r($a); echo '<br>';
echo preg_match_all('!.+?!', 'FOO BAR', $a); print_r($a); echo '<br>';


Most importantly, ? alone DOES NOT force one match. Alone, it forces AT MOST one match!

Quote:
? is equivalent to {0,1}

It is possible to construct infinite loops by following a subpattern that can match no characters with a quantifier that has no upper limit, for example:

(a?)*

From: http://www.php.net/manual/en/pcre.pattern.syntax.php


If you write something as described in the above quote, you will end up with script time-outs!

HOWEVER, Monte was referring to +? which DOES force ONE match. I just wanted that to be clear for everyone.
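
The two roles of ? can be told apart with a short sketch (sample strings chosen for illustration): on its own it makes the preceding item optional, while after another quantifier it keeps the same repetition range but makes the match as short as possible.

```php
<?php
// ? on its own: the preceding item may appear zero or one time.
var_dump(preg_match('/^ab?c$/', 'ac'));   // int(1)
var_dump(preg_match('/^ab?c$/', 'abc'));  // int(1)
var_dump(preg_match('/^ab?c$/', 'abbc')); // int(0)

// ? after a quantifier: same range of repetitions, but as few as possible.
preg_match('/.+/',  'FOO BAR', $greedy);
preg_match('/.+?/', 'FOO BAR', $lazy);

echo $greedy[0], "\n"; // FOO BAR
echo $lazy[0], "\n";   // F
```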

I should be more careful before I post! Thanks to Monte for pointing out the errors.
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7368
Location: Lincoln Nebraska, USA

Posted: Wed May 07, 2003 5:33 am

Quote:
? alone DOES NOT force one match. Alone, it forces AT MOST one match!


Right, it allows the wildcard to match only one value under the condition that it matches anything at all. Then you get into negative look-behinds to be sure that quotes aren't escaped, etc. It's crazy stuff, I've learned more about regex than I ever wanted to when I rewrote the parser for version 2.4 :) The O'Reilly "Mastering Regular Expressions" is a must-have, btw.

Monte
boots
Administrator


Joined: 16 Apr 2003
Posts: 5611
Location: Toronto, Canada

Posted: Wed May 07, 2003 5:46 am

@mohrt: Thanks for the reference! I currently only have a (ragged) printed copy of the material from the PHP manual and though there is a lot in there, there is not a lot of practical information to springboard from.

For those interested, the O'Reilly page for the book is at http://www.oreilly.com/catalog/regex/chapter/index.html and has a link to a sample chapter :) Looks good!!

ps. good show on version 2.4!
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7368
Location: Lincoln Nebraska, USA

Posted: Wed May 07, 2003 2:46 pm

btw, that O'Reilly book talks about lookahead and negative lookahead, but says negative lookbehind is not supported. However, Perl 5+ supports negative lookbehind under the condition that it is static.

example:

(?<!ABC)D

will match D only if it is NOT preceded by ABC. The ABC part must be a static (fixed-length) value; you can't use an arbitrary regex there.

In PHP if you want to match a double quote not preceded by a backslash:

preg_match('/(?<!\\\\)"/', $foo);

You must be careful to escape the escape ;)
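
A short sketch of the lookbehind in use, with sample strings made up for illustration. Inside a single-quoted PHP string, \" stays as a backslash followed by a quote, and \\\\ becomes the two backslashes the regex needs to mean one literal backslash.

```php
<?php
// Count double quotes that are not preceded by a backslash.
$s = 'say \"hi\" to "them"';
$unescaped = preg_match_all('/(?<!\\\\)"/', $s, $unused);
echo $unescaped, "\n"; // 2

// Extract a quoted phrase that contains escaped quotes.
preg_match('/(?<!\\\\)"(.*?)(?<!\\\\)"/', 'find "a \"quoted\" word" now', $m);
echo $m[1], "\n"; // a \"quoted\" word
```

The lookbehind only inspects one preceding character, so a quote that follows an escaped backslash (\\") is still treated as escaped. Handling that case needs a different pattern.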

Monte
boots
Administrator


Joined: 16 Apr 2003
Posts: 5611
Location: Toronto, Canada

Posted: Fri May 09, 2003 8:23 pm

I just picked up the book. Swimmingly good if you are into that sort of stuff!!