Smarty
toma Smarty Regular
Joined: 25 Apr 2003 Posts: 62
Posted: Tue May 06, 2003 8:26 pm   Post subject: Help w/ regex for smarty plugin
Hey all. I recently created this page,
http://smarty.incutio.com/?page=modifier.google_highlight
but I'm hoping someone has a better way to parse the search string than the one I use.
Currently I pull out all the double-quoted strings before exploding the search term on spaces, but I think there's probably a way to do this with preg_split(), along the lines of
/\".*\"|:space:/
but I know for sure that doesn't work, and I'm not enough of a regex guru to figure it out.
Any help is appreciated!
Tom
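For reference, one preg_split() call can do roughly what Tom describes, provided the quoted phrase is captured so that PREG_SPLIT_DELIM_CAPTURE keeps it while plain whitespace is thrown away. This is a minimal sketch, not the plugin's actual code: split_search_terms() is a hypothetical name, and it assumes PHP 7+ and that quotes are never escaped.

```php
<?php
// Sketch: split a search string on whitespace but keep "quoted phrases" whole.
// Assumptions: PHP 7+, no escaped quotes. split_search_terms() is hypothetical.
function split_search_terms(string $search): array
{
    // Either a quoted phrase (captured, so DELIM_CAPTURE returns it)
    // or a run of whitespace (not captured, so it is discarded).
    $parts = preg_split(
        '/("[^"]*")|\s+/',
        $search,
        -1,                                         // no limit
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );
    // Strip the surrounding quotes from phrase tokens.
    return array_map(function ($p) { return trim($p, '"'); }, $parts);
}

print_r(split_search_terms('smarty "template engine" php'));
```

The quoted alternative is tried first at each position, so a phrase is consumed as one delimiter before the whitespace branch can split it.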
mohrt Administrator
Joined: 16 Apr 2003 Posts: 7368 Location: Lincoln Nebraska, USA
Posted: Tue May 06, 2003 8:48 pm
You're probably safest doing this in two passes: first pull out the quoted blocks, then split by spaces.
(untested)
// get all double quoted strings out
preg_match_all('!".*?"!', $string, $_match);
// split by spaces (the limit, -1 = no limit, comes before the flags)
$_words = preg_split('!\s+!', $string, -1, PREG_SPLIT_NO_EMPTY);
Monte
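One preg_split() detail that is easy to trip over: the third parameter is the limit and the flags come fourth. PREG_SPLIT_NO_EMPTY has the value 1, so passing it third asks for at most one piece and nothing gets split. A small sketch with a made-up sample string:

```php
<?php
// Sketch: preg_split($pattern, $subject, $limit, $flags), limit before flags.
$text = '  foo   bar baz ';

$wrong = preg_split('!\s+!', $text, PREG_SPLIT_NO_EMPTY);      // limit = 1: one piece
$right = preg_split('!\s+!', $text, -1, PREG_SPLIT_NO_EMPTY);  // no limit, empties dropped

var_dump($wrong);  // a single element: the whole, unsplit string
var_dump($right);  // 'foo', 'bar', 'baz'
```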
toma Smarty Regular
Joined: 25 Apr 2003 Posts: 62
Posted: Tue May 06, 2003 9:39 pm
preg_match_all works well but doesn't leave me with an easily parseable string for the non-quoted terms.
preg_split('!".*?"!', $search);
This is what I have now:
// get all double quoted strings then terms
preg_match_all('!".*?"!', $search, $_match);
$terms = explode(' ', implode('', preg_split('!".*?"!', $search)));
if (sizeof($_match[0])) {
    $terms = array_merge($_match[0], $terms);
}
But it can still contain empty elements.
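The empty elements come from explode(' ', ...) seeing runs of spaces where the quoted phrases used to be. A minimal sketch of two ways around it, assuming the same lazy '!".*?"!' pattern; $rest and $terms2 are hypothetical names. It merges $_match[1], which holds the phrase contents without quotes, whereas $_match[0] holds the full matches including the quotes.

```php
<?php
// Sketch: avoid empty elements when splitting the non-quoted remainder.
$search = 'foo "bar baz" qux';

preg_match_all('!"(.*?)"!', $search, $_match);   // $_match[1]: contents without quotes
$rest = preg_replace('!".*?"!', ' ', $search);   // 'foo   qux' (a run of spaces)

// Option 1: split on runs of whitespace and drop empty pieces.
$terms = preg_split('!\s+!', $rest, -1, PREG_SPLIT_NO_EMPTY);

// Option 2: keep explode(), then drop empty strings and reindex.
// 'strlen' as the filter keeps a term like '0', which a bare array_filter() would drop.
$terms2 = array_values(array_filter(explode(' ', $rest), 'strlen'));

$terms = array_merge($_match[1], $terms);
print_r($terms);
```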
boots Administrator
Joined: 16 Apr 2003 Posts: 5611 Location: Toronto, Canada
Posted: Tue May 06, 2003 10:16 pm
mohrt's regex does not handle escaped quotes, so be careful!
Quote: | preg_match_all works well but doesn't leave me with an easily parseable string for the non-quoted terms. |
Code: | // pre-tokenize strings
$src_c = preg_match_all( '/"(.*)"/', $src, $token_str );
$src = preg_replace( '/".*"/', ' STRING! ', $src );
// split on whitespace
$src = preg_split ( '/\s+/', $src); |
$src is now an array (split on whitespace) where each element is either non-string data or the token "STRING!". The actual string contents are stored in $token_str[1] ($token_str[0] holds the full matches, quotes included). $src_c is the count of strings collected.
As you process the array, whenever you encounter a STRING! token you simply do:
$string_contents = array_shift($token_str[1]);
which pulls the next string off the front of the string stack (note: it is a FIFO stack, i.e. a queue).
I used a similar technique here: http://www.phpinsider.com/smarty-forum/viewtopic.php?p=244&highlight=#244
Hope that helps.
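A runnable sketch of the string-stack idea, under stated assumptions: PHP 7+, no escaped quotes, and a placeholder that never appears in real input. tokenize() and STRING_TOKEN are hypothetical names, and the sketch uses the lazy .*? rather than .* so that two quoted strings in one input stay separate.

```php
<?php
// Sketch: swap quoted strings for a placeholder, split on whitespace,
// then refill each placeholder from the front of a FIFO stack.
const STRING_TOKEN = 'STRING!';

function tokenize(string $src): array
{
    // Lazy .*? keeps two quoted strings from being swallowed as one.
    preg_match_all('/"(.*?)"/', $src, $token_str);
    $stack = $token_str[1];                      // string contents, in order

    $src = preg_replace('/".*?"/', ' ' . STRING_TOKEN . ' ', $src);
    $tokens = preg_split('/\s+/', $src, -1, PREG_SPLIT_NO_EMPTY);

    $out = [];
    foreach ($tokens as $tok) {
        // Each placeholder takes the next string off the front of the stack.
        $out[] = ($tok === STRING_TOKEN) ? array_shift($stack) : $tok;
    }
    return $out;
}

print_r(tokenize('foo "bar baz" qux "quux"'));
```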
toma Smarty Regular
Joined: 25 Apr 2003 Posts: 62
Posted: Tue May 06, 2003 11:00 pm
Thanks boots. That got me started. I changed how you did it (and added a ? to your regex) and got it down to two lines.
However, I just found a bug with the replacing for this. A search term of 'style' will make a hell of a mess, because the 'style's inserted for earlier terms get replaced too. I really don't like the idea of tokenizing the text, but I don't see any other solution.
Do you?
Thanks again,
Tom
boots Administrator
Joined: 16 Apr 2003 Posts: 5611 Location: Toronto, Canada
Posted: Wed May 07, 2003 12:06 am
To be honest, I haven't looked at your page at the wiki yet, so I'm not sure how your processing is occurring. For example, I don't see why you say that styles will replace previous styles.
There are other ways of doing this without using a string stack. I find a stack convenient, but if you don't like it, I also sometimes use preg_replace_callback, which allows for more sophisticated inline handling.
See: http://www.php.net/manual/en/function.preg-replace-callback.php
A few things about callbacks: only the match array is passed to the callback function, so if you need to keep static data or use data from elsewhere, you will have to co-ordinate that before calling preg_replace_callback. You can register a method of any object as the callback by using the array($object, 'function') notation.
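A sketch of the array($object, 'method') callback notation, where the object carries state across calls because the callback itself only receives the match array. StringCollector and collect() are hypothetical names; assumes PHP 7+.

```php
<?php
// Sketch: an object method as the preg_replace_callback() callback.
class StringCollector
{
    public $strings = [];

    // Called once per match; $m[1] is the text between the quotes.
    public function collect(array $m): string
    {
        $this->strings[] = $m[1];
        // Replace the quoted string with a numbered placeholder such as {0}.
        return '{' . (count($this->strings) - 1) . '}';
    }
}

$collector = new StringCollector();
$out = preg_replace_callback('/"(.*?)"/', array($collector, 'collect'), 'say "hi" then "bye"');

echo $out, "\n";                // say {0} then {1}
print_r($collector->strings);   // hi, bye
```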
You can also use while (preg_match (...) ) to process the strings without creating an intervening stack. You can use a variation of this to allow you to abort processing mid-way through the scan if you come across a syntax error in the input.
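A sketch of the while (preg_match(...)) style with an early abort. PREG_OFFSET_CAPTURE reports where each match starts, and the fifth argument to preg_match() resumes the scan after it. Treating a lone, unmatched quote as the "syntax error" is an illustrative assumption, as is the name scan_strings(); assumes PHP 7+.

```php
<?php
// Sketch: scan quoted strings one at a time, stopping at the first bad quote.
function scan_strings(string $src): array
{
    $found = [];
    $offset = 0;
    // Either a complete quoted string (group 1) or a lone quote with no partner.
    while (preg_match('/"(.*?)"|"/', $src, $m, PREG_OFFSET_CAPTURE, $offset)) {
        if (!isset($m[1])) {
            // Group 1 did not take part: unterminated quote, abort mid-scan.
            throw new InvalidArgumentException('Unterminated quote at offset ' . $m[0][1]);
        }
        $found[] = $m[1][0];                     // captured contents
        $offset = $m[0][1] + strlen($m[0][0]);   // resume after this match
    }
    return $found;
}

print_r(scan_strings('a "b" c "d e"'));
```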
Questions:
Quote: | I changed how you did it (and added a ? to your regex) |
Okay, what did you do? Note that: .* and .*? and .+? are equivalent.
In the code you listed, do you really want to use match[0]?
EDIT:
One more thing: even after many years of programming, when I first started writing regexes I found them a little intimidating. Don't give up and don't be afraid to experiment. Practice IS important! It also helps you learn something about how the regex parser works (at least the PCRE library in PHP). For example, I learned that a poorly formed regex may work extremely well on syntactically CORRECT input, but take ORDERS OF MAGNITUDE more time on an input stream that it cannot match. CAVEAT EMPTOR! It only takes one badly designed regex to blow the performance of your page! I strongly recommend that you time your code (use microtime()) and that you develop test input patterns (both ones that you want to work and some that will fail) to make sure that things are in good order.
Last edited by boots on Wed May 07, 2003 12:16 am; edited 2 times in total
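A small timing harness along the lines boots suggests: run a pattern many times with microtime() against one input it should accept and one it should reject, and check preg_last_error() so that a PCRE limit error is not mistaken for "no match". time_regex() and the sample inputs are hypothetical; assumes PHP 7+. Absolute timings will vary by machine, so none are shown.

```php
<?php
// Sketch: time a pattern on matching and non-matching input.
function time_regex(string $pattern, string $subject, int $runs = 1000): array
{
    $result = false;
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $result = preg_match($pattern, $subject);
    }
    return [
        'matched' => $result === 1,
        'error'   => preg_last_error(),          // PREG_NO_ERROR (0) when all went well
        'seconds' => microtime(true) - $start,
    ];
}

// One input the pattern should accept, one it should reject.
foreach (['say "hello" there', 'say "hello there'] as $input) {
    $r = time_regex('/"(.*?)"/', $input);
    printf("%-20s matched=%d error=%d time=%.6fs\n", $input, $r['matched'], $r['error'], $r['seconds']);
}
```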
toma Smarty Regular
Joined: 25 Apr 2003 Posts: 62
Posted: Wed May 07, 2003 12:15 am
>>To be honest, I haven't looked at your page at the wiki yet, so I'm not sure how your processing is occuring For example, I d
I don't know what to say to this comment except, perhaps, check it out?
I've updated the wiki as I've refined the processing. The search string parsing is at a point I'm happy with, but my problem lies in the
$text = preg_replace("/($val)/i", $style . '$1' . '</b>', $text);
for each search term. Previously found search terms will have <b style=... in them, and a search for 'style' later on will find that code.
I think I need a regex like
/[^<b style...]($val)[^</b>]/
(think of this notation as "phonetic" regex, not real syntax)
Tom
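One way around the 'style' problem, not taken from the plugin itself: highlight every term in a single preg_replace_callback() pass, so markup inserted for one term is never rescanned for the next, and add an alternation that copies existing HTML tags through untouched. A sketch assuming PHP 7+, plain-text terms, and simple HTML with no '>' inside attribute values; highlight_terms() is hypothetical and the caller supplies the opening tag (the plugin's $style).

```php
<?php
// Sketch: one-pass highlighting that skips existing tags.
function highlight_terms(string $text, array $terms, string $open, string $close = '</b>'): string
{
    $terms = array_filter($terms, 'strlen');   // ignore empty terms
    if (!$terms) {
        return $text;
    }
    // Longer terms first, so a longer phrase wins over a shorter term it contains.
    usort($terms, function ($a, $b) { return strlen($b) - strlen($a); });
    // preg_quote() stops terms such as "c++" from being read as regex syntax.
    $quoted = array_map(function ($t) { return preg_quote($t, '/'); }, $terms);

    // Group 1: an existing HTML tag. Group 2: one of the search terms.
    $pattern = '/(<[^>]*>)|(' . implode('|', $quoted) . ')/i';

    return preg_replace_callback($pattern, function ($m) use ($open, $close) {
        if ($m[1] !== '') {
            return $m[0];                      // a tag: leave it alone
        }
        return $open . $m[2] . $close;         // a term: wrap it, keeping its case
    }, $text);
}

echo highlight_terms('<p>Use style sheets with Smarty</p>', ['style', 'smarty'], '<b style="color:black">');
```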
Edit:
As for .*? vs .* etc., my final code is:
preg_match_all( '/"(.*?)"/', $search, $_quotes);
$_terms = array_merge((array)$_quotes[1], explode(' ', preg_replace( '/".*?"/', ' ', $search )));
Using just '/".*"/' merged multiple quote-enclosed search terms into one. For example,
"blah and" stuff "dna halb"
was found as
blah and" stuff "dna halb
Using .*? fixed it.
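The difference Tom ran into can be reproduced directly with his search string: greedy .* runs on to the last quote in the subject, while lazy .*? stops at the first closing quote.

```php
<?php
// Greedy vs. lazy on the search string from the post above.
$search = '"blah and" stuff "dna halb"';

preg_match_all('/"(.*)"/', $search, $greedy);   // .* runs on to the LAST quote
preg_match_all('/"(.*?)"/', $search, $lazy);    // .*? stops at the first closing quote

print_r($greedy[1]);  // one element: blah and" stuff "dna halb
print_r($lazy[1]);    // two elements: blah and, dna halb
```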
Last edited by toma on Wed May 07, 2003 12:28 am; edited 2 times in total
boots Administrator
Joined: 16 Apr 2003 Posts: 5611 Location: Toronto, Canada
Posted: Wed May 07, 2003 12:18 am
Quote: | I don't know what to say to this comment except, perhaps, check it out? |
I was hoping you'd post the relevant section of your code, but I guess I will have to look at it if I am to give you any more help on this.
mohrt Administrator
Joined: 16 Apr 2003 Posts: 7368 Location: Lincoln Nebraska, USA
Posted: Wed May 07, 2003 1:42 am
Quote: | Okay, what did you do? Note that: .* and .*? and .+? are equivalent. |
AFAIK, these are not equivalent. .* matches zero or more, and .+ matches one or more. the ? forces ONE match, meaning it takes away greediness of the match.
For example, if you have the string:
" foo " bar " blah "
".*" will match the entire string, whereas
".*?" will match only " foo "
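A related option not raised in the thread: a negated character class such as [^"]* instead of .*?. It can never run past a closing quote, and unlike . (without the /s modifier) it also matches newlines, which matters if a quoted phrase spans lines. A small sketch with a made-up subject string:

```php
<?php
// Sketch: lazy .*? vs. a negated class when a quoted phrase contains a newline.
$subject = "\"first\nline\" and \"second\"";

preg_match_all('/"(.*?)"/', $subject, $lazy);      // "." refuses the newline
preg_match_all('/"([^"]*)"/', $subject, $negated); // [^"] accepts it

print_r($lazy[1]);     // ' and ': the wrong pair of quotes matched
print_r($negated[1]);  // "first\nline" and 'second'
```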
boots Administrator
Joined: 16 Apr 2003 Posts: 5611 Location: Toronto, Canada
Posted: Wed May 07, 2003 2:04 am
mohrt wrote: | Quote: | Note that: .* and .*? and .+? are equivalent. |
AFAIK, these are not equivalent. .* matches zero or more, and .+ matches one or more. the ? forces ONE match, meaning it takes away greediness of the match.
|
You're right, they are not equivalent, but it is even more subtle than you suggest: depending on how the greedy modifier (U, "ungreedy") is set, this all changes.
These all produce different results:
Code: | echo preg_match_all('!.*!', 'FOO BAR', $a); print_r($a); echo '<br>';
echo preg_match_all('!.*?!', 'FOO BAR', $a); print_r($a); echo '<br>';
echo preg_match_all('!.+!', 'FOO BAR', $a); print_r($a); echo '<br>';
echo preg_match_all('!.+?!', 'FOO BAR', $a); print_r($a); echo '<br>'; |
Most importantly, ? alone DOES NOT force one match. Alone, it forces AT MOST one match!
If you write something as described in the above quote, you will end up with script time-outs!
HOWEVER, Monte was referring to +?, which DOES force at least ONE match. I just wanted that to be clear for everyone.
I should be more careful before I post! Thanks to Monte for pointing out the errors.
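To see the differences concretely, this sketch runs boots' four patterns plus a bare .? (the "at most one" case) on 'FOO BAR' and prints how many matches each returns. In PHP's PCRE, .* takes the whole string and then also matches the empty string at the end (2 matches), .+ takes the whole string once (1), .+? takes one character per match (7), .? takes one character at a time and then an empty match at the end (8), and .*? yields a mix of empty and single-character matches.

```php
<?php
// Sketch: count matches per quantifier on the same subject.
$patterns = ['!.*!', '!.*?!', '!.+!', '!.+?!', '!.?!'];
$counts = [];
foreach ($patterns as $p) {
    $counts[$p] = preg_match_all($p, 'FOO BAR', $m);
    echo $p, ' => ', $counts[$p], ' match(es): ', json_encode($m[0]), "\n";
}
```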
mohrt Administrator
Joined: 16 Apr 2003 Posts: 7368 Location: Lincoln Nebraska, USA
Posted: Wed May 07, 2003 5:33 am
Quote: | ? alone DOES NOT force one match. Alone, it forces AT MOST one match! |
Right, it allows the wildcard to match only one value, on the condition that it matches anything at all. Then you get into negative look-behinds to be sure that quotes aren't escaped, etc. It's crazy stuff; I've learned more about regex than I ever wanted to when I rewrote the parser for version 2.4. The O'Reilly book "Mastering Regular Expressions" is a must-have, btw.
Monte
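As an alternative to lookbehinds, a quoted string with escaped quotes can be matched by letting each step inside the quotes consume either an ordinary character or a backslash plus the character after it. A sketch assuming backslash is the only escape character; the pattern is a common idiom, not anything taken from the Smarty source.

```php
<?php
// Sketch: double-quoted strings that may contain backslash escapes.
// Inside the quotes: [^"\\] (ordinary char) or \\. (backslash + next char).
$pattern = '/"((?:[^"\\\\]|\\\\.)*)"/';

$subject = 'x = "say \"hi\"" and "bye"';
preg_match_all($pattern, $subject, $m);
print_r($m[1]);   // say \"hi\"  and  bye

// An escaped backslash right before the closing quote is handled too,
// which a single-character lookbehind on the quote would get wrong.
preg_match_all($pattern, '"C:\\\\" rest', $m2);
print_r($m2[1]);
```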
boots Administrator
Joined: 16 Apr 2003 Posts: 5611 Location: Toronto, Canada
Posted: Wed May 07, 2003 5:46 am
@mohrt: Thanks for the reference! I currently only have a (ragged) printed copy of the material from the PHP manual, and though there is a lot in there, there is not much practical information to springboard from.
For those interested, the O'Reilly page for the book is at http://www.oreilly.com/catalog/regex/chapter/index.html and has a link to a sample chapter. Looks good!!
P.S. Good show on version 2.4!
mohrt Administrator
Joined: 16 Apr 2003 Posts: 7368 Location: Lincoln Nebraska, USA
Posted: Wed May 07, 2003 2:46 pm
btw, that O'Reilly book talks about lookahead and negative lookahead, but says negative lookbehind is not supported. However, Perl 5+ supports negative lookbehind on the condition that it is static (fixed-length).
Example:
(?<!ABC)D
will match D only if it is NOT preceded by ABC. The ABC part must be a static value; you can't look for an arbitrary regex.
In PHP, if you want to match a double quote not preceded by a backslash:
preg_match('/(?<!\\\\)"/', $foo);
You must be careful to escape the escape: '\\\\' in a single-quoted PHP string becomes \\ in the pattern, which PCRE reads as one literal backslash.
Monte
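A quick way to check the lookbehind behaviour is a couple of preg_match() calls; the subject strings here are made up for illustration. Note that a one-character lookbehind like this is fooled when the backslash before a closing quote is itself escaped, so for whole quoted strings an alternation that consumes escape pairs is usually sturdier.

```php
<?php
// Sketch: negative lookbehind in PHP/PCRE.
// (?<!ABC)D matches a D that is NOT immediately preceded by ABC.
var_dump(preg_match('/(?<!ABC)D/', 'ABCD'));  // int(0)
var_dump(preg_match('/(?<!ABC)D/', 'XYZD'));  // int(1)

// Offsets of double quotes that are not preceded by a backslash.
preg_match_all('/(?<!\\\\)"/', 'a \"b\" "c"', $m, PREG_OFFSET_CAPTURE);
$offsets = array_column($m[0], 1);
print_r($offsets);   // 8 and 10
```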
boots Administrator
Joined: 16 Apr 2003 Posts: 5611 Location: Toronto, Canada
Posted: Fri May 09, 2003 8:23 pm
I just picked up the book. Swimmingly good if you are into that sort of stuff!!