google style highlighting (ideal for search result pages)

 
ubaldo
Smarty Regular


Joined: 21 Apr 2003
Posts: 35
Location: Barcelona, Spain

Posted: Thu Nov 13, 2003 6:05 pm    Post subject: google style highlighting (ideal for search result pages)

I tried this output filter today (basically word highlighting similar to Google's cache).

I don’t know who wrote it, but it seems like a nice addition to the Smarty plugin resources. BTW, not that it matters, but the Google colors are a bit off. Here is the correct set:

$colorArr = array('#ffff66','#A0FFFF','#99ff99','#ff9999','#ff66ff');

http://de.tikiwiki.org/dox-head/html/outputfilter_8highlight_8php-source.html

Code:

<?php
/*
 * Smarty plugin
 * -------------------------------------------------------------
 * File:     outputfilter.highlight.php
 * Type:     outputfilter
 * Name:     highlight
 * Version:  1.1
 * Date:     Sep 18, 2003
 * Version:  1.0
 * Date:     Aug 10, 2003
 * Purpose:  Adds Google-cache-like highlighting for terms in a
 *           template after it's rendered. This can be easily
 *           integrated with the wiki search functionality
 *           to provide highlighted search terms.
 * Install:  Drop into the plugin directory, call
 *           $smarty->load_filter('output','highlight');
 *           from application.
 * Author:   Greg Hinkle <ghinkl@users.sourceforge.net>
 *           patched by mose <mose@feu.org>
 * -------------------------------------------------------------
 */
function smarty_outputfilter_highlight($source, &$smarty) {

    // Nothing to do unless search terms were passed in the request
    $words = isset($_REQUEST['highlight']) ? $_REQUEST['highlight'] : null;
    if (!is_string($words) || trim($words) === '') {
        return $source;
    }

    // Pull out the script blocks ([^>]* also matches a bare <script>)
    preg_match_all("!<script\b[^>]*>.*?</script>!is", $source, $match);
    $_script_blocks = $match[0];
    $source = preg_replace("!<script\b[^>]*>.*?</script>!is",
        '@@@SMARTY:TRIM:SCRIPT@@@', $source);

    // Pull out all html tags
    preg_match_all("'<[\/\!]*?[^<>]*?>'si", $source, $match);
    $_tag_blocks = $match[0];
    $source = preg_replace("'<[\/\!]*?[^<>]*?>'si", '@@@SMARTY:TRIM:TAG@@@', $source);

    // This array is used to choose colors for supplied highlight terms
    $colorArr = array('#ffff66','#ff9999','#A0FFFF','#ff66ff','#99ff99');

    // Wrap each highlight word in a span that sets its background color.
    // split() was removed in PHP 7, so preg_split() is used instead, and
    // preg_quote() keeps regex metacharacters in a term from breaking the
    // pattern. Caveat: a term such as "trim" or "span" can still match the
    // placeholders or the markup inserted for an earlier term.
    $wordArr = preg_split('/\s+/', trim($words), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($wordArr as $i => $word) {
        $color = $colorArr[$i % count($colorArr)]; // reuse colors after five terms
        $source = preg_replace('/(' . preg_quote($word, '/') . ')/si',
            '<span style="color:black;background-color:' . $color . ';">$1</span>', $source);
    }

    // Puts stored blocks back in document order. A callback is used because
    // a plain replacement string would treat "$1" or "\1" inside a block as
    // a backreference.
    $restore = function ($placeholder, $blocks, $source) {
        $n = 0;
        return preg_replace_callback('!' . $placeholder . '!', function ($m) use ($blocks, &$n) {
            return $blocks[$n++];
        }, $source);
    };

    // replace script blocks, then tag blocks
    $source = $restore('@@@SMARTY:TRIM:SCRIPT@@@', $_script_blocks, $source);
    $source = $restore('@@@SMARTY:TRIM:TAG@@@', $_tag_blocks, $source);

    return $source;
}
?>


I do have another problem (challenge). The app I’m building highlights words in search results that are matched via MySQL full-text searching (natural language). So, for example, MySQL would match ‘habitación’ to ‘habitacion’, but the preg in the output filter obviously won’t.

I don’t pretend to cover all MySQL matching patterns, but I'd appreciate any pointers toward a workaround (accent matching might be a good start).

I couldn’t find anything on the MySQL site about the natural language matching.

Ubaldo Huerta
http://www.loquo.com
http://foro.loquo.com
boots
Administrator


Joined: 16 Apr 2003
Posts: 5611
Location: Toronto, Canada

Posted: Thu Nov 13, 2003 6:24 pm

ubaldo, nice plugin!

Perhaps this will help you.

Cheers!

ubaldo

Posted: Thu Nov 13, 2003 11:01 pm

Mr boots,

Well, I had to disable the output filter on my production site pending “investigation”.

The output filter intelligently avoids replacing strings inside tags and JavaScript blocks but, unfortunately, it modifies the contents of the HTML title tag (something that nobody wants).

I think it’s not practical to have word highlighting as an output filter; perhaps it should be a block function instead. Naturally, one could easily fix it to avoid replacing strings within the HTML title tag, but again, I think that in real-world situations the highlighting should occur only in a specific section of the page, hence my thought of making it a block function.

Now, my problem is that I’ve observed spurious highlighting on certain pages of my production site. I get the feeling that the output filter's result has been saved in the Smarty cache. This, AFAIK, shouldn’t happen (I've never used output filters before).

I need to dig into my problem. Will write back once I figure out what’s going on. It's too late now in Barcelona, almost midnight.
_________________
Ubaldo Huerta
http://madrid.loquo.com
http://rentals.loquo.com

ubaldo

Posted: Fri Nov 14, 2003 11:30 am

I’m definitely seeing a problem with the output filter and caching: the output filter's transformation ends up in the Smarty cache. Perhaps this person also bumped into the same problem:

http://www.phpinsider.com/smarty-forum/viewtopic.php?t=128

I'm going to apply the filter manually then.

I'm also using a few-months-old post-2.5 CVS version (I needed a fix committed by messju). I don't know if it works in 2.6 RC (haven't tried it yet).
messju
Administrator


Joined: 16 Apr 2003
Posts: 3336
Location: Oldenburg, Germany

Posted: Fri Nov 14, 2003 11:51 am

your observation is correct: output-filter's processing is cached.

AFAIR it has always been this way and i don't see a problem with that.
consider output-filters like trimwhitespace: it would be a waste of resources to do the trimming on each request instead of once per cache write.
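To see why a per-request filter such as highlighting leaks into cached pages, here is a toy model of the order of operations described above. It is only a sketch: toy_fetch() and its array cache are made up for this example and are not Smarty's actual code.

```php
<?php
// Toy model only; toy_fetch() and the array cache are hypothetical.
function toy_fetch($key, $render, $filter, array &$cache)
{
    if (isset($cache[$key])) {
        return $cache[$key];      // cache hit: no rendering, no filtering
    }
    $html = call_user_func($filter, call_user_func($render));
    $cache[$key] = $html;         // the *filtered* output is what gets stored
    return $html;
}

$cache  = array();
$render = function () { return '<p>room for rent</p>'; };

// Request 1 arrives with a highlight term and fills the cache.
$highlight = function ($html) { return str_replace('room', '<b>room</b>', $html); };
$first = toy_fetch('results', $render, $highlight, $cache);

// Request 2 has no highlight term, yet it receives the highlighted copy.
$plain  = function ($html) { return $html; };
$second = toy_fetch('results', $render, $plain, $cache);

echo $second, "\n";
```

In this model the second visitor gets the first visitor's highlighting, which matches the spurious highlighting reported earlier in the thread.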

ubaldo

Posted: Fri Nov 14, 2003 12:05 pm

Well, the doc is not explicit about that. For some reason, this excerpt from the documentation led me to believe that output filters are not cached.

Quote:
When the template is invoked via display() or fetch(), its output can be sent through one or more output filters. This differs from postfilters because postfilters operate on compiled templates before they are saved to the disk, and output filters operate on the template output when it is executed.


So word highlighting (for search results) definitely can’t be implemented via an output filter.

IMHO, some people might want to use output filters as a one-off thing and not cache the results (a third param in the load_filter function to control cacheability might not be a terrible idea).

messju

Posted: Fri Nov 14, 2003 12:35 pm

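@ubaldo: regarding your problem with accents:

the following code

Code:
// ISO-8859-1 is requested explicitly: since PHP 5.4 the default is UTF-8,
// which would yield a larger table than the one shown below
$map_asc_iso = array();
$map_iso_asc = array();
$table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');
foreach ($table as $iso_char => $ent) {
    // keep entities like &aacute; or &Ntilde;: a base letter plus a diacritic name
    if (preg_match('!^&(.).*(acute|cedil|circ|grave|lig|ring|slash|tilde|uml);$!i', $ent, $match)) {
        $asc_char = $match[1];

        if (!isset($map_asc_iso[$asc_char])) $map_asc_iso[$asc_char] = '';
        $map_asc_iso[$asc_char] .= $iso_char;   // base letter => all accented variants

        $map_iso_asc[$iso_char] = $asc_char;    // accented char => base letter
    }
}

print_r($map_asc_iso);
print_r($map_iso_asc);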

gives the following 2 nifty little tables:

Code:
Array
(
    [A] => ÀÁÂÃÄÅÆ
    [C] => Ç
    [E] => ÈÉÊË
    [I] => ÌÍÎÏ
    [N] => Ñ
    [O] => ÒÓÔÕÖØ
    [U] => ÙÚÛÜ
    [Y] => Ý
    [s] => ß
    [a] => àáâãäåæ
    [c] => ç
    [e] => èéêë
    [i] => ìíîï
    [n] => ñ
    [o] => òóôõöø
    [u] => ùúûü
    [y] => ýÿ
)
Array
(
    [À] => A
    [Á] => A
    [Â] => A
    [Ã] => A
    [Ä] => A
    [Å] => A
    [Æ] => A
    [Ç] => C
    [È] => E
    [É] => E
    [Ê] => E
    [Ë] => E
    [Ì] => I
    [Í] => I
    [Î] => I
    [Ï] => I
    [Ñ] => N
    [Ò] => O
    [Ó] => O
    [Ô] => O
    [Õ] => O
    [Ö] => O
    [Ø] => O
    [Ù] => U
    [Ú] => U
    [Û] => U
    [Ü] => U
    [Ý] => Y
    [ß] => s
    [à] => a
    [á] => a
    [â] => a
    [ã] => a
    [ä] => a
    [å] => a
    [æ] => a
    [ç] => c
    [è] => e
    [é] => e
    [ê] => e
    [ë] => e
    [ì] => i
    [í] => i
    [î] => i
    [ï] => i
    [ñ] => n
    [ò] => o
    [ó] => o
    [ô] => o
    [õ] => o
    [ö] => o
    [ø] => o
    [ù] => u
    [ú] => u
    [û] => u
    [ü] => u
    [ý] => y
    [ÿ] => y
)


the latter for example can be passed to strtr() to translate things like "mäßÿü" -> "masyu".
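As a minimal sketch of the strtr() idea, the snippet below folds accents so that two spellings compare equal. It assumes UTF-8 source and data, uses only a small hand-written subset of the second table, and fold_accents() is a hypothetical helper name.

```php
<?php
// Small hand-written subset of the second table; assumes UTF-8 source and data.
$map_iso_asc = array(
    'á' => 'a', 'ä' => 'a', 'é' => 'e', 'í' => 'i', 'ñ' => 'n',
    'ó' => 'o', 'ú' => 'u', 'ü' => 'u', 'ÿ' => 'y', 'ß' => 's',
);

// Hypothetical helper: fold accents so two spellings compare equal.
function fold_accents($text, array $map)
{
    // With an array argument, strtr() replaces whole keys, so the
    // multi-byte UTF-8 sequences are swapped as units.
    return strtr($text, $map);
}

echo fold_accents('mäßÿü', $map_iso_asc), "\n";   // masyu
var_dump(fold_accents('habitación', $map_iso_asc) === 'habitacion');   // bool(true)
```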

the former can be used to construct a regex automatically to match your highlighted words perhaps. for example translate "a" or "à" in your search word to "[aàáâãäåæ]" (a regexp character class) to match the corresponding accented chars at this position.
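The character-class idea could look roughly like this. It is a sketch under assumptions: the text is UTF-8 (hence the "u" modifier), the classes are a hand-written subset of the first table with both cases listed, and accent_insensitive_pattern() is a hypothetical name.

```php
<?php
// Hypothetical helper: turns one search word into an accent-insensitive
// regex. Assumes UTF-8 text; the classes are a hand-written subset.
function accent_insensitive_pattern($word)
{
    $classes = array(
        'aàáâãäåAÀÁÂÃÄÅ', 'cçCÇ', 'eèéêëEÈÉÊË', 'iìíîïIÌÍÎÏ',
        'nñNÑ', 'oòóôõöøOÒÓÔÕÖØ', 'uùúûüUÙÚÛÜ', 'yýÿYÝ',
    );

    // Reverse lookup: every variant points at the class it belongs to
    $classOf = array();
    foreach ($classes as $variants) {
        foreach (preg_split('//u', $variants, -1, PREG_SPLIT_NO_EMPTY) as $char) {
            $classOf[$char] = '[' . $variants . ']';
        }
    }

    // Replace each known letter by its class, quote everything else
    $pattern = '';
    foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $pattern .= isset($classOf[$char]) ? $classOf[$char] : preg_quote($char, '/');
    }
    return '/' . $pattern . '/iu';
}

$pattern = accent_insensitive_pattern('habitacion');
var_dump(preg_match($pattern, 'Habitación doble'));   // int(1)
```

The resulting pattern can be used in place of the plain word inside the highlight loop, so 'habitacion' also highlights 'habitación'.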

i don't know if this helps with mysql's fulltext-search, i don't know that feature too well.

if you use it in a search- or highlight-engine i suggest you don't create the tables on each request, but once on installation and store them serialized or so somewhere.

feel free to use them (they came with your php anyway :) ).

greetings
messju

messju

Posted: Fri Nov 14, 2003 12:40 pm

ubaldo wrote:
Well, the doc is not explicit about that.


that's true

Quote:
For some reason, this excerpt from the documentation led me to believe that output filters are not cached.

Quote:
When the template is invoked via display() or fetch(), its output can be sent through one or more output filters. This differs from postfilters because postfilters operate on compiled templates before they are saved to the disk, and output filters operate on the template output when it is executed.


So word highlighting (for search results) definitely can't be implemented via an output filter.

IMHO, some people might want to use output filters as a one-off thing and not cache the results (a third param in the load_filter function to control cacheability might not be a terrible idea).


as wombat said: call the filter on your own instead of registering it. it's just one function-call "echo filter($source, $smarty);" . bloating smarty's api for such a little niche of application that can be replaced by one function-call doesn't sound terribly useful to me.
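A rough sketch of that "call it yourself" approach follows. The wrapper name render_with_highlight() is hypothetical, $fetchPage stands in for something like $smarty->fetch('results.tpl'), and the small $filter closure is only a stand-in for the real highlight function so the example runs without Smarty installed.

```php
<?php
// Hypothetical wrapper. $fetchPage stands in for something like
// $smarty->fetch('results.tpl'), which may hand back cached HTML.
function render_with_highlight($fetchPage, $filter, array $words)
{
    $html = call_user_func($fetchPage);              // cached or freshly rendered
    if (count($words) === 0) {
        return $html;                                // nothing to highlight
    }
    return call_user_func($filter, $html, $words);   // per request, never cached
}

// Stand-ins so the sketch runs without Smarty installed.
$fetchPage = function () { return '<p>room for rent</p>'; };
$filter = function ($html, array $words) {
    foreach ($words as $word) {
        $html = preg_replace('/(' . preg_quote($word, '/') . ')/i', '<b>$1</b>', $html);
    }
    return $html;
};

echo render_with_highlight($fetchPage, $filter, array('room')), "\n";
echo render_with_highlight($fetchPage, $filter, array()), "\n";
```

Because the filter runs after the page has been fetched, whatever is stored in the cache stays free of highlighting.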

ubaldo

Posted: Fri Nov 14, 2003 2:11 pm

Naturally, I agree with the principle of keeping the API as simple as possible. It simply occurred to me that there might be situations where you want to control whether the results of an output filter are cached. Take, for example, the only output filter that comes with the standard Smarty distribution (trimwhitespace).

The filter is great for saving bandwidth, speed, etc., but it’s hell if you want to figure out what’s wrong with your generated HTML code. So you might want an extra parameter, e.g. ?with_space=1, to generate the page in question with spaces for readability. But wait, if you use the output filter, next time around the page will be shipped to normal users with spaces.

Clearly, the with_space=1 functionality could easily be implemented by calling the output filter “by hand”, but it’s “prettier” to call it with an extra param.

I don’t know why I ended up writing all this about such a minor detail. Again, I just came up, out of the blue, with my own idea that output filters wouldn’t affect the cache, but I only started using them yesterday anyway.

Thank you for the accent conversion thingy, I will definitely look into it.

boots

Posted: Fri Nov 14, 2003 2:36 pm

Ubaldo,

I see what you are saying, but I think maybe you are forgetting that Smarty is a general lib and not a framework meant to do everything for you. Not because it can't, but because it cannot be all things to all people; we have too many varying needs :) So I think it (rightly) tries to err on the side of speed and flexibility.

In particular, I prefer and very much expect a cache image to be the end story, final result of a production sequence. If I later want to modify that output, I maintain that ability--but for the vast majority of time (nearly 100%) I want to minimize the amount of work to display a cache file, particularly processing. If you don't believe me, look at the great lengths I go to here to do just that.

Besides, like me, you can choose to modify. In your case, subclass Smarty and overload display or fetch to automatically apply your filter, or better, create a new method to do that. I've used that technique myself with great success.

Cheers!!


Last edited by boots on Fri Nov 14, 2003 3:42 pm; edited 1 time in total

ubaldo

Posted: Fri Nov 14, 2003 3:05 pm

Hi boots

I agree with you on smarty position in the food chain. Sorry if I was starting to beat a dead horse.

It was an unlucky chain of events that triggered my confusion.

1-I found a “cool” output filter that seemed to do what I wanted. I got the feeling that it was in production in the TikiWiki thing.

2-I hadn’t (knowingly) used output filters before, so I went to read the doc and “inferred” out of nowhere that output filters didn’t persist in the Smarty cache. My belief was reinforced by the existence of the plugin from point 1 (I thought that nobody could afford the disk space to cache search results for keywords ‘x’ and ‘y’).

That’s it. End of my story. Point taken.

Moving on to figure out my real problem with accents, etc.

I’m very grateful to be relying on smarty. Many thanks

ubaldo

Posted: Sun Nov 16, 2003 12:10 pm

I completely forgot to post back a slight modification to the output filter. The filter I found (see initial post) was matching within the entire page (that includes the head, which includes the title). Naturally, nobody would want to match within the title, or the head for that matter, hence my slight modification. Perhaps someone would want to take it further and do matching only within a pair of specific tags, e.g. <begin_highlight_block> </end_highlight_block>.

The other problem is the “natural language” matching I was referring to (I’ll tackle that problem soon; I need to gather some extra courage first and figure out if the MySQL fulltext matching rules are published somewhere, as I don't want to read the code :oops: ).

Ah, I forgot: I also do the query parsing outside the plugin, just a matter of preference, unimportant. BTW, if you have problems calling
Code:
echo smarty_outputfilter_highlight($page_output, $smarty)
you know who to talk to :lol:

Here it is:

Code:

function smarty_outputfilter_highlight($source, &$smarty)
{
   $tpl_vars = $smarty->get_template_vars();

   // Nothing to do unless the application assigned an array of words
   if (empty($tpl_vars['word_highlight']) || !is_array($tpl_vars['word_highlight']))
   {
      return $source;
   }
   $wordArr = array_values($tpl_vars['word_highlight']);

   // Pull out the script blocks ([^>]* also matches a bare <script>)
   preg_match_all("!<script\b[^>]*>.*?</script>!is", $source, $match);
   $_script_blocks = $match[0];

   $source = preg_replace("!<script\b[^>]*>.*?</script>!is", '@@@SMARTY:TRIM:SCRIPT@@@', $source);

   // Pull out the head block (also when <head> carries attributes)
   preg_match_all("!<head\b[^>]*>.*?</head>!is", $source, $match);
   $_head_blocks = $match[0];

   $source = preg_replace("!<head\b[^>]*>.*?</head>!is", '@@@SMARTY:TRIM:HEAD@@@', $source);

   // Pull out all html tags
   preg_match_all("'<[\/\!]*?[^<>]*?>'si", $source, $match);
   $_tag_blocks = $match[0];

   $source = preg_replace("'<[\/\!]*?[^<>]*?>'si", '@@@SMARTY:TRIM:TAG@@@', $source);

   // This array is used to choose colors for supplied highlight terms
   $colorArr = array('#FFFF9F','#C2FFFF','#BFFFBF','#FFC1C1','#FFBBFF');

   // Wrap all the highlight words with tags bolding them and changing their background colors
   foreach($wordArr as $i => $word)
   {
      if (!is_string($word) || $word === '')
      {
         continue; // an empty pattern would match at every position
      }
      // preg_quote() keeps regex metacharacters in a word from breaking the pattern
      $color = $colorArr[$i % count($colorArr)]; // reuse colors after five words
      $source = preg_replace('/(' . preg_quote($word, '/') . ')/i', '<b style="color:black;background-color:'.$color.';">$1</b>', $source);
   }

   // Puts stored blocks back in document order. A callback is used because a
   // plain replacement string would treat "$1" or "\1" in a block as a backreference.
   $restore = function ($placeholder, $blocks, $source)
   {
      $n = 0;
      return preg_replace_callback('!' . $placeholder . '!', function ($m) use ($blocks, &$n)
      {
         return $blocks[$n++];
      }, $source);
   };

   // replace head blocks, then script blocks, then tag blocks
   $source = $restore('@@@SMARTY:TRIM:HEAD@@@', $_head_blocks, $source);
   $source = $restore('@@@SMARTY:TRIM:SCRIPT@@@', $_script_blocks, $source);
   $source = $restore('@@@SMARTY:TRIM:TAG@@@', $_tag_blocks, $source);

   return $source;
}
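
The idea of highlighting only inside a marked section could be sketched as follows. Everything here is an assumption for illustration: the HTML comment markers, the constant names and highlight_marked_region() are hypothetical, the colors are the set from the first post, and the marked region is assumed to contain no script blocks. Instead of placeholders, the region is split into tags and text so that only the text pieces are touched.

```php
<?php
// Hypothetical markers; any pair of unique strings would work.
define('HL_BEGIN', '<!-- highlight:begin -->');
define('HL_END', '<!-- highlight:end -->');

function highlight_marked_region($html, array $words)
{
    $words = array_values(array_filter($words, 'strlen'));
    if (count($words) === 0) {
        return $html;
    }
    $colors = array('#ffff66', '#A0FFFF', '#99ff99', '#ff9999', '#ff66ff');

    // One alternation with a capture group per word, so a single pass
    // never re-scans markup inserted for another word.
    $groups = array();
    foreach ($words as $word) {
        $groups[] = '(' . preg_quote($word, '/') . ')';
    }
    $wordPattern = '/' . implode('|', $groups) . '/i';

    $wrap = function ($hit) use ($colors) {
        // Trailing unmatched groups are dropped, so the count gives the word index
        $color = $colors[(count($hit) - 2) % count($colors)];
        return '<span style="color:black;background-color:' . $color . ';">' . $hit[0] . '</span>';
    };

    $region = '/' . preg_quote(HL_BEGIN, '/') . '(.*?)' . preg_quote(HL_END, '/') . '/s';

    return preg_replace_callback($region, function ($m) use ($wordPattern, $wrap) {
        // Even pieces are text, odd pieces are the tags that separated them
        $pieces = preg_split('/(<[^<>]*>)/', $m[1], -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach ($pieces as $n => $piece) {
            if ($n % 2 === 0) {
                $pieces[$n] = preg_replace_callback($wordPattern, $wrap, $piece);
            }
        }
        return implode('', $pieces);   // the markers themselves are dropped
    }, $html);
}

$page = '<title>red room</title>' . HL_BEGIN . '<a href="room.html">Red room</a>' . HL_END;
echo highlight_marked_region($page, array('red', 'room')), "\n";
```

The title and the href attribute are left alone here; only the link text inside the markers is wrapped.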
