Smarty
Spuerhund Smarty Rookie
Joined: 20 Jan 2005 Posts: 16
|
Posted: Mon May 29, 2006 9:37 pm Post subject: a few bugs in the "truncate" plugin |
|
|
This is the complete code of the modifier.truncate.php function (copied from the latest Smarty version 2.6.14):
    function smarty_modifier_truncate($string, $length = 80, $etc = '...',
                                      $break_words = false, $middle = false)
    {
        if ($length == 0)
            return '';
        if (strlen($string) > $length) {
            $length -= strlen($etc);
            if (!$break_words && !$middle) {
                $string = preg_replace('/\s+?(\S+)?$/', '', substr($string, 0, $length+1));
            }
            if(!$middle) {
                return substr($string, 0, $length).$etc;
            } else {
                return substr($string, 0, $length/2) . $etc . substr($string, -$length/2);
            }
        } else {
            return $string;
        }
    }
There are several things to improve, some people (including me) would rather call them "bugs":
1) It should be mentioned somewhere that this plugin does NOT work with unicode text! In the manual http://smarty.php.net/manual/en/language.modifier.truncate.php there is no such hint. If you want this code to work correctly with unicode (UTF-8, for example), you have to change "strlen" to "mb_strlen" and "substr" to "mb_substr". Finally, you should add the "u" modifier to the preg_replace regexp.
The mb_* functions will of course only work if PHP's mbstring extension is available, which can easily be checked with:
Code: | if (extension_loaded('mbstring')) { ... } |
2) The regular expression /\s+?(\S+)?$/ is "suboptimal". First "\s+?" can be written as "\s", it will not change anything. Second "(\S+)" gets extracted without ever using it, thus it should be changed to "(?:\S+)" to save memory.
3) The value of the $etc variable gets appended to the string whenever it is longer than $length characters. But this is not always correct, as there should be a single whitespace between the last word and the value of $etc whenever words are not cut (break_words = false). There is a difference in the meaning of "foo bar..." and "foo bar ...", therefore depending on the value of $break_words there should be a whitespace.
4) The default value of $etc is '...' -- bad too. I refer to http://de.wikipedia.org/wiki/Auslassungspunkte, which gives a good explanation of this issue. I recommend changing the default value to "…" (Unicode U+2026, HTML entity &#8230;).
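As a rough illustration of point 1, here is a minimal sketch of what a multibyte-aware variant could look like. It is not Smarty code: the name `example_mb_truncate` and the extra `$charset` parameter are assumptions made for this example, the input is assumed to be valid UTF-8, and `intdiv()` requires PHP 7 or later.

```php
<?php
// Hypothetical helper, NOT part of Smarty: a multibyte-aware variant of the
// truncate modifier quoted above. The function name and the $charset
// parameter are assumptions made for this sketch.
function example_mb_truncate($string, $length = 80, $etc = '...',
                             $break_words = false, $middle = false,
                             $charset = 'UTF-8')
{
    if (!extension_loaded('mbstring')) {
        // Without mbstring there is no character-aware way to cut the string.
        throw new RuntimeException('mbstring extension is required');
    }
    if ($length == 0) {
        return '';
    }
    if (mb_strlen($string, $charset) <= $length) {
        return $string;
    }

    // Reserve room for $etc, counted in characters; never go below zero.
    $length = max(0, $length - mb_strlen($etc, $charset));

    if (!$break_words && !$middle) {
        // The "u" modifier makes PCRE treat the subject as UTF-8, so this
        // step only makes sense when $charset is UTF-8.
        $string = preg_replace('/\s+?(\S+)?$/u', '',
                               mb_substr($string, 0, $length + 1, $charset));
    }
    if (!$middle) {
        return mb_substr($string, 0, $length, $charset) . $etc;
    }

    // Integer halves avoid passing a fractional offset to mb_substr().
    $half = intdiv($length, 2);
    return mb_substr($string, 0, $half, $charset)
         . $etc
         . mb_substr($string, -$half, $half, $charset);
}
```

Throwing when mbstring is missing is just one possible choice; falling back to the byte-based functions would be another, at the cost of cutting multibyte characters in half.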
That's all for the moment; I'll keep on searching.
Spuerhund |
|
|
boots Administrator
Joined: 16 Apr 2003 Posts: 5611 Location: Toronto, Canada
|
Posted: Mon May 29, 2006 11:43 pm Post subject: Re: a few bugs in the "truncate" plugin |
|
|
Hi and thanks for taking the time to write.
Spuerhund wrote: | 1) It should be mentioned somewhere that this plugin does NOT work with unicode text! | You do know that this is true for all of PHP, yes? If you want mb-type functions, you have to implement them yourself.
Spuerhund wrote: | 2) The regular expression /\s+?(\S+)?$/ is "suboptimal". First "\s+?" can be written as "\s", it will not change anything. Second "(\S+)" gets extracted without ever using it, thus it should be changed to "(?:\S+)" to save memory. | "\s+?" != "\s" -- at best, it is the same as \s* (however, it is treated differently by the regex matching)
Spuerhund wrote: | 3) The value of the $etc variable gets appended to the string whenever it is longer than $length characters. But this is not always correct, as there should be a single whitespace between the last word and the value of $etc whenever words are not cut (break_words = false). There is a difference in the meaning of "foo bar..." and "foo bar ...", therefore depending on the value of $break_words there should be a whitespace.
| I don't really care one way or the other but I disagree with your assessment. They are both continuations and that is all.
Spuerhund wrote: | 4) The default value of $etc is '...' -- bad too. I refer to http://de.wikipedia.org/wiki/Auslassungspunkte which gives a good explanation about this issue. I just recommend to change the default value to "…" (unicode: &#8230;). | Sorry, I don't speak German, but we don't put unicode text into the default plugins.
I'm open to being shown the error-of-my-ways but my position is that if you want unicode behaviour, you have to build it yourself. When PHP officially supports unicode out-of-the-box, so will we.
Best Regards. |
|
|
bugmenot Smarty n00b
Joined: 31 May 2006 Posts: 1
|
Posted: Wed May 31, 2006 10:17 am Post subject: |
|
|
"\s+?" does the same as "\s" (at least in this case, because it's not limited at the left).
? does not only mean {0,1}, in cases where a quantifier like + or * is followed by ? it switches the greediness.
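A quick way to check the equivalence claim is to run both patterns over the same input. The snippet below is only a sketch with illustrative variable names. It suggests that the two patterns agree when words are separated by a single whitespace character, but can differ when a run of whitespace precedes the last word, because the lazy `\s+?` is still allowed to expand over the whole run.

```php
<?php
$lazy  = '/\s+?(\S+)?$/';    // pattern used in modifier.truncate.php
$plain = '/\s(?:\S+)?$/';    // simplification proposed in the first post

$single = 'foo bar baz';     // words separated by one space
$double = 'foo bar  baz';    // two spaces before the last word

// Same result for single-space input: both strip " baz".
var_dump(preg_replace($lazy, '', $single) === preg_replace($plain, '', $single));

// Different result for the double space: the lazy pattern removes both
// spaces, the plain one leaves a trailing space behind.
var_dump(preg_replace($lazy, '', $double) === preg_replace($plain, '', $double));
```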
And why do you need Unicode support in PHP if you can just write the numeric entity? That could be used even in ASCII. I think it actually should be written as an entity to keep it (almost) independent of the used charset. |
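One practical consequence for truncate is worth sketching: the modifier subtracts `strlen($etc)` from `$length`, so the byte length of whatever is passed as `$etc` matters, not how wide it looks in the browser. The variable names below are illustrative only.

```php
<?php
// Three candidate values for $etc that all render as an ellipsis in HTML.
$ascii  = '...';            // three ASCII periods
$entity = '&#8230;';        // numeric character reference
$utf8   = "\xE2\x80\xA6";   // U+2026 encoded as UTF-8

// strlen() counts bytes, which is what the quoted modifier subtracts.
printf("%d %d %d\n", strlen($ascii), strlen($entity), strlen($utf8));

// The entity and the UTF-8 character are the same code point once decoded.
var_dump(html_entity_decode($entity, ENT_QUOTES, 'UTF-8') === $utf8);
```

With the entity, the byte-based modifier would reserve seven characters for something that displays as one.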
|
|
boots Administrator
Joined: 16 Apr 2003 Posts: 5611 Location: Toronto, Canada
|
Posted: Wed May 31, 2006 3:46 pm Post subject: |
|
|
Okay, I'll concede the \s+? issue. It really doesn't seem pertinent enough to make a change, though.
As for using an entity, I think that is not the way to go. Now I'll refer to Wikipedia:
from the Chicago Manual of Style wrote: | Q. How do I insert an ellipsis in my manuscript? My computer keyboard can do that with a couple of keystrokes. Is this acceptable? Or should I type period + space for all three dots? Should these spaces be nonbreaking spaces?
A. For manuscripts, inserting an ellipsis character is a workable method, but it is not the preferred method. It is easy enough for a publisher to search for this unique character and replace it with the recommended three periods plus two nonbreaking spaces (. . .). But in addition to this extra step, there is also the potential for character-mapping problems (the ellipsis could appear as some other character) across software platforms—an added inconvenience. Moreover, the numeric entity for an ellipsis is not formally defined for standard HTML (and may not work with older browsers). So type three spaced dots, like this . . . or, at the end of a grammatical sentence, like this. . . . If you can, add two nonbreaking spaces to keep the three dots—or the last three of four—from breaking across a line. |
http://en.wikipedia.org/wiki/Ellipsis
Now there are obviously a lot of rules that can (or may) be applied here. Truncate seems to be working for me as it is. From my POV, I'd rather see a patch that offered a much fuller implementation of the style rules rather than just a few nitpicks that don't really change anything (e.g. the regexes) or that prefer one non-canonical style guideline over another without any particular reason. And yes, entities are problematic. Sometimes people prefer plain-text output rather than HTML.
Finally, please register a real account rather than using a bugmenot account when posting here. Thanks. |
|
|
White Tiger Smarty n00b
Joined: 11 Sep 2008 Posts: 2
|
Posted: Thu Sep 11, 2008 7:31 am Post subject: |
|
|
I ran into the same problem with truncate and UNICODE. I'm aware that this topic is 2 years old, and I've found at least 3 other threads discussing this question. I would like to comment on your 'when PHP supports UNICODE out-of-the-box' remark.
- I write my PHP in UTF-8 (the editor encodes all .php files in UNICODE). Works fine.
- I use MySQL for data storage, completely in UTF-8. Works perfectly.
- I display my HTML in UTF-8 (using <head><meta http-equiv="content-type" content="text/html; charset=utf-8"> </head>). Works superbly.
Now I do not have to care about string encoding anymore: I read something from MySQL into a PHP variable and send it to the HTML. I think this is quite a common configuration nowadays.
You are right that for UTF-8 string manipulation in PHP I have to use the mb_ functions, but then what is truncate for? The very basis of Smarty's philosophy is to take care of ALL of the formatting problems. If I have to do it in PHP, then the whole idea is defeated: I cannot get the formatting problems out of my PHP code.
I am not demanding or blaming anything, and I am very thankful to Smarty for all its help in my development. But this problem is disastrous for my multilingual project in the long run. I would very much appreciate official mb_ versions of the string manipulation plugins. Until then, I will use one of the patches offered here in the forum.
And you really _have to_ note in the documentation that your string manipulation is not UNICODE-ready. It's nothing to be ashamed of, just useful information for all developers. |
|
|
Notromda Smarty Rookie
Joined: 30 Aug 2004 Posts: 13
|
Posted: Wed Nov 26, 2008 9:09 pm Post subject: Be consistent |
|
|
If the Smarty plugins don't support UTF-8 because PHP doesn't, then let's be consistent and offer official _mb_ functions alongside them, just as PHP does.
I found this to be extremely useful, an mb_truncate modifier:
http://www.guyrutenberg.com/2007/12/04/multibyte-string-truncate-modifier-for-smarty-mb_truncate/
While I'd rather have smarty know what encoding is in use and automatically do the right thing, I can live with alternate function names. But the option at least needs to be there for everyone to use. |
|
|
nothinghood Smarty Rookie
Joined: 20 Aug 2009 Posts: 5
|
Posted: Fri Aug 21, 2009 7:08 am Post subject: |
|
|
The
Code: | return mb_substr($string, 0, $length/2, $charset) . $etc . mb_substr($string, -$length/2, $charset); |
part does not work as you expect. In particular, in the second call to mb_substr, $charset is passed as the length argument.
As stated here
http://us3.php.net/manual/en/function.mb-substr.php#77515
this way works:
Code: | return mb_substr($string, 0, $length/2, $charset) . $etc . mb_substr($string, -$length/2, mb_strlen($string), $charset); |
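To make the argument order concrete, here is a small sketch of the corrected tail extraction. The sample string and the value of `$half` are illustrative assumptions, and the mbstring extension has to be loaded.

```php
<?php
$string  = 'абвгдежзик';   // ten Cyrillic letters, UTF-8 source assumed
$charset = 'UTF-8';
$half    = 2;              // illustrative stand-in for half of $length

// mb_substr(string, start, length, encoding): the length slot has to be
// filled before the encoding can be passed positionally. A length that
// runs past the end of the string simply means "up to the end".
$tail = mb_substr($string, -$half, mb_strlen($string, $charset), $charset);
var_dump($tail);
```

Using an integer for the offset, rather than the result of `$length/2`, also avoids handing mb_substr a fractional value when `$length` is odd.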
bye
nh |
|
Back to top |
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|
|